How would I create a formula which returns the Top 5 most frequent words in a column?
I have been tasked with creating metrics which show the current 'Top 5' most frequent countries in a column. I am struggling to create a formula which returns the correct value as new countries will be added to the list in future so COUNTIF would not work. So imagine the column read as follows:
Germany
France
France
Germany
Germany
UK
UK
India
France
India
France
Japan
The formula would return the following
- France
- Germany
- UK
- India
- Japan
Answers
-
What if there is a tie? For instance, what happens if the next entry is Germany?
Would it still go:
- France
- Germany
- UK
- India
- Japan
or
- France / Germany
- UK
- India
- Japan
If the first, how would you determine whether France or Germany came first, or does it not matter?
-
Ties wouldn't matter too much! Listing them side by side would be fine
-
Ok. In that case I would use something like this:
On the main sheet I would have a column that houses the count on each row for whatever country is listed.
=COUNTIFS(Country:Country, @cell = Country@row)
Then in a second sheet (Metrics Sheet) I would manually enter the numbers 1 through 5 in a column (called "Rank" in this example). Then to get the list I would use:
=JOIN(DISTINCT(COLLECT({Main Sheet Country Column}, {Main Sheet Count Column}, @cell = LARGE({Main Sheet Count Column}, Rank@row))), " / ")
-
Hi Paul,
The COUNTIF in the main sheet works perfectly, thank you. However, I'm getting a #INVALID DATA TYPE error in my Metric sheet, formula is on the screenshot attached if you wouldn't mind taking a look?
"Property List - Opportunities Range 3" - is the Country column
"Property List - Opportunities Range 4" - is the Country Count column
-
Try removing the # from the rank column. If you need that there for labels and whatnot, then you would need to adjust the LARGE portion to
=JOIN(DISTINCT(COLLECT(..............................................................................., @cell = LARGE({Range}, VALUE(SUBSTITUTE([Column3]@row, "#", ""))))), " / ")
-
That worked, the # isn't necessary so I removed it.
It's returning the same country for each rank and repeating it. Would I then need to adjust the LARGE part again? to get ranks 2 -5?
-
That's my fault. Try this:
=JOIN(DISTINCT(COLLECT(..............................................................................., @cell = LARGE(DISTINCT({Range}), VALUE(SUBSTITUTE([Column3]@row, "#", ""))))), " / ")
-
Hi Paul,
Thank you so much, that worked! It seems to count Spain and Spain (capitalised) as two separate entries so I will format them the same in the sheet to fix this issue!
-
Happy to help. 👍️
Yes. Different formats will be counted as different entries. If it is manual entry and you are concerned that people will continue to use different formats, you can create a hidden helper column on the main sheet that reformats all entries. The easiest would be all caps or all lower case using an UPPER or LOWER function.
My personal preference would be to use a Dropdown to allow a selection from the consistently formatted countries.
-
That's great Paul - thanks again for your help. Let's say I wanted to also provide the count of the Top 5 alongside the country. I would do a =COUNTIF([Opportunities Country Range], [Column3]@row. But if there's a tie then the formula wouldn't work, would I need to wrap it in a CONTAINS to ensure it works every time?
-
You should be able to get away with just using the LARGE function.
=LARGE(DISTINCT({Count Column}), [Column3]@row)
-
Hey @Paul Newcome,
Any thoughts on why I am getting an invalid value on the below?
Thanks in advance!
-
@Ilias Is there a reason you are using the DISTINCT function inside of the LARGE function?
-
@Paul Newcome I tried to follow the process above by keeping the "#" next to the number.
But no worries I managed to do it without it with the use of the formula below:
-
Hi @Paul Newcome, how would I the first formula :
=JOIN(DISTINCT(COLLECT({Main Sheet Country Column}, {Main Sheet Count Column}, @cell = LARGE({Main Sheet Count Column}, Rank@row))), " / ")
But without joining countries with the same count entry in the same cell, but instead, list them one after the other just like if they had different entries?
Help Article Resources
Categories
- All Categories
- 14 Welcome to the Community
- Customer Resources
- 64.5K Get Help
- 424 Global Discussions
- 136 Industry Talk
- 465 Announcements
- 4.8K Ideas & Feature Requests
- 143 Brandfolder
- 145 Just for fun
- 62 Community Job Board
- 464 Show & Tell
- 32 Member Spotlight
- 1 SmartStories
- 301 Events
- 39 Webinars
- 7.3K Forum Archives
Check out the Formula Handbook template!