locating and deleting duplicate rows
How do I locate duplicate rows and delete the duplicates, so I only have 1 row of information? This particular sheet has over 300 rows.
Answers
-
How is information coming into the sheet? Depending on the criteria you need met (beyond whether it's a duplicate or not), you may be able to filter out entries in the DataShuttle, form, etc. settings.
Alternatively, for visibility's sake, you can create a filtered view of sorts. One way to do this could be as follows:
-Create a system "Row ID" column that auto-assigns a row ID to each row. This is not necessary if duplicates are represented by a singular set of column data (i.e. all duplicates show the same value in the "Task" column)
-Then, create a helper column (checkbox type) that identifies first entries with the following column formula.
=IF(INDEX(COLLECT([Row ID]:[Row ID], Task:Task, Task@row), 1)=[Row ID]@row, 1, 0)
Essentially, this says that if the current row is the first entry with the collect criteria met, it marks the helper column as checked. Otherwise, it remains unchecked. I have it conditioned based on the Task column, but you can add multiple criteria to ensure that the first in the series is only marked as such when it matches multiple criteria.
-Finally, with the helper column in place, you can either apply a sheet filter or pull a report that filters the sheet, showing only the checked rows. All others will be your duplicate rows.
Hope this helps!:)
-
Hi @Brack
First, use + or JOIN to create a helper column to check for duplicates;
[Combined Data] = =JOIN(Task@row:Group@row)
Then, you can use the RANKEQ function and COLLECT Function like this;
=IF(RANKEQ([Row #]@row, COLLECT([Row #]:[Row #], [Combined data]:[Combined data], [Combined data]@row), 1) > 1, 1)
For example, Row # 136 gets 1, and # 137 gets 2 with the RANKEQ, so you can check if a row is a duplicate by checking if the number is greater than 1.
Alternatively, you can use the COUNTIF cell formula like this to get the equivalent of the RANKEQ value:
=COUNTIF([Combined data]$1:[Combined data]@row, [Combined data]@row)
This method is sometimes used in Excel.
I prefer the first formula to avoid the unreliable cell formula.
-
Additionally, if you want to clear the sheet, you can create an archive sheet (just save a copy of the current sheet) and set up a "Move Row" automation for all the entries called out somehow as a duplicate. Then, your source/intake sheet remains "clean".
-
I need the easiest way for it to look at these 2 columns and remove the dupes. I just need to see the line item once. I could have multiple rows with the same task, MFG part number etc. I hope this makes sense.
Categories
- All Categories
- 14 Welcome to the Community
- Smartsheet Customer Resources
- 64K Get Help
- 410 Global Discussions
- 220 Industry Talk
- 459 Announcements
- 4.8K Ideas & Feature Requests
- 143 Brandfolder
- 137 Just for fun
- 57 Community Job Board
- 459 Show & Tell
- 31 Member Spotlight
- 1 SmartStories
- 298 Events
- 37 Webinars
- 7.3K Forum Archives