locating and deleting duplicate rows

How do I locate duplicate rows and delete the duplicates, so I only have 1 row of information? This particular sheet has over 300 rows.

Answers

  • brianschmidt
    brianschmidt ✭✭✭✭✭✭
    edited 06/18/24

    How is information coming into the sheet? Depending on the criteria you need met (beyond whether it's a duplicate or not), you may be able to filter out entries in the DataShuttle, form, etc. settings.

    Alternatively, for visibility's sake, you can create a filtered view of sorts. One way to do this could be as follows:

    -Create a system "Row ID" column that auto-assigns a row ID to each row. This is not necessary if duplicates are represented by a singular set of column data (i.e. all duplicates show the same value in the "Task" column)

    -Then, create a helper column (checkbox type) that identifies first entries with the following column formula.

    =IF(INDEX(COLLECT([Row ID]:[Row ID], Task:Task, Task@row), 1)=[Row ID]@row, 1, 0)

    Essentially, this says that if the current row is the first entry with the collect criteria met, it marks the helper column as checked. Otherwise, it remains unchecked. I have it conditioned based on the Task column, but you can add multiple criteria to ensure that the first in the series is only marked as such when it matches multiple criteria.

    -Finally, with the helper column in place, you can either apply a sheet filter or pull a report that filters the sheet, showing only the checked rows. All others will be your duplicate rows.

    Hope this helps!:)

  • jmyzk_cloudsmart_jp
    jmyzk_cloudsmart_jp ✭✭✭✭✭✭

    Hi @Brack

    First, use + or JOIN to create a helper column to check for duplicates;

    [Combined Data] = =JOIN(Task@row:Group@row)

    Then, you can use the RANKEQ function and COLLECT Function like this;

    =IF(RANKEQ([Row #]@row, COLLECT([Row #]:[Row #], [Combined data]:[Combined data], [Combined data]@row), 1) > 1, 1)

    For example, Row # 136 gets 1, and # 137 gets 2 with the RANKEQ, so you can check if a row is a duplicate by checking if the number is greater than 1.

    Alternatively, you can use the COUNTIF cell formula like this to get the equivalent of the RANKEQ value:

    =COUNTIF([Combined data]$1:[Combined data]@row, [Combined data]@row)

    This method is sometimes used in Excel.

    I prefer the first formula to avoid the unreliable cell formula.

    https://app.smartsheet.com/b/publish?EQBCT=a37cabacfd6b4bc6b4035275a854fede

  • brianschmidt
    brianschmidt ✭✭✭✭✭✭

    Additionally, if you want to clear the sheet, you can create an archive sheet (just save a copy of the current sheet) and set up a "Move Row" automation for all the entries called out somehow as a duplicate. Then, your source/intake sheet remains "clean".

  • Brack
    Brack ✭✭

    I need the easiest way for it to look at these 2 columns and remove the dupes. I just need to see the line item once. I could have multiple rows with the same task, MFG part number etc. I hope this makes sense.