Find duplicate rows based on 2 column criteria

Options

I have a sheet that I'm trying to removed duplicates from based on 2 criteria.


In the example above, you can see that the APP_ID is the same for all 4 entries. But only the last one is a duplicate based on looking at the unique combination of PID and APP_ID.

It's ok for an APP_ID to repeat for different PID #s. But if the same PID and same APP_ID, then that row must be marked for removal.

Appreciate your help!


Tony

Answers

  • jmyzk_cloudsmart_jp
    jmyzk_cloudsmart_jp ✭✭✭✭✭✭
    Options

    Hi @twarner

    Generally speaking, replace this problem with the problem of counting how many times the values are the same in multiple columns until a certain row.

    If the result of the count is two or more, then it is a Duplicate.

    As we often face similar problems when creating Smartsheet solutions, I will summarize the methods here.

    There are several ways to do this.

    Use COUNTIF

    Using COUNTIF is a common method in Excel and other applications, where you specify a COUNTIF range of a column up to a row and use the value of the column of the row, and count how many times the value is in the range.

    In Smartsheet, the formula is as follows

    • =COUNTIF([PID-APP_ID]$1:[PID-APP_ID]@row, [PID-APP_ID]@row)
        • COUNTIF range of a column up to a row: [PID-APP_ID]$1:[PID-APP_ID]@row
        • The value of the column of the row:  [PID-APP_ID]@row

    The problem with this method is that [PID-APP_ID]$1 uses absolute reference, and you can not use the column function so that the formula can break.


    Use RANKEQ

    When using RANKEQ, you need to prepare a number column. And also, you need to ensure the uniqueness of the numbers by automatic number, etc., 

    Then, use the RANKEQ function to obtain the rank of that number in a specific row within the range of the numbers that meet some criteria.

    Use automatic number

    The easiest way is to use an automatic number column.

    • =RANKEQ([Row ID]@row, COLLECT([Row ID]:[Row ID], PID:PID, PID@row, [APP_ID]:[APP_ID], [APP_ID]@row), 1)

    Note that here, by COLLECT([Row ID]:[Row ID], PID:PID, PID@row, [APP_ID]:[APP_ID], [APP_ID]@row), we get the range of [Row ID] with the same PID and APP_ID as the PID and APP_ID of a particular row, and use it as the range for RANKEQ.

    Use Created system column.

    If, for some reason, the sheet cannot use auto-numbering and multiple PID and APP_ID values are not entered at the same time, i.e., users add them sequentially, there is a way to convert system column Created to a numerical value and use RANQKEQ based on this value. (If users do not simultaneously enter multiple PID and APP_ID values, the system column will guarantee the uniqueness of the converted Created values.)

    One possible case where you can not use auto-numbering is when rows containing auto-numbers are copied from other rows by Automation.

    Converting "Created" to a numerical value is a complex formula, but you can use it elsewhere once created.

    Example formula:

    • =(Created@row - BaseD#) * 24 * 60 + (VALUE(MID(Created@row, 10, FIND(":", Created@row) - 10)) + IF(AND(RIGHT(Created@row, 2) = "PM", MID(Created@row, 10, FIND(":", Created@row) - 10) > 12), 12, 0)) * 60 + VALUE(MID(Created@row, FIND(":", Created@row) + 1, 2))

    This expression takes the date, hour, and minute data from Created and calculates the number of minutes from a base date.

    Then, as in the case of automatic numbering, RANKEQ is used based on this Created Minute number.

    Please review the published sheet in the following URL and test how each method works in un-locked rows.


  • Paul Newcome
    Paul Newcome ✭✭✭✭✭✭
    Options

    Here is another method leveraging the Created Date type column:

    =IF(COUNTIFS(PID:PID, @cell = PID@row, [APP_ID]:[APP_ID], @cell = [APP_ID]@row)> 1, IF([Created Date]@row> MIN(COLLECT([Created Date]:[Created Date], PID:PID, @cell = PID@row, [APP_ID]:[APP_ID], @cell = [APP_ID]@row)), 1))

  • jmyzk_cloudsmart_jp
    jmyzk_cloudsmart_jp ✭✭✭✭✭✭
    Options

    @Paul Newcome 's

    IF([Created Date]@row> MIN(COLLECT([Created Date]:[Created Date], PID:PID, @cell = PID@row, [APP_ID]:[APP_ID], @cell = [APP_ID]@row)),❤️

  • Paul Newcome
    Paul Newcome ✭✭✭✭✭✭
    Options

    The first part of my formula was first checking if it was a duplicate, but I realize now that part isn't needed. [Created Date]@row can't be greater than [Created Date]@row if it is the only entry.


    =IF([Created Date]@row> MIN(COLLECT([Created Date]:[Created Date], PID:PID, @cell = PID@row, [APP_ID]:[APP_ID], @cell = [APP_ID]@row)), 1)

  • jmyzk_cloudsmart_jp
    jmyzk_cloudsmart_jp ✭✭✭✭✭✭
    edited 07/16/23
    Options

    Correction:

    Converting "Created" to a numerical value 

    Example formula:

    • =(Created@row - BaseD#) * 24 * 60 + (VALUE(MID(Created@row, 10, FIND(":", Created@row) - 10)) + IF(AND(RIGHT(Created@row, 2) = "PM", MID(Created@row, 10, FIND(":", Created@row) - 10) <> 12), 12, 0)) * 60 + VALUE(MID(Created@row, FIND(":", Created@row) + 1, 2))


  • Echomike
    Options

    I use a variation of this that also excludes blanks:

    =IF(AND([PID]@row <> "", [APP ID]@row <> "", COUNTIFS([PID]:[PID], [PID]@row, [APP ID]:[APP ID], [APP ID]@row) > 1), 1, 0)

Help Article Resources

Want to practice working with formulas directly in Smartsheet?

Check out the Formula Handbook template!