Deleting Duplicates by Column Formula

Options

I have a sheet that has rows copied from another sheet. Updated versions of the rows are copied over, and I need to remove the old one. I have added two columns, Duplicate and Tag for Deletion. The formulas are

Duplicate - =COUNTIF([Primary Column]$1:[Primary Column]@row, COLLECT([Primary Column]:[Primary Column], [Primary Column]:[Primary Column], [Primary Column]@row) = [Primary Column]@row)

Tag for Deletion - =IF(MAX(COLLECT(Duplicate:Duplicate, [Primary Column]:[Primary Column], [Primary Column]@row)) = Duplicate@row, 0, 1)

If the Tag for Deletion has a 1, an automation removes the line. This method works just fine, but I want to make the Duplicate formula a column formula, but I can't. I need either a way to define a range as the beginning of the sheet to @row that works with column formulas or a new method. I really appreciate your help.

Tags:

Answers

  • sharkasits
    sharkasits ✭✭✭✭✭
    Options

    You need to add an Auto Number field. (e.g., RowID)

    Duplicate = COUNT(COLLECT([Primary Column]:[Primary Column], [Primary Column]:[Primary Column], [Primary Column]@row, [RowID]:[RowID], <= [RowID]@row))

    Tag for Deletion - =IF(MAX(COLLECT(Duplicate:Duplicate, [Primary Column]:[Primary Column], [Primary Column]@row)) = Duplicate@row, 0, 1)


    Or you could skip the duplicate column and use this:

    Tag for Deletion - =IF(MAX(COLLECT([RowID]:[RowID], [Primary Column]:[Primary Column], [Primary Column]@row)) = [RowID]@row, 0, 1)

  • Jeff Reisman
    Jeff Reisman ✭✭✭✭✭✭
    Options

    @Bobert Scenarios like this are quite common among Smartsheet users; you are not alone. You already did some great work getting to where you are, so kudos!

    To achieve what you want, for each to be a column formula, we may need some more helper columns. First, if you don't already have it, add an numeric Auto-Number system column (no prefix, suffix, or leading zeros!) and call it RowID or something like that. Now make your Duplicate column formula less complex:

    =COUNTIF([Primary Column]:[Primary Column], [Primary Column]@row)

    For your deletion tag, you want to find the row with the highest RowID and the same Primary Column value, and tag anything that IS a duplicate row but is NOT the max RowID.

    =IF(AND(Duplicate@row > 1, RowID@row = MAX(COLLECT(RowID:RowID, [Primary Column]:[Primary Column], [Primary Column]@row), 0, IF(AND(Duplicate@row > 1, RowID@row <> MAX(COLLECT(RowID:RowID, [Primary Column]:[Primary Column], [Primary Column]@row), 1, 0))

    English: If the duplicate count is greater than 1 and the RowID is the highest where the primary column value is the same as this row, leave the box unchecked; but if the duplicate count is greater than 1 and the RowID is NOT the highest where the primary column value is the same as this row, then DO check the box; otherwise (like if the duplicate count is NOT greater than 1,) leave the box unchecked.

    Regards,

    Jeff Reisman

    Link: Smartsheet Functions Help Pages Link: Smartsheet Formula Error Messages

    If my answer helped solve your issue, please mark it as accepted so that other users can find it later. Thanks!

  • PeterLatcapital
    Options

    I followed this discussion, and it has been very helpful. Thank you. I got the count of duplicates with =COUNTIF([Authorization Number]:[Authorization Number], [Authorization Number]@row). And now I am attempting to mark some for deletion with:

    =IF(AND([Duplicate]@row > 1, [RowID]@row = MAX(COLLECT([RowID]:[RowID], [Authorization Number]:[Authorization Number], [Authorization Number]@row), 0, IF(AND([Duplicate]@row > 1, [RowID]@row <> MAX(COLLECT([RowID]:[RowID], [Authorization Number]:[Authorization Number], [Authorization Number]@row), 1,))))))

    I am getting a "unparseable" error. Is it my column property, do I need a check box or is the result 0 1 and 1.

    Help Please.

    Peter

  • Chris Shifflett
    Options

    You can try using the INDEX function along with ROWS or COLUMNS to define the range dynamically in your Duplicate formula. For example:

    Duplicate - =COUNTIF(INDEX([Primary Column], 1, 0), COLUMNS([Primary Column])-1, INDEX([Primary Column], ROWS([Primary Column]), 0), [Primary Column]@row) = [Primary Column]@row)

    This formula counts the number of occurrences of [Primary Column]@row in the range starting from the first row of [Primary Column] and ending at [Primary Column]@row. By using the INDEX and ROWS/COLUMNS functions, you can define the range dynamically, so that it always updates to the latest data in the sheet.

  • Genevieve P.
    Genevieve P. Employee Admin
    Options

    Hi @Chris Shifflett - the formula syntax you're suggesting won't work for Smartsheet, and Smartsheet does not have a "COLUMNS" or "ROWS" function. Is this a suggestion received from ChatGPT? If so, ChatGPT hasn't quite caught up with Smartsheet formulas. It's best to build a formula actually in a Smartsheet sheet before suggesting it as a solution, just in case.


    @PeterLatcapital in regards to your formula, it looks like some of your closing parentheses are in the wrong place. Try a structure like so:

    =IF(AND(---, [RowID]@row = MAX(COLLECT(---))), 0, IF(AND(---, [RowID]@row <> MAX(COLLECT(---))), 1))

    Try:

    =IF(AND([Duplicate]@row > 1, [RowID]@row = MAX(COLLECT([RowID]:[RowID], [Authorization Number]:[Authorization Number], [Authorization Number]@row))), 0, IF(AND([Duplicate]@row > 1, [RowID]@row <> MAX(COLLECT([RowID]:[RowID], [Authorization Number]:[Authorization Number], [Authorization Number]@row))), 1))

    Cheers,

    Genevieve

  • PeterLatcapital
    Options

    Brilliant! Thank you. It worked.

    Thank you, Genevieve.

    That's funny about possibly using GhatGPT. What is the world coming to?

  • Genevieve P.
    Genevieve P. Employee Admin
    Options

    No problem! Glad I could help.

    Haha I have seen some very good responses from ChatGPT... I find it quite spooky.

  • PeterLatcapital
    Options

    Hi Genevieve,

    Now I have a zero in the column of the row I want to delete, what automation will delete the row?

    Cheer,

    Peter

  • Genevieve P.
    Genevieve P. Employee Admin
    Options

    Hi @PeterLatcapital

    I would personally use a "Move Row" workflow on a time schedule, for example every day at 11pm, with the Condition that the column says 0:


    Cheers,

    Genevieve

  • PeterLatcapital
    Options

    Thank you so much. That seems like a simple solution.

Help Article Resources

Want to practice working with formulas directly in Smartsheet?

Check out the Formula Handbook template!