# Formula for Duplicate Entries

Options
This discussion was created from comments split from: Warning message when Duplicate entry is done through form.
«1

• ✭✭✭✭
Options

Can someone give an example of a formula you would use to catch duplicate entries in a sheet? If I have a "Name" column for example, how would that be implemented?

Options

You can find duplicates by using a COUNTIF function to find if something appears more than once, and if it does, return a specific value (such as a checkmark in a check box column).

Try something like this:

=IF(COUNTIF(Name:Name, Name@row) > 1, 1, 0)

I would apply this as a Column Formula (see here).

Here are some other Community Posts where finding duplicates is discussed:

Cheers!

Genevieve

• ✭✭✭✭✭✭
Options

Here is another approach:

=IF((COUNT([Column A]1:[Column A]12)) = (COUNT(DISTINCT([Column A]1:[Column A]12))), "No Dups", "Duplicates exist")

The formula compares a count of all entries in a column (limited range in this case)to a count if distinct values from the same range.

You can put this at the top of a sheet, or in your sheet summary, or drop the cell into a report.

• ✭✭✭✭
Options

Is there a way to only check the box if the duplicate isn't the first in the list? I have been using =IF(COUNTIF(Resources\$1:Resources@row, Resources@row) <> 1, 0, 1) as a formula but I need to make the formula a column formula. Is there any way to do this? @Genevieve P.

Options

Since you're using an absolute reference with a row number (Resources\$1) then no, you won't be able to make this into a column formula.

That said, if you lock the column, then as long as new rows are added either between two rows with that same formula or immediately below content in your sheet, the formula will auto-fill down your sheet automatically. See: Use or Override Automatic Formatting and Formula Autofill

Cheers,

Genevieve

• Options

Hi @Genevieve P. - I was wondering. I have a row id and created the formula to count for duplicates. Mine is a bit different:

My formula allows me to convert to column formula, but it is just blank? =IF(Duplicates@row > 1, [Row ID]@row = MAX(COLLECT([Row ID]:[Row ID], 0, IF(Duplicates@row > 1, [Row ID]@row <> MAX(COLLECT([Row ID]:[Row ID], 10)))))) Can you tell me what I am doing wrong?

edited 05/24/23
Options

Hey! Happy to help.

The first thing I notice is that you have two statements that say the same thing... "If the Duplicates cell to the right is greater than 1..."

The second thing is that your MAX(COLLECT( statements are missing their closing parentheses, and potentially another range? The MAX function won't be able to read the text in the Row ID, as MAX only looks at numbers and dates.

Would you be able to explain what you want it to do? If you want to bring back all the duplicate row names, I would suggest using JOIN collect, like so:

=IF(Duplicates@row > 1, JOIN(COLLECT([Column to Return]:[Column to Return], [Row ID]:[Row ID], [Row ID]@row), ", "))

I'm not sure what column you want to bring back, so I've put in [Column to Return] as an example. If this hasn't helped, it would be useful to see our formula in the Duplicates column and hear an explanation of what you want the current formula to do.

Cheers,

Genevieve

• ✭✭✭✭✭✭
Options

I would insert an Auto-number column (called "Auto" in this example). Ten you can adjust your COUNTIFS like so:

=IF(COUNTIFS(Resources:Resources, @cell = Resources@row, Auto:Auto, @cell< Auto@row)> 0, 1)

• Options

@Genevieve P. In the duplicates for deletion column, I want to identify the old data that needs to be removed from the smartsheet and keep the latest data. Is that possible?

Options

Hi Natalie,

What else do you have in your sheet? What I mean is, how would you define "old" data - do you have a Date Created column or an Auto Number like Paul is suggesting?

Can you also let us know what formula you have in the "Duplicates" column?

• edited 05/26/23
Options

I have the modified date and thought I could utilize that for the forumula to keep the latest update.

formula for finding duplicates: =IF(COUNTIF([Row ID]:[Row ID], [Row ID]@row) > 1, 1, 0)))

Options

Hey Natalie,

This is great, thank you!

Here's the formula I would use:

=IF(OR(Duplicates@row = 0, Modified@row = MAX(COLLECT(Modified:Modified, Duplicates:Duplicates, 1))), "Keep", "Delete")

This says that if the Duplicate column says 0, this is automatically a "keep" row. OR if the Duplicates column has a 1 in it, but the current Modified Date is the MAX date (the newest date) then keep that as well.

In my sheet I have the "Duplicates" column as a Flag column so I don't expect numbers other than 1 to appear:

You can change the words around however you'd like, but this should make it easier to identify which ones to keep or delete. You can also use conditional formatting to highlight rows that should be removed:

Cheers,

Genevieve

• ✭✭✭✭✭✭
Options

I would suggest some caution when leveraging the Modified (date) column.

In theory... If the formula changes it from unflagged to flagged, the modified column will update which in turn makes it the most recently updated row which then unflags the column and flags another, which starts the process all over again on another row.

In practice, you will end up with two rows that have the same exact modified date/time and even though they are duplicates, since they both meet the MAX criteria, they will both be flagged as "Keep" which completely defeats the purpose.

I have been trying to figure out a way around this for a while now. I think the trick is going to be very specific triggers in the Copy Row automation so that it only copies the row when certain fields are changed (to exclude the modified date/time change triggering and exclude the flag change triggering which would end up causing the same issue as above).

Options

Ah! Goodness I completely missed that the System Modified column would cause this trouble - of course.

What about a Date column that records a date with a workflow based on if specific columns are changed?

• ✭✭✭✭✭✭
Options

@Genevieve P. I had thought about that too, but that only works if you won't have more than one row updated in a single day. For example... If I update the first "Row 1" in your screenshot this morning and then the second "Row 1" in your screenshot this evening, I will have two rows that are "Row 1" with the same date. If only one row will be updated each day, then that would certainly work though.

## Help Article Resources

Want to practice working with formulas directly in Smartsheet?

Check out the Formula Handbook template!