Identifying duplicates when company name changes without changing source data

I am keeping a running report on engagement with certain organizations across 5 years. We receive a data export from a collaborating partner with a list of hundreds of orgs, thousands of rows. Each year, several orgs may change their names slightly (for example, if Super Duper Inc. changes its name to SuperSuper Inc. or Super Duper Kids Inc, but still in, say, Burbank, CA). Is there a formula for identifying these "similar" orgs to see if they need to be de-duped somehow (I use a helper column called "adjusted org name") rather than having to manually review? The purpose of this is to make sure we're not counting an org more than once when we run a pivot table (for instance, if Super Super Kids Inc got 4K in funding in 2022 and then SuperDuper Inc. received an in-kind donation of a TV in 2023, we want to see 2 engagements for 1 org over 2 years, not 2 engagements for 2 orgs).

Thanks.

Answers

  • Danielle Arteaga
    Danielle Arteaga ✭✭✭✭✭✭

    There are probably a lot of ways to do this, especially if you are using Data Shuttle to bring in the new data each year. But, I'm assuming that you are doing this more or less manually. So, here's a suggestion.

    First, bring the NEW data into Smartsheet on its own worksheet. On that worksheet, add a checkbox column called [New Name].

    In that column, place the following formula:

    =IFERROR(IF(MATCH([Entity Name]@row, {Existing Entity List Entity Name}, 0) > 0, 0, 1), 1)

    Then, on your EXISTING sheet, create a checkbox column called [Different Name]. In that column, place the following formula:

    =IFERROR(IF(MATCH([Entity Name]@row, {New Entity List Entity Name}, 0) > 0, 0, 1), 1)

    You can then create a report using these two sheets and a filter that captures any row that has the [New Name] or [Different Name] boxes checked.


    Note that this works well only if the entity names are similar enough to sort well alphabetically. If ABC Co., Ltd. changed its name to Gigantic Holding Co., LLC, it may not be as immediately obvious what the old and new names were.

    Ultimately, the most foolproof solution is to assign each entity a unique identifier (some alphanumeric combination) that never changes so that you can key all your data updates to that identifier, stabilizing your data even if the entity name changes. (Also allowing you to better automate this process.)

Help Article Resources

Want to practice working with formulas directly in Smartsheet?

Check out the Formula Handbook template!