Quality controlling data entry to prevent duplicates

I have a vendor database with over 1,000 unique vendors and am looking for a way to quality control new entries to avoid duplicates in a VENDOR NAME column (note that there is a separate VENDOR ID # column).

One way of doing this seems to be through formatting the VENDOR NAME column as a dropdown while NOT restricting to list values only. When someone goes to add a new vendor, if that vendor name was already entered previously, it will populate in the dropdown and thus alert the data entrant that the vendor already exists in the database. The drawbacks are that when genuine new vendors are added they do not automatically get added to the dropdown list for future reference, which means the dropdown data would need to be manually updated on a regular basis. And, the dropdown will only show entries when the spelling is identical starting from the beginning of the text string. For example, if a vendor named "ABC Happy Man LLC" was in the sheet and someone entered "Happy Man LLC" then the dropdown would not show the previous entry since the text string starts with "ABC".

Is there another approach to quality control our VENDOR NAME column to ensure that duplicate information is not being entered?

Best Answer

Answers

  • Danielle Arteaga
    Danielle Arteaga ✭✭✭✭✭✭

    How are you using the VENDOR ID# column? Is that auto-created or are the vendors assigned a unique number in some other system? My instinct is that you solution lies in using that ID # column, because the variations on spellings or formatting of a vendor's name will be very hard to QA through automation.

    If you can share more about how your sheet is set up, happy to help solution this with you!

  • LGraf
    LGraf ✭✭✭✭✭

    Hi Danielle, Smartsheet is the backbone of our administrative system so this is the home of our vendor database and where the VENDOR ID # is generated. The Vendor ID # column has a formula that pulls from a few other columns in the sheet: VENDOR TYPE, CONTRACT LETTER, DATE, plus the auto generated number. Only the VENDOR TYPE column has any chance of changing and thus affecting the VENDOR ID #.

    How can we improve the setup?

  • Carson Penticuff
    Carson Penticuff ✭✭✭✭✭✭

    The common approach to "detecting" duplicate values is to add a helper column. The helper column would be named something similar to [Duplicate] or [Duplicate Vendor] and the type would be checkbox. The formula for that column would be similar to:

    =IF(COUNTIFS([VENDOR NAME]:[VENDOR NAME], [VENDOR NAME]@row) > 1, 1, 0)

    This will check the box on all rows that have duplicates. You can then add conditional formatting to highlight that row if [Duplicate] is equal to 1. Unfortunately, this does not prevent duplicate entries, it just makes it more obvious to the person entering the data and anyone reviewing it later.

    As for the issue of the same vendor being entered with slight name variations.... that is hard. I do not see a way to detect that without using the API to run the entries through some type of AI or pattern matching libraries. Although it is not a perfect solution, the simplest option may be to establish some type of standard work or standard operating procedures for vendor naming conventions to decrease the likelihood of vendor name variations.

  • LGraf
    LGraf ✭✭✭✭✭

    Hi @Carson Penticuff,

    The DUPLICATE VENDOR column seems to be a more manageable solution than using a dropdown list to quality control vendor names, but as you mentioned it does not protect from typos. We'll likely use it nevertheless since it seems to be the best option.

    Thanks very much.

  • Josh Tyler
    Josh Tyler ✭✭✭

    @Carson Penticuff Quick Question on your formula. I too am looking to identify duplicates, but for our use case it is manual payment requests. As a control if I created the Duplicate Helper Column as a checkbox/flag, could the formula be written to factor in multiple data cells. For example

    1. Invoice Date
    2. Invoice Amount
    3. Invoice Number

    Could the formula be written where if ALL THREE matched it could flag the checkbox column?

  • Carson Penticuff
    Carson Penticuff ✭✭✭✭✭✭
    Answer ✓

    @Josh Tyler Absolutely! Give this a try:

    =IF(COUNTIFS([Invoice Date]:[Invoice Date], [Invoice Date]@row, [Invoice Amount]:[Invoice Amount], [Invoice Amount]@row, [Invoice Number]:[Invoice Number], [Invoice Number]@row) > 1, 1, 0)

  • Josh Tyler
    Josh Tyler ✭✭✭

    @Carson Penticuff YOU ARE AWESOME!!!!!!! I used what you provided, had to change just the conditions as I decided to use one different cell and IT WORKED! Thank you , Thank you, Thank you - Have an amazing weekend!😁

Help Article Resources

Want to practice working with formulas directly in Smartsheet?

Check out the Formula Handbook template!