Highlighting discrepancy cells in duplicate rows



I have sheet with participants registration information for a study, and where the form is completed by participants. The sheet has columns like First Name, Last Name, Email address, Country, City, Institution etc.

There are certain participants who are completing the registration more than once, and to identify the duplicates, I have used formulas, where if same email is appeared second time in a sheet, column name "occurrences" will be returned to 2, and I have also created another checkpoint which check for first name, last name, country and institution, where if any two rows has exactly same information for these columns, this column will be starred (the reason for using this second checkpoint is, because there will be a possibility that participant might register with new mail, the next time). For either of the two scenarios, I manually go and delete the duplicate entry (if the second entry also has the exact same info as first entry).

However, if the second entry by same participant who have already registered earlier has given different information (for example years of experience was submitted as 5 years in first entry, and 10 years in second entry), I just wanted to create a conditional formatting to highlight the cell that has discrepancy (in our example, I wanted to cell "Years of experience" to be highlighted automatically) for the duplicate rows of that participant. so that it will be easy on eyes to check the highlighted cells, rather than checking each cell for discrepancy. Is it possible to do? Can someone please give suggestions?



  • jmyzk_cloudsmart_jp
    jmyzk_cloudsmart_jp ✭✭✭✭✭✭
    edited 06/16/24

    Hi @Mounika

    To check for mismatches, you need to get the first entry in a particular column, such as [Years of Experience], for a specific duplicate type, [Email], or [Other Attributes], and compare that entry with the value of the duplicate entry.


    To get the first entry in the demo solution above, I added several helper columns as follows;

    [Other Attributes] =[First Name]@row + [Last Name]@row + Country@row + Institution@row

    [Occurrences (Email)] =RANKEQ([Row ID]@row, COLLECT([Row ID]:[Row ID], Email:Email, Email@row), 1)

    [Occurrences (Other)] =RANKEQ([Row ID]@row, COLLECT([Row ID]:[Row ID], [Other Attributes]:[Other Attributes], [Other Attributes]@row), 1)

    [Checkpoint (Email)] =IF([Occurrences (Email)]@row > 1, 1)

    [Checkpoint (Other)] =IF([Occurrences (Other)]@row > 1, 1)

    With the first two RANKEQ formulas, I can determine what [Row ID] is the one for the first entry of the duplicates by checking if the value is 1.

    With the above helper columns, I can get the first entry of [Years of Experiences] and [City] as follows;

    [Years of Experience 1] =IF([Checkpoint (Email)]@row, JOIN(COLLECT([Years of Experience]:[Years of Experience], Email:Email, Email@row, [Occurrences (Email)]:[Occurrences (Email)], 1)), IF([Checkpoint (Other)]@row, JOIN(COLLECT([Years of Experience]:[Years of Experience], [Other Attributes]:[Other Attributes], [Other Attributes]@row, [Occurrences (Other)]:[Occurrences (Other)], 1))))

    [City 1] =IF([Checkpoint (Email)]@row, JOIN(COLLECT(City:City, Email:Email, Email@row, [Occurrences (Email)]:[Occurrences (Email)], 1)), IF([Checkpoint (Other)]@row, JOIN(COLLECT(City:City, [Other Attributes]:[Other Attributes], [Other Attributes]@row, [Occurrences (Other)]:[Occurrences (Other)], 1))))

    In the above formula, I used JOIN(COLLECT()) functions, but you can also use INDEX(MATCH()) functions.

    Then, I can add helper columns to use in the conditional formatting by comparing the [Years of Experience] with [Years of Experience 1], for example.