Case sensitive COUNTIF

BSADK
BSADK ✭✭✭
edited 02/16/22 in Smartsheet Basics

I'm trying to mark duplicates in a particular column per the discussion here: https://community.smartsheet.com/discussion/comment/319478#Comment_319478

My issue is, the column I'm checking has a mix of uppercase and lowercase letters that define unique IDs, and COUNTIF unfortunately, is case insensitive, so it is marking cells that are not duplicates as duplicates. Is there any way to do a case sensitive check?

NOTES:

  1. I've already submitted a feature request, referencing the EXACT function that is available in excel which does exact, case sensitive matching.
  2. I've tried using FIND (which IS case sensitive) and JOIN as such: =FIND([Main Column]@row, JOIN([Main Column]:[Main Column]), FIND([Main Column]@row, JOIN([Main Column]:[Main Column])) + 1) but that doesn't work due to the 4000 character limit in SmartSheet.

Best Answers

  • Jeff Reisman
    Jeff Reisman ✭✭✭✭✭✭
    Answer ✓

    I think I have it here. You were on the right track with FIND. We just have to put that into a COUNTIF:

    =COUNTIF([Main Column]:[Main Column], FIND([Main Column]@row, @cell) > 0)

    Here I have my name listed in UPPER, lower, and Mixed cases. I get the correct count for each. Make this a column formula, and then filter by anything with a value greater than 1, and there are your true dupes.


    Regards,

    Jeff Reisman

    Link: Smartsheet Functions Help Pages Link: Smartsheet Formula Error Messages

    If my answer helped solve your issue, please mark it as accepted so that other users can find it later. Thanks!

  • Jeff Reisman
    Jeff Reisman ✭✭✭✭✭✭
    edited 02/17/22 Answer ✓

    @BSADK

    The formula I provided above counts the numbers of occurrences of exact case-sensitive matches in a range. You don't have to work so hard, as FIND and COUNTIF do what you need.

    Just put this in a "DupeCount" column:

    =COUNTIF([Main Column]:[Main Column], FIND([Main Column]@row, @cell) > 0)

    Any row showing a count higher than 1 is a dupe. Once you remove one of the duplicates, the count on the other(s) drops by 1.

    Regards,

    Jeff Reisman

    Link: Smartsheet Functions Help Pages Link: Smartsheet Formula Error Messages

    If my answer helped solve your issue, please mark it as accepted so that other users can find it later. Thanks!

  • BSADK
    BSADK ✭✭✭
    Answer ✓

    @Jeff Reisman I found an issue with that formula.

    The original problem is that COUNTIF, as we saw, isn't an EXACT match because it's case insensitive.

    Using FIND in this manner, though case sensitive, is also NOT an EXACT match, because it will match if any SUBSTRING inside matches, so it doesn't care if the lengths match or not.

    This was not an issue in the first sheet I was using and wrote this question about, because the "ID" field was the same length for everything in that same system. I found this out when I tried to apply this same formula to another Spreadsheet that had a column that represented a model number, where no-only could the lengths be different, but many were the same number with just an additional letter at the end.

    Truth be told, in this case, I didn't need a case sensitive match, and going back to the original COUNTIF would work for this scenario. HOWEVER, if someone else comes across this thread looking for an EXACT match -or- as the title suggests, duplicating COUNTIF function but in a case sensitive way, I figured I would share this change to the formula that should generate a true exact match.

    =COUNTIF([Main Column]:[Main Column], AND(LEN([Main Column]@row) = LEN(@cell), FIND([Main Column]@row, @cell) > 0))

    Note: I purposely put the LEN comparison first in case Smartsheets AND() is a "short-circuit" AND(). Meaning if the first term is 0, it won't bother checking the second term because it doesn't matter. That would shorten processing time if that's the case. I don't know if it is, though.

    ...still wish there was an EXCEL-like EXACT() function. I put in the feature request, but not hopeful it will ever come to pass.

Answers

  • Jeff Reisman
    Jeff Reisman ✭✭✭✭✭✭

    What kind of formula are you using to create your unique IDs that you have the same word in varying mixed cases? Can you add a numeric value in there that might help to differentiate the two ID values? It seems to me like that's the issue to fix, so that you have cleaner data.

    Regards,

    Jeff Reisman

    Link: Smartsheet Functions Help Pages Link: Smartsheet Formula Error Messages

    If my answer helped solve your issue, please mark it as accepted so that other users can find it later. Thanks!

  • BSADK
    BSADK ✭✭✭

    @Jeff Reisman If I could, I would. These IDs are actually being copied over from another system that generates them. Unfortunately, I have no control over that system whatsoever. The duplicate check is part of making sure they got ported over correctly.

    This system has thousands of entries, of which I'm (so far) extracting 660 of them to import into my Smartsheet. The IDs are then used to link to the references in this external system. If I don't have the correct ID, they won't get the correct reference.

    When I initially tried to import, there was an error, and there were 6 (true) duplicates that I had to figure out the correct IDs for. However, it has identified another 16 duplicates that are not actually duplicates, they have this lowercase/uppercase issue.

  • Jeff Reisman
    Jeff Reisman ✭✭✭✭✭✭
    Answer ✓

    I think I have it here. You were on the right track with FIND. We just have to put that into a COUNTIF:

    =COUNTIF([Main Column]:[Main Column], FIND([Main Column]@row, @cell) > 0)

    Here I have my name listed in UPPER, lower, and Mixed cases. I get the correct count for each. Make this a column formula, and then filter by anything with a value greater than 1, and there are your true dupes.


    Regards,

    Jeff Reisman

    Link: Smartsheet Functions Help Pages Link: Smartsheet Formula Error Messages

    If my answer helped solve your issue, please mark it as accepted so that other users can find it later. Thanks!

  • BSADK
    BSADK ✭✭✭
    edited 02/17/22

    Update:

    I've tried a couple things.

    First, I tried HAS() :

    The documentation for the HAS() function:

    States "This function searches for an exact match of a value"

    However, despite the documentation, HAS() is NOT an EXACT match, it is still case insensitive. Smartsheet should really change the documentation to reflect that. What they mean by "EXACT" is that it matches in length in addition to letters (but case insensitive on the letters). It doesn't match if it's a substring. However, since it doesn't consider case it's not really EXACT.

    Next, I tried combining FIND() with INDEX() and MATCH():

    I got a little closer (but "no cigar") to a solution with the following formula:

    =IF(FIND([Document ID]@row, INDEX([Document ID]:[Document ID], MATCH([Document ID]@row, [Document ID]:[Document ID], 0))), 0, 1)

    This gets pretty complex and may be hard to follow, but here's what this does:

    In this formula, I used MATCH() (case insensitive) to find the first matching value. Then, I used INDEX() to reference that exact value, then FIND() (case sensitive) to compare that value to the current row and see if it is an exact match. If it is an exact match, it returns a 0, if not it returns a 1.

    This gives 4 cases:

    1. Unique values regardless of case always match themselves and only themselves exactly, so they all get a "0".
    2. Actual duplicates - which will match exactly with themselves and their duplicates, also get a "0".
    3. The first duplicate - which will match itself first, also gets a "0"
    4. The second and later occurrences of a non-exact "duplicate" (case-insensitive) all get a 1.

    If I combine these results with the COUNTIF function, by subtracting this result, then:

    Case 1: All unique values = 0-0 = 0, and maintain a 0

    Case 2: Actual duplicates = 1-0 = 1, and maintain a 1 (*in MOST cases).

    Case 3 is still a problem - the first value will get (1-0 = 1) and still get flagged as a duplicate, even though it is not.

    Case 4: Second and later occurrences of non-exact "duplicates" = 1 - 1 = 0, which is USUALLY correct. However, if we have two values that match each other EXACTLY but have an earlier case-insensitive match, it WON'T flag those as duplicates. They still get 1 -1 = 0 which makes them look like they're not a duplicate when in fact they are.

    The issue with Case 4 would be pretty rare, and I personally don't anticipate any occurrences of that in my sheet and if this were the only issue, I could live with it. Case 3 however is still giving me nuisance flags and it would actually make it harder to differentiate real duplicates from fake as I wouldn't know what lines to compare to. I feel like I'm closer to a solution but still don't have it.

  • Jeff Reisman
    Jeff Reisman ✭✭✭✭✭✭
    edited 02/17/22 Answer ✓

    @BSADK

    The formula I provided above counts the numbers of occurrences of exact case-sensitive matches in a range. You don't have to work so hard, as FIND and COUNTIF do what you need.

    Just put this in a "DupeCount" column:

    =COUNTIF([Main Column]:[Main Column], FIND([Main Column]@row, @cell) > 0)

    Any row showing a count higher than 1 is a dupe. Once you remove one of the duplicates, the count on the other(s) drops by 1.

    Regards,

    Jeff Reisman

    Link: Smartsheet Functions Help Pages Link: Smartsheet Formula Error Messages

    If my answer helped solve your issue, please mark it as accepted so that other users can find it later. Thanks!

  • BSADK
    BSADK ✭✭✭

    Yes! Ingenious! Thank you very much @Jeff Reisman . I knew there must be a way to use @cell, which is a feature unique to SmartSheet (as far as I know), but I hadn't quite figured out how to implement it correctly. This works exactly as I need. I do wish there was a simpler, more straightforward way. I did put in a feature request for a function like Excel's EXACT().


  • BSADK
    BSADK ✭✭✭
    Answer ✓

    @Jeff Reisman I found an issue with that formula.

    The original problem is that COUNTIF, as we saw, isn't an EXACT match because it's case insensitive.

    Using FIND in this manner, though case sensitive, is also NOT an EXACT match, because it will match if any SUBSTRING inside matches, so it doesn't care if the lengths match or not.

    This was not an issue in the first sheet I was using and wrote this question about, because the "ID" field was the same length for everything in that same system. I found this out when I tried to apply this same formula to another Spreadsheet that had a column that represented a model number, where no-only could the lengths be different, but many were the same number with just an additional letter at the end.

    Truth be told, in this case, I didn't need a case sensitive match, and going back to the original COUNTIF would work for this scenario. HOWEVER, if someone else comes across this thread looking for an EXACT match -or- as the title suggests, duplicating COUNTIF function but in a case sensitive way, I figured I would share this change to the formula that should generate a true exact match.

    =COUNTIF([Main Column]:[Main Column], AND(LEN([Main Column]@row) = LEN(@cell), FIND([Main Column]@row, @cell) > 0))

    Note: I purposely put the LEN comparison first in case Smartsheets AND() is a "short-circuit" AND(). Meaning if the first term is 0, it won't bother checking the second term because it doesn't matter. That would shorten processing time if that's the case. I don't know if it is, though.

    ...still wish there was an EXCEL-like EXACT() function. I put in the feature request, but not hopeful it will ever come to pass.