Formula to see only alphanumeric codes in a cell

I have a column in smartsheet with cells that have, for example, the following: "I70.0: atherosclerosis of the aorta; J43.2: Centrilobular emphysema; J47.9: Bronchiectasis, uncomplicated; F33.0: Major depressive disorder, recurrent, mild ;", I need another column where I only see the alphanumeric codes that are before each (:), without the description that follows after, for example like this: "I70.0; J43.2, J47.9; F33.0;"

Answers

  • KPH
    KPH ✭✭✭✭✭✭

    Would it be true to say that you need the first 6 characters in the cell? Or are there times where the codes are longer/shorter, or not at the start? If so, is the last character you need a : or a ; or a mix (the example has both)?

  • Yes, there are codes that could be shorter or longer. Sorry for the error, I just need a ; separating the codes, even at the end, so yes, a semicolon after each code.

  • KPH
    KPH ✭✭✭✭✭✭

    oh, hang on, that example is one cell not a selection of examples.

    You have many codes throughout the cell and each code ends with a colon, that you don’t want to include, but you do want a semi-colon after it.

    The codes are all preceded by semi-colon and space, other than the very first one.

    Is that true?

    Can you supply more examples to show different code lengths? for example, do they all have a period in them, and you want all the characters between the space and the period, and the period, and all characters after the period up to the colon, excluding the colon?

    We need to establish some rules and use those to find the codes.

  • Exactly, all those codes are in the same cell and the codes are all preceded by semi-colon and space, other than the very first one. By medical coding rule, there is always a period after the first alphanumeric digits.

    Other code examples with different lengths:

    I43: Cardiomyopathy in diseases classified elsewhere (does not have the dot because there are only 3)

    I63.339: Cerebral infarction due to thrombosis of unspecified posterior cerebral artery

    M97.01XA: Periprosthetic fracture around internal prosthetic right hip joint, initial encounter


    Thanks for helping!


  • Example:

    I43: Cardiomyopathy in diseases classified elsewhere; I63.339: Cerebral infarction due to thrombosis of unspecified posterior cerebral artery; M97.01XA: Periprosthetic fracture around internal prosthetic right hip joint, initial encounter;

  • KPH
    KPH ✭✭✭✭✭✭
    edited 02/04/24

    I have created a formula that works for those examples and would be interested to see if it works for more or if there are some examples that break the rules.

    This formula relies on

    1. The only place : is used is at the end of a code.
    2. The only place ; is used is between diagnoses. Ie at the start of all codes except the first.
    3. The first code is at the very start of the cell.
    4. The ~ isn't used at all (if it is, we can replace this with something that is not).
    5. There are always at most 6 codes in one cell (if there are more we can add more to the formula - you can probably figure out the pattern).

    I can explain it in detail tomorrow but am pushed for time right now. We are using the MID function to extract the text and the FIND ; space to find the start position and FIND : for the end. We need to SUBSTITUTE these symbols with ~ as FIND does not allow us to find the 1st, 2nd, 3rd, etc. but SUBSTITUTE does. So we substitute the nth one and then find that.

    After we extract each part we add a semi-colon and space to the end of it.

    We wrap each one in an IFERROR to return nothing (this stops you from getting lots of semi-colons at the end if there aren't many codes.

    The formula starts with a slightly different function to find the first code. We can use LEFT here as the code is at the start of the cell, and just extract the text up to the first colon, excluding the colon.


    =LEFT(Diagnosis@row, FIND(":", Diagnosis@row) - 1) + "; "

    + IFERROR(MID(Diagnosis@row, FIND("~", SUBSTITUTE(Diagnosis@row, "; ", "~", 1)) + 1, FIND("~", SUBSTITUTE(Diagnosis@row, ":", "~", 2)) - FIND("~", SUBSTITUTE(Diagnosis@row, ";", "~", 1)) - 1) + "; ", "")

    + IFERROR(MID(Diagnosis@row, FIND("~", SUBSTITUTE(Diagnosis@row, "; ", "~", 2)) + 1, FIND("~", SUBSTITUTE(Diagnosis@row, ":", "~", 3)) - FIND("~", SUBSTITUTE(Diagnosis@row, ";", "~", 2)) - 1) + "; ", "")

    + IFERROR(MID(Diagnosis@row, FIND("~", SUBSTITUTE(Diagnosis@row, "; ", "~", 3)) + 1, FIND("~", SUBSTITUTE(Diagnosis@row, ":", "~", 4)) - FIND("~", SUBSTITUTE(Diagnosis@row, ";", "~", 3)) - 1) + "; ", "")

    + IFERROR(MID(Diagnosis@row, FIND("~", SUBSTITUTE(Diagnosis@row, "; ", "~", 4)) + 1, FIND("~", SUBSTITUTE(Diagnosis@row, ":", "~", 5)) - FIND("~", SUBSTITUTE(Diagnosis@row, ";", "~", 4)) - 1) + "; ", "")

    + IFERROR(MID(Diagnosis@row, FIND("~", SUBSTITUTE(Diagnosis@row, "; ", "~", 5)) + 1, FIND("~", SUBSTITUTE(Diagnosis@row, ":", "~", 6)) - FIND("~", SUBSTITUTE(Diagnosis@row, ";", "~", 5)) - 1) + "; ", "")


    I called my column with the full text in "Diagnosis", you will need to change the name to your column name


  • It worked ! it worked! It worked! Thanks a lot!!!!🤗

  • Now I wonder if it is possible to do something, I don't know if it is possible...

    I have a sheet (sheet 1) where there is a column for the alphanumeric codes and another column shows a numeric value that represents the weight of each code. For example, for I70.0 the weight is 0.124.

    I have another sheet (sheet 2) in which I have the codes grouped as you helped me obtain them, for example: I70.0; J43.9; J47.9; F33.0.

    Now, in this sheet 2 I need a column where I obtain the sum of the weight that all the codes that are in that cell have (I70.0; J43.9; J47.9; F33.0), based on the weight that appears for each code in sheet 1. For example, if in sheet 1 it appears that the weight for I70.0 is 0.124. for J43.9: 0.451, for J47.9 it is 0.325 and for F33.0 it is 0.144, then I want the total to be calculated and displayed in a column called RAF Score of sheet 2: for example: 1.044. I don't know how feasible this is, but if you can give me a hand I would greatly appreciate it.

  • KPH
    KPH ✭✭✭✭✭✭

    Awesome, I'm pleased that works. So long as the data meets the rules you should be good. Here are a couple of adaptations in case you need them:

    If you have a problem with rule 4 and find ~ is used in the text you can change the formula to replace ~ with any other character that is not used. *, |, etc. I would do this by pasting the formula into a text editor (not Word) and using search and replace so you don't miss any.

    If you need to break rule 5 and have more than 6 codes per cell you can extend the formula by adding this part for as many extra codes as you need. Increase the numbers in bold by 1 each time. This is the current last code which is between the 5th ; and 6th :

    + IFERROR(MID(Diagnosis@row, FIND("~", SUBSTITUTE(Diagnosis@row, "; ", "~", 5)) + 1, FIND("~", SUBSTITUTE(Diagnosis@row, ":", "~", 6)) - FIND("~", SUBSTITUTE(Diagnosis@row, ";", "~", 5)) - 1) + "; ", "")

    So you would add this for the next code:

    + IFERROR(MID(Diagnosis@row, FIND("~", SUBSTITUTE(Diagnosis@row, "; ", "~", 6)) + 1, FIND("~", SUBSTITUTE(Diagnosis@row, ":", "~", 7)) - FIND("~", SUBSTITUTE(Diagnosis@row, ";", "~", 6)) - 1) + "; ", "")

    -----------

    As for your second question, I have some ideas but it would depend on how many of these codes there are.

    I don't want to confuse this thread with another question though. Could you mark this question as answered and post a new question (if you tag me, I will see it when I'm online)? That way anyone else wanting to know how to extract text strings between two characters will be able to find the formula we have.

Help Article Resources

Want to practice working with formulas directly in Smartsheet?

Check out the Formula Handbook template!