How can numbers only be extracted from a cell which may have text, numbers and symbols?

Options

I have a column of ID's extracted from a larger string. I need to extract the numbers ONLY. The column looks like this:

The numbers will not always be exactly in the same starting position, nor will they always be the same length. There can be anywhere from 3-5 numbers. The preceding text may be between 5-6 characters. I do not want any text or symbols, just numbers.

The closest I was able to come extracted the numbers from cells that included a "J" but did not pick up the values from cells containing other characters. My formula is below. Is there a better way to do this?

=IFERROR(IF(AND(VALUE(LEFT([REV ID]@row, 5)) > 0), LEFT([REV ID]@row, 5)), IFERROR(IF(AND(VALUE(LEFT([REV ID]@row, 1)) > 0, ISTEXT(MID([REV ID]@row, 1, 5))), VALUE(LEFT([REV ID]@row, 5)), ""), ""))

Constance Fetter (she/her/elle)

Tags:

Best Answer

  • KPH
    KPH ✭✭✭✭✭✭
    Answer ✓
    Options

    Your description of the problem is excellent and manages to rule out most of the ideas that immediately spring to mind! 🤣

    I have assumed that the numbers are all in one string and there is never a case where there are letters among them. If you have letters in between numbers what I say below will not work.

    You can't use the VALUE, or ISNUMBER, or ISTEXT functions as the cell is going to be a text string, due to the text.

    You could use IFs with NOT and AND to SUBSTITUTE each of the 5-6 characters at the start and 1-x characters at the end with nothing IF they are NOT 0 AND NOT 1 AND NOT 2 etc. That is going to be quite lengthy checking each position to see if it is 1 of 10 things.

    I wanted to see if we could do anything to find the start and end of the numeric section, and have the follow suggestion:

    Step 1 - Find the first numeric in a string

    FIND is a great function and will identify the first instance of a character. We can use 9 of those to find the first numeric. We can't use a simple MIN FIND as FIND returns 0 if the character is not present (so any strings that don't have all 0-9 in will be 0). But we can combine it with an IF (double the work, unfortunately).

    We start with this formula to find the first position of the character 0

    =FIND("0", [REV ID]@row)

    Then wrap that in an IF to return blank instead of 0 if there are no 0s

    =IF(FIND("0", [REV ID]@row) = 0, "", FIND("0", [REV ID]@row))

    Then we make 9 more of those, 1 for each numeric character, and wrap them all in a MIN to return the lowest number.

    =MIN(IF(FIND("0", [REV ID]@row) = 0, "", FIND("0", [REV ID]@row)), IF(FIND("1", [REV ID]@row) = 0, "", FIND("1", [REV ID]@row)), IF(FIND("2", [REV ID]@row) = 0, "", FIND("2", [REV ID]@row)), IF(FIND("3", [REV ID]@row) = 0, "", FIND("3", [REV ID]@row)), IF(FIND("4", [REV ID]@row) = 0, "", FIND("4", [REV ID]@row)), IF(FIND("5", [REV ID]@row) = 0, "", FIND("5", [REV ID]@row)), IF(FIND("6", [REV ID]@row) = 0, "", FIND("6", [REV ID]@row)), IF(FIND("7", [REV ID]@row) = 0, "", FIND("7", [REV ID]@row)), IF(FIND("8", [REV ID]@row) = 0, "", FIND("8", [REV ID]@row)), IF(FIND("9", [REV ID]@row) = 0, "", FIND("9", [REV ID]@row)))

    Here is an example:

    Step 2 - Find the end of the numeric string within a string

    Sadly, we can't use MAX FIND to find the last number, as any duplicated numbers will be excluded. 00010 would return 1 for character 0 as it is first found in position 1, and 4 for character 1. So the MAX would be 4.

    Instead, we can count the number of numerics (which is actually more useful in the long run).

    Count the number of numeric digits in a string

    This formula will count the number of times the character 0 appears in the string. It takes the length of the string, and subtracts the length of the same string if the 0s had been substituted with nothing.

    =LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "0", ""))

    Now we just need another 9 of those, and we add them all together to get the sum of all the 0s, 1s, 2s, 3s,…

    =LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "1", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "2", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "3", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "4", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "5", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "6", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "7", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "8", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "9", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "0", ""))

    Here is how that looks:

    Step 3 - Extract the numeric digits from the middle of a string

    Now we know where the numbers start (step 1) and how many of them there are (step 2). Which leads to the perfect opportunity to use a MID function. The MID function returns a string of text from the middle of a cell and simply needs to know the input string, the starting position, and the number of characters. Like this:

    =MID([REV ID]@row, [1st number]@row, [Number of numbers]@row)

    Here is how that looks in the result column.

    Tidying up

    You don't need to keep those first two columns, now it all makes sense, you can just use the formulas that are in them within your final MID function. This is the combined formula.

    =MID([REV ID]@row, MIN(IF(FIND("0", [REV ID]@row) = 0, "", FIND("0", [REV ID]@row)), IF(FIND("1", [REV ID]@row) = 0, "", FIND("1", [REV ID]@row)), IF(FIND("2", [REV ID]@row) = 0, "", FIND("2", [REV ID]@row)), IF(FIND("3", [REV ID]@row) = 0, "", FIND("3", [REV ID]@row)), IF(FIND("4", [REV ID]@row) = 0, "", FIND("4", [REV ID]@row)), IF(FIND("5", [REV ID]@row) = 0, "", FIND("5", [REV ID]@row)), IF(FIND("6", [REV ID]@row) = 0, "", FIND("6", [REV ID]@row)), IF(FIND("7", [REV ID]@row) = 0, "", FIND("7", [REV ID]@row)), IF(FIND("8", [REV ID]@row) = 0, "", FIND("8", [REV ID]@row)), IF(FIND("9", [REV ID]@row) = 0, "", FIND("9", [REV ID]@row))), LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "1", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "2", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "3", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "4", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "5", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "6", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "7", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "8", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "9", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "0", "")))

    And finally, if you want the number to be treated like a number so you can do math on it, you can wrap it in a VALUE function. However, this will remove any leading 0s.

    Let me know how you get in. I hope this is suitable. 🤞

Answers

  • KPH
    KPH ✭✭✭✭✭✭
    Answer ✓
    Options

    Your description of the problem is excellent and manages to rule out most of the ideas that immediately spring to mind! 🤣

    I have assumed that the numbers are all in one string and there is never a case where there are letters among them. If you have letters in between numbers what I say below will not work.

    You can't use the VALUE, or ISNUMBER, or ISTEXT functions as the cell is going to be a text string, due to the text.

    You could use IFs with NOT and AND to SUBSTITUTE each of the 5-6 characters at the start and 1-x characters at the end with nothing IF they are NOT 0 AND NOT 1 AND NOT 2 etc. That is going to be quite lengthy checking each position to see if it is 1 of 10 things.

    I wanted to see if we could do anything to find the start and end of the numeric section, and have the follow suggestion:

    Step 1 - Find the first numeric in a string

    FIND is a great function and will identify the first instance of a character. We can use 9 of those to find the first numeric. We can't use a simple MIN FIND as FIND returns 0 if the character is not present (so any strings that don't have all 0-9 in will be 0). But we can combine it with an IF (double the work, unfortunately).

    We start with this formula to find the first position of the character 0

    =FIND("0", [REV ID]@row)

    Then wrap that in an IF to return blank instead of 0 if there are no 0s

    =IF(FIND("0", [REV ID]@row) = 0, "", FIND("0", [REV ID]@row))

    Then we make 9 more of those, 1 for each numeric character, and wrap them all in a MIN to return the lowest number.

    =MIN(IF(FIND("0", [REV ID]@row) = 0, "", FIND("0", [REV ID]@row)), IF(FIND("1", [REV ID]@row) = 0, "", FIND("1", [REV ID]@row)), IF(FIND("2", [REV ID]@row) = 0, "", FIND("2", [REV ID]@row)), IF(FIND("3", [REV ID]@row) = 0, "", FIND("3", [REV ID]@row)), IF(FIND("4", [REV ID]@row) = 0, "", FIND("4", [REV ID]@row)), IF(FIND("5", [REV ID]@row) = 0, "", FIND("5", [REV ID]@row)), IF(FIND("6", [REV ID]@row) = 0, "", FIND("6", [REV ID]@row)), IF(FIND("7", [REV ID]@row) = 0, "", FIND("7", [REV ID]@row)), IF(FIND("8", [REV ID]@row) = 0, "", FIND("8", [REV ID]@row)), IF(FIND("9", [REV ID]@row) = 0, "", FIND("9", [REV ID]@row)))

    Here is an example:

    Step 2 - Find the end of the numeric string within a string

    Sadly, we can't use MAX FIND to find the last number, as any duplicated numbers will be excluded. 00010 would return 1 for character 0 as it is first found in position 1, and 4 for character 1. So the MAX would be 4.

    Instead, we can count the number of numerics (which is actually more useful in the long run).

    Count the number of numeric digits in a string

    This formula will count the number of times the character 0 appears in the string. It takes the length of the string, and subtracts the length of the same string if the 0s had been substituted with nothing.

    =LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "0", ""))

    Now we just need another 9 of those, and we add them all together to get the sum of all the 0s, 1s, 2s, 3s,…

    =LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "1", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "2", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "3", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "4", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "5", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "6", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "7", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "8", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "9", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "0", ""))

    Here is how that looks:

    Step 3 - Extract the numeric digits from the middle of a string

    Now we know where the numbers start (step 1) and how many of them there are (step 2). Which leads to the perfect opportunity to use a MID function. The MID function returns a string of text from the middle of a cell and simply needs to know the input string, the starting position, and the number of characters. Like this:

    =MID([REV ID]@row, [1st number]@row, [Number of numbers]@row)

    Here is how that looks in the result column.

    Tidying up

    You don't need to keep those first two columns, now it all makes sense, you can just use the formulas that are in them within your final MID function. This is the combined formula.

    =MID([REV ID]@row, MIN(IF(FIND("0", [REV ID]@row) = 0, "", FIND("0", [REV ID]@row)), IF(FIND("1", [REV ID]@row) = 0, "", FIND("1", [REV ID]@row)), IF(FIND("2", [REV ID]@row) = 0, "", FIND("2", [REV ID]@row)), IF(FIND("3", [REV ID]@row) = 0, "", FIND("3", [REV ID]@row)), IF(FIND("4", [REV ID]@row) = 0, "", FIND("4", [REV ID]@row)), IF(FIND("5", [REV ID]@row) = 0, "", FIND("5", [REV ID]@row)), IF(FIND("6", [REV ID]@row) = 0, "", FIND("6", [REV ID]@row)), IF(FIND("7", [REV ID]@row) = 0, "", FIND("7", [REV ID]@row)), IF(FIND("8", [REV ID]@row) = 0, "", FIND("8", [REV ID]@row)), IF(FIND("9", [REV ID]@row) = 0, "", FIND("9", [REV ID]@row))), LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "1", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "2", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "3", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "4", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "5", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "6", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "7", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "8", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "9", "")) + LEN([REV ID]@row) - LEN(SUBSTITUTE([REV ID]@row, "0", "")))

    And finally, if you want the number to be treated like a number so you can do math on it, you can wrap it in a VALUE function. However, this will remove any leading 0s.

    Let me know how you get in. I hope this is suitable. 🤞

  • Constance Fetter
    Options

    Thank you! That worked perfectly.

    Constance Fetter (she/her/elle)

  • KPH
    KPH ✭✭✭✭✭✭
    Options

    Great news! Glad I could help - it was a fun brainteaser!

  • Hans1958
    Options

    Hi there,

    Hope I sound not like a smartass, but why so complicated?

    Just use the function ABS and you'll get just the figure.

    Best regards

    Hans ... from Germany

    www.hp-u.de

  • KPH
    KPH ✭✭✭✭✭✭
    Options

    Hi @Hans1958

    As far as I'm aware the ABS function needs to be fed a number and the REV ID that Constance has is a text string with a number within it. So ABS would reject it as INVALID DATA TYPE would it not?

  • Hans1958
    Options

    Hi KPH,

    Indeed, you are right! But I could have sworn that I once solved a similar issue when I needed the number of the duration at a project plan, there I used =ABS(…) and it pulled indeed e.g. 5 instead 5d.

    But since I tried now, … it seems I am mistaken! Sorry!

    Best regards

    Hans

  • KPH
    KPH ✭✭✭✭✭✭
    Options

    Hans, I am both pleased and disappointed 😀

    Pleased I didn't spend time solving a problem that didn't exist.

    But equally disappointed that there isn't an easy answer - doing this with one function would be awesome.

    I think ABS has a special relationship with duration columns. While it won't work on the text "5d" in a text/number column, I believe it does in a duration column (which is pretty cool for other use cases).

Help Article Resources

Want to practice working with formulas directly in Smartsheet?

Check out the Formula Handbook template!