EXTRACT function: select data between defined tags

Lucas Rayala · 06/10/23

Create an EXTRACT function to extract data between defined beginning and end tags.

=EXTRACT(start string, end string, target cell/range)

USE CASE

While there are many use cases for this function, I see it as a way to facilitate “lightweight” cross-sheet referencing. This function would allow you to package data from one page in a single column, with tags around the data elements, then reference that single column on another page. In the target page, the tightly packed data can be accessed using the EXTRACT function.

For instance, if in your source sheet Column A contains “Cat” and Column B contains “Pizza”, you could concatenate the data in Column C (using a formula) so it looks like this:

<cA>Cat<\cA><cB>Pizza<\cB>

Then, you use a cross sheet reference to bring this column to your target sheet, into a column named “Source Data”. This is a simple scenario, but imagine you are sending a dozen or even a hundred column’s worth of data over.

In your target sheet, you would then be able to easily reference defined pieces of that data with the new EXTRACT function:

=EXTRACT(“<cA>”, “<\cA>”, [Source Data]@row)

The formula is saying, “In the column “Source Data” , retrieve the information between the tags “<cA>” and “<\cA>”.

A formula like this could really simplify complicated cross-sheet interactions and significantly lighten the computational load for big sheets that are slamming into the 25M barrier. This extraction can already be done with a series of nested LEFT/RIGHT/FIND functions, but it’s ugly and complicated.

Paul Newcome · 06/10/23

There are a couple of other options that can be used to make life a little easier using MID/FIND/SUBSTITUTE combos or UNICHAR characters.

SUBSTITUTE method:

Join all cells on the source sheet using the same delimiter (start and finish the string with that same character to really make life easier). Make sure it is not something that is going to already be included in the data. You are also going to need to pick a second character that won't be included in any of the string data. I will use "!" as the string delimiter and "~" as the second character.

Bring your string over to the target sheet, and then use something along the lines of

=MID(String@row, FIND("~", SUBSTITUTE(String@row, "!", "~", 1)) + 1, FIND("~", SUBSTITUTE(String@row, "!", "~", 2)) - (FIND("~", SUBSTITUTE(String@row, "!", "~", 1))))

That first SUBSTITUTE function replaces the first "!" with "~" making the "~" the only one in the string. That makes it really easy to FIND it and add 1 to establish our starting position for the MID function. The second SUBSTITUTE does the same thing but swaps out the second "!". The FIND function allows us to establish the stopping point. Once we subtract the starting point from the stopping point we have the number of characters for the MID function.

So within the three SUBSTITUTE functions we have 1 / 2 / 1. T. get the second piece from the string we update it to 2 / 3 / 2. The third piece from the string is 3 / 4 / 3, so on and so forth. If it is feasible, you can even use helper cells with references to allow you to use it as a column formula if you are parsing down a column or dragfill if you are parsing across a row by changing the hard coded numbers to

[Number Column]@row / [Number Column]@row + 1 / [Number Column]@row.

.

UNICHAR method:

This is very similar to the method above, but I use this when the string I want to pull together isn't in the same order as the sheet or has extra cells in between to the point where I can't use a JOIN function on the source sheet. Since I can't use the JOIN function and will have to manually input my delimiters anyway, I usually use different ones across the string, but I make it ones that are easy to use.

=UNICHAR(9601) + [First Column]@row + UNICHAR(9602) + [Second Column]@row + UNICHAR(9603)

From there we use the same MID/FIND combo but without needing the SUBSTITUTE function.

=MID(String@row, FIND(UNICHAR(9601), String@row) + 1, FIND(UNICHAR(9602), String@row - (FIND(UNICHAR(9601), String@row) + 1))

You can also use cell references to automate which UNICHAR to look for using a very similar method (same logic).

=MID(String@row, FIND(UNICHAR(9600 + [Number Column]@row), String@row) + 1, FIND(UNICHAR(9600 + [Number Column]@row + 1), String@row - (FIND(UNICHAR(9600 + [Number Column]@row), String@row) + 1))

.

It is still more complex than an EXTRACT function, but it is the cleanest way I have found to meet this particular challenge (so far).

Lucas Rayala · 06/10/23

These are great @Paul Newcome . I especially like this:

=MID(String@row, FIND(UNICHAR(9601), String@row) + 1, FIND(UNICHAR(9602), String@row - (FIND(UNICHAR(9601), String@row) + 1))

Seeing it, I have one small tweak to make it shorter and easier to visually parse. We use UNICHAR heavily for visual flags, and only just realized it’s a lot easier to parse a formula when you input the character directly instead of using the function, and it works the same. This is especially helpful in some of our summary fields where we may have a dozen UNICHAR. So your excellent and super clean MID solution becomes:

=MID(String@row, FIND(🅰️, String@row) + 1, FIND(🅱️, String@row - (FIND(🅰️, String@row) + 1))

Paul Newcome · 06/10/23

@Lucas Rayala That only works if you know how to type those directly (which I don't). 😅

I personally also like using the UNICHAR because I typically use the [Number Column] cell references to adjust them dynamically so I don't have to manually change the delimiters for every formula. I can either apply it as a column formula or dragfill across a row and be done with it.

Lucas Rayala · 06/11/23

https://community.smartsheet.com/discussion/comment/379989#Comment_379989

I just copy and paste :)

Gina Smith · 05/28/24

@Lucas Rayala - I am trying to implement the solution you listed above but I cannot determine how to build this formula

you could concatenate the data in Column C (using a formula) so it looks like this:

<cA>Cat<\cA><cB>Pizza<\cB>

Smartsheet does not have a CONCATENATE function. In order to use JOIN, Smartsheet does not like me including the <tags> <closing tag>. I need a formula to convert to a column formula. How do I create the formula?

Thank you!

Gina Smith · 05/28/24

@Lucas Rayala - also, when I try to use the EXTRACT function in my destination sheet, Smartsheet does not bring it up as a formula. This is my formula and it keeps telling me it is not right. Do you know if Smartsheet deprecated the EXTRACT command?

=EXTRACT("<SBSD>","<SBSD2>", Breaks@row)

Paul Newcome · 05/28/24

@Gina Smith The EXTRACT function has never existed in Smartsheet. This is a product idea / suggestion for the devs.

To concatenate in Smartsheet without using the JOIN function, you simply "add" the pieces together similar to the UNICHAR method I mentioned above.

Gina Smith · 05/28/24

@Paul Newcome - ah, that makes sense now. I have been testing your first method since I sent that last message but am struggling to get the MID/FINDS/SUBSTITUTES to work. Getting ready to test the UNICHAR method. I am familiar with using JOIN so I can get the columns in the source sheet to concatenate. Just trying to figure out how to extract them in the destination sheet as sometimes column values in the source sheet will be blank so accomodating for that as well is giving me a bit of a challenge.

Paul Newcome · 05/28/24

@Gina Smith I find the UNICHAR method to be less flexible but easier to work with overall. The SUBSTITUTE method works best for outputting a vertical list and when using a helper column to do so.

UNICHAR method:

This is very similar to the method above, but I use this when the string I want to pull together isn't in the same order as the sheet or has extra cells in between to the point where I can't use a JOIN function on the source sheet. Since I can't use the JOIN function and will have to manually input my delimiters anyway, I usually use different ones across the string, but I make it ones that are easy to use.

=UNICHAR(9601) + [First Column]@row + UNICHAR(9602) + [Second Column]@row + UNICHAR(9603)

From there we use the same MID/FIND combo but without needing the SUBSTITUTE function.

=MID(String@row, FIND(UNICHAR(9601), String@row) + 1, FIND(UNICHAR(9602), String@row - (FIND(UNICHAR(9601), String@row) + 1))

Gina Smith · 05/28/24

@Paul Newcome - thank you! I finally got the UNICHAR method to work!

Lucas Rayala · 05/30/24

hi @Gina Smith - sorry I didn’t see your comment earlier, but looks like @Paul Newcome got to where you needed to be!

Maz Uddin · 10/21/24

@Paul Newcome

Hi Paul i have the following text in a cell but want to seperate it by the comma

I am struggling to apply the logic above against this

pleaes can you advise?

George Hudson Tower, Thomas Frye Court, Edward Heylin Court, John Wetherby Court, John Wetherby Court East, John Wetherby Court West

Paul Newcome · 10/21/24

@Maz Uddin Which part are you struggling with?

Sign in to submit new ideas and vote

EXTRACT function: select data between defined tags

Idea Submitted · Last Updated 06/10/23

Comments

Categories