Sign in to submit new ideas and vote
Get Started

EXTRACT function: select data between defined tags

Lucas Rayala
Lucas Rayala ✭✭✭✭✭✭

Create an EXTRACT function to extract data between defined beginning and end tags.

=EXTRACT(start string, end string, target cell/range)

USE CASE

While there are many use cases for this function, I see it as a way to facilitate “lightweight” cross-sheet referencing. This function would allow you to package data from one page in a single column, with tags around the data elements, then reference that single column on another page. In the target page, the tightly packed data can be accessed using the EXTRACT function.

For instance, if in your source sheet Column A contains “Cat” and Column B contains “Pizza”, you could concatenate the data in Column C (using a formula) so it looks like this:

<cA>Cat<\cA><cB>Pizza<\cB>

Then, you use a cross sheet reference to bring this column to your target sheet, into a column named “Source Data”. This is a simple scenario, but imagine you are sending a dozen or even a hundred column’s worth of data over.

In your target sheet, you would then be able to easily reference defined pieces of that data with the new EXTRACT function:

=EXTRACT(“<cA>”, “<\cA>”, [Source Data]@row)

The formula is saying, “In the column “Source Data” , retrieve the information between the tags “<cA>” and “<\cA>”.

A formula like this could really simplify complicated cross-sheet interactions and significantly lighten the computational load for big sheets that are slamming into the 25M barrier. This extraction can already be done with a series of nested LEFT/RIGHT/FIND functions, but it’s ugly and complicated.

4
4 votes

Idea Submitted · Last Updated

Comments

  • Paul Newcome
    Paul Newcome ✭✭✭✭✭✭

    There are a couple of other options that can be used to make life a little easier using MID/FIND/SUBSTITUTE combos or UNICHAR characters.


    SUBSTITUTE method:

    Join all cells on the source sheet using the same delimiter (start and finish the string with that same character to really make life easier). Make sure it is not something that is going to already be included in the data. You are also going to need to pick a second character that won't be included in any of the string data. I will use "!" as the string delimiter and "~" as the second character.

    Bring your string over to the target sheet, and then use something along the lines of

    =MID(String@row, FIND("~", SUBSTITUTE(String@row, "!", "~", 1)) + 1, FIND("~", SUBSTITUTE(String@row, "!", "~", 2)) - (FIND("~", SUBSTITUTE(String@row, "!", "~", 1))))


    That first SUBSTITUTE function replaces the first "!" with "~" making the "~" the only one in the string. That makes it really easy to FIND it and add 1 to establish our starting position for the MID function. The second SUBSTITUTE does the same thing but swaps out the second "!". The FIND function allows us to establish the stopping point. Once we subtract the starting point from the stopping point we have the number of characters for the MID function.


    So within the three SUBSTITUTE functions we have 1 / 2 / 1. T. get the second piece from the string we update it to 2 / 3 / 2. The third piece from the string is 3 / 4 / 3, so on and so forth. If it is feasible, you can even use helper cells with references to allow you to use it as a column formula if you are parsing down a column or dragfill if you are parsing across a row by changing the hard coded numbers to

    [Number Column]@row / [Number Column]@row + 1 / [Number Column]@row.


    .


    UNICHAR method:

    This is very similar to the method above, but I use this when the string I want to pull together isn't in the same order as the sheet or has extra cells in between to the point where I can't use a JOIN function on the source sheet. Since I can't use the JOIN function and will have to manually input my delimiters anyway, I usually use different ones across the string, but I make it ones that are easy to use.

    =UNICHAR(9601) + [First Column]@row + UNICHAR(9602) + [Second Column]@row + UNICHAR(9603)


    From there we use the same MID/FIND combo but without needing the SUBSTITUTE function.

    =MID(String@row, FIND(UNICHAR(9601), String@row) + 1, FIND(UNICHAR(9602), String@row - (FIND(UNICHAR(9601), String@row) + 1))


    You can also use cell references to automate which UNICHAR to look for using a very similar method (same logic).

    =MID(String@row, FIND(UNICHAR(9600 + [Number Column]@row), String@row) + 1, FIND(UNICHAR(9600 + [Number Column]@row + 1), String@row - (FIND(UNICHAR(9600 + [Number Column]@row), String@row) + 1))


    .


    It is still more complex than an EXTRACT function, but it is the cleanest way I have found to meet this particular challenge (so far).

  • Lucas Rayala
    Lucas Rayala ✭✭✭✭✭✭
    edited 06/10/23

    These are great @Paul Newcome . I especially like this:

    =MID(String@row, FIND(UNICHAR(9601), String@row) + 1, FIND(UNICHAR(9602), String@row - (FIND(UNICHAR(9601), String@row) + 1))

    Seeing it, I have one small tweak to make it shorter and easier to visually parse. We use UNICHAR heavily for visual flags, and only just realized it’s a lot easier to parse a formula when you input the character directly instead of using the function, and it works the same. This is especially helpful in some of our summary fields where we may have a dozen UNICHAR. So your excellent and super clean MID solution becomes:

    =MID(String@row, FIND(🅰️, String@row) + 1, FIND(🅱️, String@row - (FIND(🅰️, String@row) + 1))

  • Paul Newcome
    Paul Newcome ✭✭✭✭✭✭

    @Lucas Rayala That only works if you know how to type those directly (which I don't). 😅


    I personally also like using the UNICHAR because I typically use the [Number Column] cell references to adjust them dynamically so I don't have to manually change the delimiters for every formula. I can either apply it as a column formula or dragfill across a row and be done with it.