Extracting Data From Cell
Hello all,
I am looking to see if it is possible to extract specific data from a single cell.
I have an app for a time clock that my workers use, and within that, they allocate their time for better tracking and metrics on how jobs are performing. The Data is spit out in a single cell as per the below.
If I had a list of my "Cost Codes", which are formatted like this " [Major Prep/Concrete] - Leveling/Patching"
What gets spit out in a single cell is something like this: "[Ceramics] - Floor Tile (03:00)[Ceramics] - Wall Tile (03:00)"… so for instance, this particular worker spent 3 hours installing floor tile, and 3 hours on wall tile…
Is there a formula, that if i had a list of my cost codes, that it would pull the relevant times out across multiple employees? If this could do so by date and job name too, that would be great… Screen shot below for reference.
Answers
-
If your text strings are consistently formatted, which it looks like they are, you can use things like the square brackets and colon within the time as "way markers" to extract the data that is between/around them.
The main functions you will need will be MID and FIND.
LEFT, RIGHT, and SUBSTITUTE will also be helpful.
Example 1 - First cost code
The LEFT formula is the easiest to start with - just tell it where to look and how many characters to look for. For example, this (used in Column3 in my example) says look at the Cost Codes column and return the first 10 characters:
=LEFT([Cost Codes]@row, 10)
Because the codes won't always be 10 characters you can replace the fixed number "10" with the FIND function. FIND will count the number of characters before a specific character - just tell it what to look for and where to look and it will count the number of characters to that point.
=FIND("]", [Cost Codes]@row)
Then you put them together like this:
=LEFT([Cost Codes]@row, FIND("]", [Cost Codes]@row))
And you have the first code code extracted, no matter how long it is:
-
Thanks! And a great start.
Sorry if I wasn't clear. I want to be able to extract the time within the brackets and then assign it to that cost code in a list. My end goal is to be able to have a sum if of all the allocated time per Cost code. per job per day essentially. My sumifs for jobs and days are easy enough and I understand those, it was trying to extract the number of hours per cost code when there are multiple cost codes in a cell that I was having issues with.
-
I get it. The formula needed is unique to your data, so I was hoping to lay the building blocks for you so you could adapt it if needed. I started with cost code as that is the easiest to explain. Next I was going to show how to use SUBSTITUTE to get the second cost code and then MID for the time - which is going to be more complicated as you have parenthesis in some other places as well so will need to use the colon (assuming there are no other colons). You'll have to play around with the basic principles and see how you can tweak them to work with your data as there may be things we are not aware of from the example share, so I wanted to start with one part and see how that went. I'll share the time part next if that is your preference. Just give me a minute to type out the explanation.
-
Example 2 - First time
The MID function will find the time for you. Just like LEFT, you tell it where to look, how many characters to return, but also where to start. In this case it starts at character 10 and extracts the next 2 characters:
=MID([Cost Codes]@row, 10, 2)
And our friend FIND, from example 1 can help with where to start, as this varies by row.
Usually the start would be the character before the time, but as you have parentheses elsewhere, I am going to use the colon. If you have other colons you will need to try something else, but assuming you do not….
=FIND(":", [Cost Codes]@row)
This tells us where the first colon in the string is.
So now you can combine FIND and MID to extract the time.
=MID([Cost Codes]@row, FIND(":", [Cost Codes]@row) - 2, 5)
This means start at the location of the colon minus 2 (ie, include the 2 digits before the colon) and extract 5 characters (the 2 before the colon, the colon, and the 2 after the colon).
You get this:
Now you have the first cost code and the first time and can SUMUF the time based on the code.
But would have some rows with a second cost code and time. To extract those into other columns you can harness the power of the SUBSTITUTE function, which lets you find the nth time a thing appears (so you look for the second [ and the second :.
-
Example 3 - Finding the second cost code
You can adapt the formula in example 2 to look for [ rather than : However, it will return the first cost code, which may or may not always be consistent. To find the second one we need to be a little sneaky. We can use the SUBSTITUTE function to change the second [ into something else. This needs to be something unique. Here I am using *. If * appears in your data choose something that does not.
This says replace [ with * the 2nd time it appears in the string (the 2 means the 2nd time).
=SUBSTITUTE([Cost Codes]@row, "[", "*", 2)
It does this
So now we can FIND the position of the *, and we can do this all at once by making the SUBSTITUTE function the text to search in the FIND function, and * as the thing to find:
=FIND("*", SUBSTITUTE([Cost Codes]@row, "[", "*", 2))
And then we can use this in our MID function:
=MID([Cost Codes]@row, FIND("*", SUBSTITUTE([Cost Codes]@row, "[", "*", 2)), 10)
This means extract 10 characters from Cost Codes from the point the * is if the second [ was changed into a *.
And we wrap it in an IFERROR so it returns nothing on those rows where there is not a second [
=IFERROR(MID([Cost Codes]@row, FIND("*", SUBSTITUTE([Cost Codes]@row, "[", "*", 2)), 10), "")
And that looks great because both codes are 10 characters. They won't always be, look what happens if the are shorter or longer:
So there is one final step to change the 10 characters to extract after the [ to a dynamic number. And I bet you know what does that? - FIND! Or in this case two FINDs as we need to find the position of the second ] and subtract the position of the second [ and then add 1 to get the length of the string between the second [ and ].
So our final formula is:
=IFERROR(MID([Cost Codes]@row, FIND("*", SUBSTITUTE([Cost Codes]@row, "[", "*", 2)), FIND("*", SUBSTITUTE([Cost Codes]@row, "]", "*", 2)) - FIND("*", SUBSTITUTE([Cost Codes]@row, "[", "*", 2)) + 1), "")
You can combine the formula in example 2 and 3 to extract the second time stamp.
-
How did you get on @Nathan_FOF ? Does that do what you need? Are you OK summing the results (hint: the same LEFT/MID and FIND functions will be useful when it comes to summing the times).
-
Hi KPH,
I'm out of the office this week, but I am excited to implement it once back next week! Been wrapping my head around the breakdown of the formula though. I trust that it will work out!
Thanks so much for following up though. I will be sure to reach out once I get working on it.
Thanks
- Nathan
-
Thanks for letting me know. It is one of those formulas that you need to understand to be able to adapt it if the data format changes. But once you get it a whole world of possibilities opens up. Enjoy!
Categories
- All Categories
- 14 Welcome to the Community
- Smartsheet Customer Resources
- 62.1K Get Help
- 348 Global Discussions
- 199 Industry Talk
- 427 Announcements
- 4.4K Ideas & Feature Requests
- 133 Brandfolder
- 127 Just for fun
- 127 Community Job Board
- 455 Show & Tell
- 28 Member Spotlight
- 1 SmartStories
- 282 Events
- 36 Webinars
- 7.3K Forum Archives