Breaking down large sheets to improve system performance

This question is an off-shoot of a discussion that started here:

I have run into a limitation in Data Shuttle: my uploads time out because the target sheet has too many formulas to process with each new data upload. Tier 2 support offered some suggestions, but the only one that is close to viable is to off-load the formula analysis to a secondary sheet. Here are the parameters of the project that I need to maintain:

1) Need to be able to Update and Add into existing data (move automations would break this, so cannot be part of the solution)

2) Need to be able to tabulate final metrics across 20+ teams that use the primary source data

3) Need to recognize when source data is no longer "in progress" and can be removed from the source sheet (a move automation in lieu of an auto-delete is fine here)

4) Need to compare the primary data source (currently around 5,000 rows) with 2 other data sources, so I have to be cognizant of the lookup limitation of 100,000 links.

So, on a conceptual level, can anyone recommend a process whereby I can have a data source in Sheet 1 that can be analyzed in Sheet 2, even though Sheet 1 will change in length every day (adds and removes) and receive updates periodically throughout the lifespan of the task?

Can't post examples because I have no idea where to start, so right now this would be a theoretical discussion.

Here is the basic analysis I have to perform:

Sheet 1: Source. Includes patient name and order type

Sheet 2: The theoretical sheet I am trying to develop that does all the analysis

Sheet 3: Lookup source- identifies whether the order is blocking billing

Sheet 4: Lookup source- identifies whether the order has been received


Theoretical Sequence:

1) Data Shuttle uploads to Sheet 1

2) Sheet 2 registers new additions to Sheet 1 (needs to be dynamic, but not create duplicates). This creates the list of values to be analyzed.

3) Sheet 2 looks up against Sheets 3 and 4 to identify which orders are outstanding, holding billing, or received.

4) Sheet 2 feeds reports to the respective teams for tracking and updating

5) Sheet 1 is updated daily with adds and updates, and Sheet 2 needs to stay in sync
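To make steps 2 and 3 concrete, the kind of cross-sheet lookups I picture in Sheet 2 would be something along these lines (the {range} and column names here are placeholders, not my real ones):

```text
Sheet 2, "Billing Hold" column - look the order up in Sheet 3:
=IFERROR(INDEX({Sheet 3 Billing Status}, MATCH([Order ID]@row, {Sheet 3 Order ID}, 0)), "No match")

Sheet 2, "Received" column - same pattern against Sheet 4:
=IFERROR(INDEX({Sheet 4 Received Status}, MATCH([Order ID]@row, {Sheet 4 Order ID}, 0)), "Outstanding")
```

Every formula like this consumes cross-sheet cell references, which is why the 100,000-link limit in point 4 matters at 5,000+ rows.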

Thanks for any ideas or recommendations.

Best Answer

  • AFlint ✭✭✭✭
    Answer ✓

Update: It took a couple of weeks and a lot of help from support, but here is the outcome.

1) Expressions do not support cross-sheet references; at least, I couldn't get them to work.

    2) We did an early upgrade of our levels and started leveraging the additional apps that became available.

DataMesh- reduced the need for lookup formulas, but introduced different limitations. For anyone considering DataMesh, be aware of the 20K cell-link limit: 2 columns of cell links reduce your sheet capacity to 10K rows, 4 columns reduce it to 5K rows, etc. Using a copy mesh instead of a link mesh seems to mitigate this.

DataTable- we went ahead and added this as well. The overall capacity bump helped for the raw data, but it requires extra hops in the data movement. I would consider this a net-neutral trade against the multiple analysis sheets I started out with. Now I upload the raw data into DataTable, then pull it into an analysis sheet where I apply all the formulas- so far, no capacity issues with formulas. Because DataTable can hold more than 20K rows while a sheet can only pull in 20K, I still have to break my analysis into multiple sheets. The upload side is easier, though, because I'm no longer hitting capacity limits there.

Overall, for my fairly complex analysis, the only option was the higher package (at a higher cost). I could probably have built a lot of workarounds to avoid the upgrade, but it would have resulted in a rat's nest of sheets, links, and formulas. This was a case of "right tool for the job".

    I hope this discussion helps others, even though it was a fairly unique situation for me. I find this community extremely helpful, so I want to share when I can.

Answers

  • Paul Newcome ✭✭✭✭✭✭

How dynamic do your formulas need to be? Are there some that basically run once and then just sit as relatively static data? If so, you could use expressions in Data Shuttle to replace those formulas. The formula used in the expression runs once when the workflow runs, and only the static result is written to the sheet.


    If the result of that expression changes for an existing row, it will update when the workflow is run again and still be in place as static data.

For example... your sheet has 3 columns: Employee Name, Employee Email, and Manager Name, and those managers are also listed as "employees". You can leverage the fact that each manager also appears in the Employee Name column and use a lookup-type formula to pull the manager's email into a 4th column for every employee.

    If you use the INDEX/MATCH as an Expression in Data Shuttle, it will apply the output as static data in the Manager Email column.

    But if someone changes managers and that change is reflected in the data being used as the source for the workflow, when the workflow runs it will update that manager's email address accordingly and still be considered static data.

    Using this method removes the formula from the sheet and frees up quite a few of those cell references that are available.
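As a sketch (with assumed column names), the INDEX/MATCH would look something like this, entered as the Data Shuttle expression rather than in the sheet:

```text
=INDEX([Employee Email]:[Employee Email], MATCH([Manager Name]@row, [Employee Name]:[Employee Name], 0))
```

The workflow evaluates it once per run and writes only the resulting email address into the Manager Email column.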


    Of course that is a rather specific example, but that is only one of many possibilities. Basically if there is a formula that only needs to run when the workflow does, you can move it from the sheet to an expression and free up some space.

  • AFlint ✭✭✭✭

    @Paul Newcome

That's a good point- I am still learning how to leverage expressions. When I was trained on Data Shuttle, they discouraged me from using expressions and recommended that I put more of the formulas in the sheet.

    I do have several formulas that might work, though.

1) Joining patient first name and last name into "Last Name, First Name" format

2) Joining doctor first name and last name into "Last Name, First Name" format

Your lookup example could help, but my lookups all go to other sheets. How do I do that in an expression? How do I reference the range on the lookup sheet?

  • Paul Newcome ✭✭✭✭✭✭

The joining of names would work very well as an expression. You would type it exactly as you would in your sheet, just as an expression in Data Shuttle- the syntax is identical. You could even copy/paste the working formula from the sheet into Data Shuttle. Right there, that's two columns' worth of formulas you could get rid of.
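For instance, your name joins would carry over unchanged. Assuming your columns are named along these lines, something like:

```text
=[Patient Last Name]@row + ", " + [Patient First Name]@row
=[Doctor Last Name]@row + ", " + [Doctor First Name]@row
```

The same formula works in the sheet or as the Data Shuttle expression.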


I haven't tested cross-sheet references in expressions, so I can't promise they would work.
