Formula for Actual duration excluding duplication
Hi Community,
I am trying to calculate the actual duration starting from a list of Start and End dates.
What I am trying to achieve is to exclude duration overlaps in order to get the actual effort days only, but including not contiguous time ranges.
Please see example:
Start: 01/10/2019 End: 04/10/2019 = 4 working days effort October
Start: 01/11/2019 End: 15/11/2019 = 11 working days effort November
Start: 11/11/2019 End: 18/11/2019 = 6 working days effort November
Start: 15/11/2019 End: 22/11/2019 = 6 working days effort November
Start: 09/12/2019 End: 13/12/2019 = 5 working days effort December
Total effort = 32 days > this is NOT what I need
Actual days = 4+11+1[=65 overlap]+4[=62 overlap]+5= 20 days > this is what I need
Please consider that this requires to be completely automated through formula as it needs to be deployed into multiple very long backlogs.
I would really appreciate your help.
Thanks!
Comments

You could try something like:
=networkdays(min(collect(Startdate:Startdate,Startdate:Startdate,and(@cell >= date(2019,1,1),@cell <= date(2019,31,1)))), Max(collect(Enddate:Enddate,Enddate:Enddate,and(@cell >= date(2019,31,1),@cell <= date(2019,1,31
This should get you what you want for the month of January. If you have any issues understanding the formula let me know and i can break it down.

L@123,
So you would essentially have to replicate this formula for each month for the duration of the project, then sum those up?
I think what Alessandro is trying to do is figure out how many working days were used from the earliest start date to the latest end date, but only count each day once.
I also noticed the condition of being "completely automated", so from there you would need to be able to automate the dates within your formula to start at the first month of the project, end at the last month, and replicate for every month in between regardless of the number of months.
Also... What if we had a two day task at the beginning of the month and a two day task at the end? Your formula would pull the start for the first task and the end for the second which would give us the total number of working days for the entire month instead of just 4.
I feel like this has been worked out before. I just can't seem to find my notes on it. I am going to keep digging. In the mean time...
Thoughts?

Hi both, thank you for your kind feedback. Much appreciated you took time to respond.
The formula I am trying to generate should ideally be feeding one single entry of the sheet summary, Paul Newcome in this statement "I think what Alessandro is trying to do is figure out how many working days were used from the earliest start date to the latest end date, but only count each day once" is perfectly right.
To give you some more context I have a long sequential backlog for each project, some tasks are assigned to "content development team" some others to "production", "systems" and so on.
We have start and end dates for each task in the backlog, the difference between start and end dates provides the actual days count. I would need the net lead time for each team, hence, SUM actual days for each team but excluding the days (dates) when multiple activities performed by the same team are running in parallel. Start and end dates are already counting working days only, therefore, also the Actual days are not impacted by weekends or holidays. A possible approach would be to calculate the difference between the MIN start date and the MAX finish date for each team (this would eliminate actual days duplication) but automatically excluding the dates for tasks owned by other teams.
Hope the description above would make sense to you to better picture the context.
Many thanks in advance!

If my above assumptions are correct in that we need to figure out the number of working days (where each day is only counted once) across multiple rows for an unlimited amount of rows and variable start and end dates...
I believe I have a solution. It is built on the row level and then a final column is summed. I have it broken down into 6 helper columns.
.
The basics...
A column to designate a row number.
A column that pulls the lowest row number for all rows that overlap that particular row.
A column the pulls the highest row number for all rows that overlap that particular row.
A column to pull the Start Date based on the low number.
A column to pull the End Date based on the high number.
A final column to figure out the NETWORKDAYS between the new start and end dates.
.
Multiple rows that overlap are accounted for.
.
HERE is a published version of the sheet...
.
Feedback is appreciated.

Hi @Paul,
thank you for the time taken to address this solution.
I shared a comment yesterday to provide further context, however, it took some time for the community moderators to approve it. It is now visible though.
I confirm your assumptions are correct and perfectly consistent with the goal I am pursuing.
The published version is a great walk though to implement the solution. I will test it this afternoon and share a feedback with you at the earliest convenience.
I am very grateful for your valuable contribution.
I'll keep you posted.

Alessandro,
I saw where you wanted this grand total of days rolled up into a sheet summary field, but I also saw where you wanted to include the requirements of essentially filtering by team. As long as the team is established on each row, you could use a basic SUMIFS in the sheet summary to pull by team.
=SUMIFS(NETWORKDAYS:NETWORKDAYS, [Team Column]:[Team Column], "Team A")
.
I'm glad you found the published explanation helpful. Please don't hesitate to ask if you need any clarification on anything. I do hope it works for you.

Hi Paul,
all seems to work correctly. I will keep testing but it looks great.
I am so grateful for your brilliant solution and for your kind help.
I will keep you posted if any further clarification shall be needed.
For now, many thanks again and all the best!

Hi Paul,
hope you are well.
I had time to test the formula in more detail on projects where multiple activities are running in parallel and I realised something doesn't look right, unfortunately.
As you can notice in the attached screenshot network days is including days when activities are running in parallel, this results in overestimating the total sum.
A good example is the number 8 highlighted, the date range for the number 8 is entirely covered by the row above, therefore, I assume that number 8 should be equal to 0.
Could you kindly suggest any amendment to the formula that may fix the issue?
(for your reference AS and AF stand for Actual start and Actual finish and the first two columns on the left are the data source for all the new columns on the right side)
Many thanks

That is a scenario I didn't even think about. My apologies. I will need to do some more testing, and then I will get back to you.

Any help would be greatly appreciated.
Many thanks in advance for your kind effort.

Alessandro,
Double check your formulas and layout. When I used those same dates from your screenshot with the formulas unchanged in my published sheet above, and my "New" start and end dates were not the same as your "New" start and end dates...
Ignore the above. I plugged an extra date in to my sheet to make the new dates match your new dates (should have thought of that first duh). Check the above published sheet. I update the formulas in the Min and Max columns.

Hi Paul,
Many thanks for your kind response.
The two new formulas for Min and Max seem to correctly produce the output to resolve the duplication issue where data source is found (Production AS and AF in the screenshot below). But where source data is missing the formula returns #NO MATCH.
No MATCH is, I assume, due to the fact that the data source (Production AS and AF) is linked to the main backlog through the following =IF(Role26 = "Production", [Actual Start]26), this is meant to shortlist dates that are related to Production Team only and calculate NETWORK DAYS uniquely for one team at the time. In fact, my goal is to remove effort duplication for one team at the time as different teams are working in parallel.
Now, the New Start and New End dates (in the screenshot below named New Production AS and New Production AF) seem not to respond to the new Min and Max formulas. Both columns entirely return #NO MATCH either when a Min & Max is found and when data source (Production AS and AF) is missing.
Do you believe some amendments to the New Start and New End dates may resolve the conflict?
I really feel you are getting very close to the solution, this should really be the last conflict.
As usual, many thanks in advance for your great help. Much appreciated.
Alessandro

Ok. No worries. This is actually pretty straight forward. The fact that the Production AS field is blank is what is causing the issue. We just need to put in an edit that says to only run the formula if the Production AS is a date.
.
IF(ISDATE([Production AS]@row),
.
Take the above (comma and all) and insert it immediately after the initial = in each of these columns...
Min
Max
New Production AS
.
I have also updated this in the Published Sheet as well for your reference.

Hi Paul,
this has sorted all the #NO MATCH conflicts.
I have tested the entire formula structure in one of the most complex backlogs and I realised that some duplication are still returned by the formula unfortunately. This produces a higher number of Network days when summed up. I have attached a screenshot that exemplify the output.
However, I tested the formulas in more simple backlogs and it correctly excludes duplicates and returns the correct network days summary output. No clue how this can be possible. I copied and pasted all formulas from the same sheet for both backlogs, therefore, I would exclude syntax errors.
Many thanks in advance if you might come up with any further recommendation on how to amend the syntax.
Help Article Resources
Categories
Check out the Formula Handbook template!