Data Visualizer app: slightly different result depending on the setting of the parameter queries

jetisco4u · 26 January 2023 11:49

THE SCENARIO

I have 5 level organization units from top to bottom like:
Level 1 - State
Level 2 - LGA
Level 3 - Ward
Level 4 - Community
Level 5 - Facilities with 300+ facilities as FAC at level 5

Also I have three organisation unit groups as OUG1, OUG2 and OUG3.

The dataset have 180 data elements DE1, DE2, DE3, DE4, DE5…DE180 divided into sections.

Section 1 have 80 questions or DEs. Out of these 80, 54 has maximum score of 1 for each DE others are not scored. So when a user fill the form the maximum score they can get for this section is 54*1 = 54.

Suppose FAC 1-110 belongs to OUG1, FAC 111-200 belongs to OUG2 and FAC 201-300 belongs to OUG3.

The total score of DE value that FAC 1-110 in OUG1 can have is 54 with SUM as aggregateType for the data element. I also set the DE aggregateLevel as 5 which is the lowest OU. Data is collected at level 5 of the OU.

WHAT WE WANT

We need an indicator to aggregate the score of the facility for this section one using the aggregate sum of the score of the facility out of the 54 expected maximum score as numerator and 54 as denominator. I define my indicator as below. So for each facility,

Numerator = DE1 + DE2 + DE4 + DE5+…+DE54
Denominator = 54
Indicator type is percentage.

Manually when I take the total DE score aggregated at level 5 from the 106 facilities that submitted data among the 110 FAC in OUG1 as numerator and divide it by the 106 times 54, multiply by 100, I got the expected result.

ie 106 * (score from each facility) / (106 * 54) * 100

In which case no facility can have more than 54 score.

So the next requirement is to aggregate the indicator result at the level 1 using average of the percentages scores from level 5.

I will use the 2 case scenario below to explain the different result I am getting. I created a visualizer for the percentage indicator from the OUG1 facilities called Section 1 Indicator and use average as aggregation type. I have a slightly different result depending on the setting of the parameter queries.

For confidentiality, I will blank out name of facilities or DEs. For both visualizer settings below:

i. Data represent the Section1 Indicator which is (facility score / 54 * 100)
ii. Period is quarterly
iii. Organisation Unit is all facilities filterd by OUG1 i.e Health Post like below

iv. Under Option tab I select Average for Aggregation type. Tick Column total and Hide empty row like below:

SCENARIO 1 RESULT - Using organisation unit as Filter

The above image is the result for section1 indicator from OUG1 (Health Post groups) facilities.

In this case I put the Data under Column, Organisation Unit under Filter and Period under Row. The Section1 Indicator or average percentage score return 64.2%.

SCENARIO 2 - Using organisation unit as Row

The above image is the result for section1 indicator from OUG1 (Health Post groups) facilities. In this case I put the Data under Column, Organisation Unit under Row and
Period under Filter. The Section1 Indicator or average percentage score return 62.9%.

Significantly just by changing the position of the organisation unit, I got 1.3% in the value different. Meanwhile when I did a manual calculation of all the scores and divide by the expected denominator I got the 62.9% of the Scenario 2.

MY TAKE

Scenario 2 is the correct expected result. However, to get this result as a visualizer at level one you will need to move the OU to Row which will list all the facilities. But in our use case, we just wanted the aggregate result at the State level which is level 1. That is we want to be able to hide the facilities and see the result only for the state.
I am also curious as to how the system arrive at the 64.2% result when the OU is moved to the filter. I have tried all possible cases manually but couldn’t arrive at this result.
How does moving the organization unit from Row to Filter impact on aggregation at the top level i.e level 1?

Gassim · 30 January 2023 14:46

Good question and observations! Thank you @jetisco4u! Hopefully someone from the community experts or @dhis2-analytics will help us out. I’m triaging this post.

And @jetisco4u please mention the version number of the DHIS2 instance that you are using - just in case this needs testing. Thanks!!

austin · 31 January 2023 04:20

@jetisco4u thank you for reporting - can you reproduce this in an accessible DHIS2 environment like play.dhis2.org and share a link? It will be difficult for the team to investigate without knowing more about the underlying data, period, and org units you have selected and how your system is configured. It is also important to share the DHIS2 version number as @gassim mentioned.

@Gassim should we move this new question to a separate topic?

jetisco4u · 31 January 2023 06:47

2.39.0.1

jetisco4u · 31 January 2023 06:51

I am not sure if I can recreate this in play.dhis2.org but I can share something with more insight of the data maybe with images and other details. would it be ok to continue this conversation or create a new thread for this?

Gassim · 31 January 2023 13:53

Yes, @austin! Done! Thanks!

@jetisco4u

When we’re not able to reproduce the issue then it’s most probably not a bug, and this means it might be tied directly to the instance itself whether from the metadata configuration, database, or instance setup.

This could help a lot and give a hint, thanks! The best details are ones that could lead us to understand what’s causing the issue and this is why reproducing the issue is important, but if you can provide as much details as possible it will help.

Thank you!

jetisco4u · 31 January 2023 20:53

Hello @Gassim I edited the original post with more details.
Thanks for moving this to a new page. I will edit the question later with more insight.

jetisco4u · 10 February 2023 16:52

I edited the original post with more details.

jetisco4u · 10 February 2023 16:53

THE SCENARIO

I have 5 level organization units from top to bottom like:
Level 1 - State
Level 2 - LGA
Level 3 - Ward
Level 4 - Community
Level 5 - Facilities with 300+ facilities as FAC at level 5

Also I have three organisation unit groups as OUG1, OUG2 and OUG3.

The dataset have 180 data elements DE1, DE2, DE3, DE4, DE5…DE180 divided into sections.

Section 1 have 80 questions or DEs. Out of these 80, 54 has maximum score of 1 for each DE others are not scored. So when a user fill the form the maximum score they can get for this section is 54*1 = 54.

Suppose FAC 1-110 belongs to OUG1, FAC 111-200 belongs to OUG2 and FAC 201-300 belongs to OUG3.

The total score of DE value that FAC 1-110 in OUG1 can have is 54 with SUM as aggregateType for the data element. I also set the DE aggregateLevel as 5 which is the lowest OU. Data is collected at level 5 of the OU.

WHAT WE WANT

We need an indicator to aggregate the score of the facility for this section one using the aggregate sum of the score of the facility out of the 54 expected maximum score as numerator and 54 as denominator. I define my indicator as below. So for each facility,

Numerator = DE1 + DE2 + DE4 + DE5+…+DE54
Denominator = 54
Indicator type is percentage.

Manually when I take the total DE score aggregated at level 5 from the 106 facilities that submitted data among the 110 FAC in OUG1 as numerator and divide it by the 106 times 54, multiply by 100, I got the expected result.

ie 106 * (score from each facility) / (106 * 54) * 100

In which case no facility can have more than 54 score.

So the next requirement is to aggregate the indicator result at the level 1 using average of the percentages scores from level 5.

I will use the 2 case scenario below to explain the different result I am getting. I created a visualizer for the percentage indicator from the OUG1 facilities called Section 1 Indicator and use average as aggregation type. I have a slightly different result depending on the setting of the parameter queries.

For confidentiality, I will blank out name of facilities or DEs. For both visualizer settings below:

i. Data represent the Section1 Indicator which is (facility score / 54 * 100)
ii. Period is quarterly
iii. Organisation Unit is all facilities filterd by OUG1 i.e Health Post like below

iv. Under Option tab I select Average for Aggregation type. Tick Column total and Hide empty row like below:

SCENARIO 1 RESULT - Using organisation unit as Filter

The above image is the result for section1 indicator from OUG1 (Health Post groups) facilities.

In this case I put the Data under Column, Organisation Unit under Filter and Period under Row. The Section1 Indicator or average percentage score return 64.2%.

SCENARIO 2 - Using organisation unit as Row

The above image is the result for section1 indicator from OUG1 (Health Post groups) facilities. In this case I put the Data under Column, Organisation Unit under Row and
Period under Filter. The Section1 Indicator or average percentage score return 62.9%.

Significantly just by changing the position of the organisation unit, I got 1.3% in the value different. Meanwhile when I did a manual calculation of all the scores and divide by the expected denominator I got the 62.9% of the Scenario 2.

MY TAKE

Scenario 2 is the correct expected result. However, to get this result as a visualizer at level one you will need to move the OU to Row which will list all the facilities. But in our use case, we just wanted the aggregate result at the State level which is level 1. That is we want to be able to hide the facilities and see the result only for the state.
I am also curious as to how the system arrive at the 64.2% result when the OU is moved to the filter. I have tried all possible cases manually but couldn’t arrive at this result.
How does moving the organization unit from Row to Filter impact on aggregation at the top level i.e level 1?

Matthew_Boddie · 10 February 2023 17:20

because of the relatively small difference in values, I’d be curious if this is a rounding issue, or more generally the averaging component of this pivot. An easy test of that theory I might play around with the indicator type and option–>Aggregation Type. Changing the Indicatory Type to Number (allowing, potentially, for more decimals to exist) and changing the aggregation type to Sum instead of Average, I’d be curious if differences still existed. @jetisco4u

jetisco4u · 12 February 2023 23:57

I doubt if this is just rounding issue because the difference is significant. I also select the Skip rounding box.

I applied percentage to the Indicator Type because the DE Aggregation Type is sum. And I need the percentage. Meanwhile I wanted to get the average of the percentages from the facilities. So, I can’t change the indicator type to number. Or is it that I am missing what you mean by Number?

I needed the average of the percentage values and not sum. So I select the Average.

Matthew_Boddie · 14 February 2023 20:54

@jetisco4u Makes sense–you might be right re: rounding not having enough potential impact. I understand you need these to be percentages and averages, my point is just to make a momentary switch to confirm that the discrepancy still exists.