Missing values in calculating indicators

I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn’t thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it’s likely that it isn’t reported, because it doesn’t exist), so maybe an option to override default in the indicator definition screen would be good. I can’t think of a scenario where one would want missing denominators to be treated as 0.

I don’t know if others are running into this, or if it’s only me. If there is a way to set this already let me know.

Thanks,

Tom

···

This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.

Hi Tom,

You can consult the DHIS2 mailing list archives for long and torturous email threads on this issue and it has come up many times.

What would the desired behavior be in your case? Do you mean, that something like

NA + 1 + 2 +3 = NA

This is not the default behavior at all in DHIS2, and most of the time NAs/NULLS will be coerced to zero. So, the default behavior in DHIS2 is

NA + 1 + 2+ 3 = 6

In cases when the NAs are important, we have always had to resort to SQL/R/Stata to perform the aggregation. But maybe there is a better solution from someone else?

Best regards,

Jason

···

On Thu, May 31, 2012 at 8:13 AM, hiattt@wpro.who.int wrote:

I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn’t thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it’s likely that it isn’t reported, because it doesn’t exist), so maybe an option to override default in the indicator definition screen would be good. I can’t think of a scenario where one would want missing denominators to be treated as 0.

I don’t know if others are running into this, or if it’s only me. If there is a way to set this already let me know.

Thanks,

Tom


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Hi Jason,

Sorry I’m in the dark on some of these debates. Please let me know if I’m getting annoying on any of my questions!

My problem is slightly different. It’s more like this example for a specific period:

Org unit Numerator Denominator

O1 50 100

O2 40 50

O3 80 NA

For indicator X = (numerator / denominator) * 100

What I would like to see is (50 + 40) / (100 + 50) = 60%

What I’m seeing is (50 + 40 + 80) / (100 + 50) = 113%

Does that make sense? Again I could see a debate on whether to include rows with missing numerator data, but I can’t think of a scenario where you would want rows with missing denominator data.

Thanks again, and please do let me know if I’m abusing the mailing list. You all have been so responsive and I appreciate it, but I don’t want to wear out my welcome!

Tom

···

From: Jason Pickering [mailto:jason.p.pickering@gmail.com]

Sent: 31 May 2012 14:39

To: Hiatt, Mr Tom (WPRO)

Cc: dhis2-users@lists.launchpad.net

Subject: Re: [Dhis2-users] Missing values in calculating indicators

Hi Tom,

You can consult the DHIS2 mailing list archives for long and torturous email threads on this issue and it has come up many times.

What would the desired behavior be in your case? Do you mean, that something like

NA + 1 + 2 +3 = NA

This is not the default behavior at all in DHIS2, and most of the time NAs/NULLS will be coerced to zero. So, the default behavior in DHIS2 is

NA + 1 + 2+ 3 = 6

In cases when the NAs are important, we have always had to resort to SQL/R/Stata to perform the aggregation. But maybe there is a better solution from someone else?

Best regards,

Jason

On Thu, May 31, 2012 at 8:13 AM, hiattt@wpro.who.int wrote:

I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn’t thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it’s likely that it isn’t reported, because it doesn’t exist), so maybe an option to override default in the indicator definition screen would be good. I can’t think of a scenario where one would want missing denominators to be treated as 0.

I don’t know if others are running into this, or if it’s only me. If there is a way to set this already let me know.

Thanks,

Tom


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.

Hi Tom,

regarding indicators we do produce the aggregated value if the numerator is not existing or zero by default. The reason is that we often connect with 3rd party tools like Excel which will do their own aggregation based on data e.g. on the lowest level.

If you prefer to omit indicators with zero numerators you can go to system settings and set the “Omit indicator values with zero numerator value in data mart” setting to true.

We of course ignore indicators where the denominator is not existing/zero as it would be illegal to calculate.

Lars

···

On Thu, May 31, 2012 at 8:13 AM, hiattt@wpro.who.int wrote:

I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn’t thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it’s likely that it isn’t reported, because it doesn’t exist), so maybe an option to override default in the indicator definition screen would be good. I can’t think of a scenario where one would want missing denominators to be treated as 0.

I don’t know if others are running into this, or if it’s only me. If there is a way to set this already let me know.

Thanks,

Tom


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Thanks, Lars. Sorry for the ambiguity in my original question. I would still like to have the indicator calculated for each time period, I would just like it to omit from the calculation org units that didn’t report the numerator.

Thanks, for pointing out that setting. That’s also very useful.

Tom

···

From: Lars Helge Øverland [mailto:larshelge@gmail.com]

Sent: 31 May 2012 15:24

To: Hiatt, Mr Tom (WPRO)

Cc: dhis2-users@lists.launchpad.net

Subject: Re: [Dhis2-users] Missing values in calculating indicators

Hi Tom,

regarding indicators we do produce the aggregated value if the numerator is not existing or zero by default. The reason is that we often connect with 3rd party tools like Excel which will do their own aggregation based on data e.g. on the lowest level.

If you prefer to omit indicators with zero numerators you can go to system settings and set the “Omit indicator values with zero numerator value in data mart” setting to true.

We of course ignore indicators where the denominator is not existing/zero as it would be illegal to calculate.

Lars

On Thu, May 31, 2012 at 8:13 AM, hiattt@wpro.who.int wrote:

I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn’t thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it’s likely that it isn’t reported, because it doesn’t exist), so maybe an option to override default in the indicator definition screen would be good. I can’t think of a scenario where one would want missing denominators to be treated as 0.

I don’t know if others are running into this, or if it’s only me. If there is a way to set this already let me know.

Thanks,

Tom


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.

Okay I understand. I think supporting your requirement would def make sense. I am only worried this will incur a huge performance penalty to the aggregation process.

···

On May 31, 2012 9:51 AM, hiattt@wpro.who.int wrote:

Thanks, Lars. Sorry for the ambiguity in my original question. I would still like to have the indicator calculated for each time period, I would just like it to omit from the calculation org units that didn’t report the numerator.

Thanks, for pointing out that setting. That’s also very useful.

Tom


From: Lars Helge Øverland [mailto:larshelge@gmail.com]

Sent: 31 May 2012 15:24

To: Hiatt, Mr Tom (WPRO)

Cc: dhis2-users@lists.launchpad.net

Subject: Re: [Dhis2-users] Missing values in calculating indicators

Hi Tom,

regarding indicators we do produce the aggregated value if the numerator is not existing or zero by default. The reason is that we often connect with 3rd party tools like Excel which will do their own aggregation based on data e.g. on the lowest level.

If you prefer to omit indicators with zero numerators you can go to system settings and set the “Omit indicator values with zero numerator value in data mart” setting to true.

We of course ignore indicators where the denominator is not existing/zero as it would be illegal to calculate.

Lars

On Thu, May 31, 2012 at 8:13 AM, hiattt@wpro.who.int wrote:

I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn’t thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it’s likely that it isn’t reported, because it doesn’t exist), so maybe an option to override default in the indicator definition screen would be good. I can’t think of a scenario where one would want missing denominators to be treated as 0.

I don’t know if others are running into this, or if it’s only me. If there is a way to set this already let me know.

Thanks,

Tom


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.

Hi Lars and Tom,

As Lars mentions, this is a very common problem with aggregation in DHIS2. In the case you highlight here, you want to exlcude the value which has NA as a denominator. We started having big problems with this with PMTCT indicators, when we saw huge values. This was because of systematic under-reporting of the denominator value, leading to many numerators with no corresponding denominator. (This is further complicated by the fact that the validation rules do not respond as you might expect and makes this problem hard to identify.

Lets say we were going to have a validation rule like

Numerator <= Denominator

This is fairly typical for many indicators in DHIS2

Now, in the case of the validations rules

50 <= NA evaluates to FALSE/NULL

In this case, the NA is not treated as a zero, although in the case of aggregation which you highlight, the NA is treated effectively as a zero.

The case which you and Lars both describe is quite simple. It becomes complicated with composite denominators, which may consist of multiple operands.

So, if you have an indicator like

DE1 + DE2 / DE3 + DE4

Now, lets say the values are as follows

10+20 / 30 + NA

What does that NA really mean? Is it something which has not been reported? Is it a zero which was entered as a zero but never recorded in the database? There are not really any clear cut answers, and you are going to have to try and solve this problem through a number of approaches, such as use of compulsory data elements, storing zeros, validation rules, and likely, external aggregation in tools such as R which will allow you a lot more control over how the aggregation takes place. This is the only solution I have come up with, but it is certainly far from ideal.

I think this comes down, once again, to the problem with handling NULLs. In certain cases we need to be able to exlcude NULLS entirely from the calculation. But with indicators it is tricky, as we take the sum of the numerators and then the sum of the denominators and divide them. With what you are proposing Tom, we would need to exclude certain denominator values (or a set of operands) entirely based on whether or not a numerator (or member of set of operands which compose a numerator value) is NULL. Definitely doable, and probably a good idea, but as Lars points out, it is going to most certainly affect performance. However, there is little point in calculating values quickly, if they are not correct. I certainly have come up against these same problems before Tom, and I do think we need to do some careful thinking around this issue.

Best regards,

Jason

···

On Thu, May 31, 2012 at 8:56 AM, hiattt@wpro.who.int wrote:

Hi Jason,

Sorry I’m in the dark on some of these debates. Please let me know if I’m getting annoying on any of my questions!

My problem is slightly different. It’s more like this example for a specific period:

Org unit Numerator Denominator

O1 50 100

O2 40 50

O3 80 NA

For indicator X = (numerator / denominator) * 100

What I would like to see is (50 + 40) / (100 + 50) = 60%

What I’m seeing is (50 + 40 + 80) / (100 + 50) = 113%

Does that make sense? Again I could see a debate on whether to include rows with missing numerator data, but I can’t think of a scenario where you would want rows with missing denominator data.

Thanks again, and please do let me know if I’m abusing the mailing list. You all have been so responsive and I appreciate it, but I don’t want to wear out my welcome!

Tom


From: Jason Pickering [mailto:jason.p.pickering@gmail.com]

Sent: 31 May 2012 14:39

To: Hiatt, Mr Tom (WPRO)

Cc: dhis2-users@lists.launchpad.net

Subject: Re: [Dhis2-users] Missing values in calculating indicators

Hi Tom,

You can consult the DHIS2 mailing list archives for long and torturous email threads on this issue and it has come up many times.

What would the desired behavior be in your case? Do you mean, that something like

NA + 1 + 2 +3 = NA

This is not the default behavior at all in DHIS2, and most of the time NAs/NULLS will be coerced to zero. So, the default behavior in DHIS2 is

NA + 1 + 2+ 3 = 6

In cases when the NAs are important, we have always had to resort to SQL/R/Stata to perform the aggregation. But maybe there is a better solution from someone else?

Best regards,

Jason

On Thu, May 31, 2012 at 8:13 AM, hiattt@wpro.who.int wrote:

I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn’t thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it’s likely that it isn’t reported, because it doesn’t exist), so maybe an option to override default in the indicator definition screen would be good. I can’t think of a scenario where one would want missing denominators to be treated as 0.

I don’t know if others are running into this, or if it’s only me. If there is a way to set this already let me know.

Thanks,

Tom


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.

Hi Jason and Lars,

Thank you for the thoughtful responses. I agree with you this is similar to the NULL issue, but I think it’s helpful to separate them into sub-scenarios as you’ve done.

Does this summarize?

NULL issues

  1. NAs when summing variables (indicators or variable dimensions to make the variable(total))

  2. Treat NAs as 0s: 1+2+3+NA=6

  3. Treat NAs as NA: 1+2+3+NA=NA

  4. Treat NAs as 0s unless all are NA: 0+NA+NA+NA=0 and NA+NA+NA+NA=NA

  5. Treat NAs as 0s sometimes and NAs sometimes: IFNULL(var1,0)+IFNULL(var2,NA)

  6. Treatment of NAs in evaluating rules

  7. NAs are considered 0s in rules

  8. NAs always make rules fail

  9. NAs are ignored in rules

  10. Treatment of NAs when calculating numerator/denominator indicators (also depends on settings in 1.)

  11. Drop numerator and denominator for an orgunit-timeperiod when calculating if either are NA

  12. Drop numerator and denominator for an orgunit-timeperiod when calculating only if denominator is NA

  13. Drop numerator and denominator for an orgunit-timeperiod when calculating only if both are NA

I don’t pretend to understand all the implications of this on the backend, but it certainly makes sense to me as Jason said that wrong indicators are as useful as no indicators (and even detrimental if misleading).

I think the current DHIS behavior (1a, 2b, 3c) is justifiable (except for 3c which seems the wrong option of the three). I suspect that fairly granular control would be required not only to accommodate different users of DHIS, but also to accommodate different needs within one DHIS instance, but perhaps default behaviour is as far as you want to take it in the system.

I was speaking with Knut the other day about the possibilities of incorporating R into DHIS. I don’t know if this would this be a way to take this heavy lifting off the database, but it would be great (for me at least!) to have that kind of control that R provides. Or maybe the best scenario is simply a way for R to put output back into DHIS and then any tailored aggregating can be done outside of the system.

For me, at least for the time being, I will just make sure to not calculate indicators unless rules are in place to clean the data first before calculating.

Thanks again,

Tom

···

From: Jason Pickering [mailto:jason.p.pickering@gmail.com]

Sent: 01 June 2012 14:31

To: Hiatt, Mr Tom (WPRO)

Cc: dhis2-users@lists.launchpad.net

Subject: Re: [Dhis2-users] Missing values in calculating indicators

Hi Lars and Tom,

As Lars mentions, this is a very common problem with aggregation in DHIS2. In the case you highlight here, you want to exlcude the value which has NA as a denominator. We started having big problems with this with PMTCT indicators, when we saw huge values. This was because of systematic under-reporting of the denominator value, leading to many numerators with no corresponding denominator. (This is further complicated by the fact that the validation rules do not respond as you might expect and makes this problem hard to identify.

Lets say we were going to have a validation rule like

Numerator <= Denominator

This is fairly typical for many indicators in DHIS2

Now, in the case of the validations rules

50 <= NA evaluates to FALSE/NULL

In this case, the NA is not treated as a zero, although in the case of aggregation which you highlight, the NA is treated effectively as a zero.

The case which you and Lars both describe is quite simple. It becomes complicated with composite denominators, which may consist of multiple operands.

So, if you have an indicator like

DE1 + DE2 / DE3 + DE4

Now, lets say the values are as follows

10+20 / 30 + NA

What does that NA really mean? Is it something which has not been reported? Is it a zero which was entered as a zero but never recorded in the database? There are not really any clear cut answers, and you are going to have to try and solve this problem through a number of approaches, such as use of compulsory data elements, storing zeros, validation rules, and likely, external aggregation in tools such as R which will allow you a lot more control over how the aggregation takes place. This is the only solution I have come up with, but it is certainly far from ideal.

I think this comes down, once again, to the problem with handling NULLs. In certain cases we need to be able to exlcude NULLS entirely from the calculation. But with indicators it is tricky, as we take the sum of the numerators and then the sum of the denominators and divide them. With what you are proposing Tom, we would need to exclude certain denominator values (or a set of operands) entirely based on whether or not a numerator (or member of set of operands which compose a numerator value) is NULL. Definitely doable, and probably a good idea, but as Lars points out, it is going to most certainly affect performance. However, there is little point in calculating values quickly, if they are not correct. I certainly have come up against these same problems before Tom, and I do think we need to do some careful thinking around this issue.

Best regards,

Jason

On Thu, May 31, 2012 at 8:56 AM, hiattt@wpro.who.int wrote:

Hi Jason,

Sorry I’m in the dark on some of these debates. Please let me know if I’m getting annoying on any of my questions!

My problem is slightly different. It’s more like this example for a specific period:

Org unit Numerator Denominator

O1 50 100

O2 40 50

O3 80 NA

For indicator X = (numerator / denominator) * 100

What I would like to see is (50 + 40) / (100 + 50) = 60%

What I’m seeing is (50 + 40 + 80) / (100 + 50) = 113%

Does that make sense? Again I could see a debate on whether to include rows with missing numerator data, but I can’t think of a scenario where you would want rows with missing denominator data.

Thanks again, and please do let me know if I’m abusing the mailing list. You all have been so responsive and I appreciate it, but I don’t want to wear out my welcome!

Tom


From:
Jason Pickering [mailto:jason.p.pickering@gmail.com]

Sent: 31 May 2012 14:39

To: Hiatt, Mr Tom (WPRO)

Cc:
dhis2-users@lists.launchpad.net

Subject: Re: [Dhis2-users] Missing values in calculating indicators

Hi Tom,

You can consult the DHIS2 mailing list archives for long and torturous email threads on this issue and it has come up many times.

What would the desired behavior be in your case? Do you mean, that something like

NA + 1 + 2 +3 = NA

This is not the default behavior at all in DHIS2, and most of the time NAs/NULLS will be coerced to zero. So, the default behavior in DHIS2 is

NA + 1 + 2+ 3 = 6

In cases when the NAs are important, we have always had to resort to SQL/R/Stata to perform the aggregation. But maybe there is a better solution from someone else?

Best regards,

Jason

On Thu, May 31, 2012 at 8:13 AM, <hiattt@wpro.who.int > wrote:

I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn’t thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it’s likely that it isn’t reported, because it doesn’t exist), so maybe an option to override default in the indicator definition screen would be good. I can’t think of a scenario where one would want missing denominators to be treated as 0.

I don’t know if others are running into this, or if it’s only me. If there is a way to set this already let me know.

Thanks,

Tom


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.


This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.

If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.

Hi Tom,
I think you have identified many of the scenarios, and it is good to see them mapped out here. Would be to capture this as a blueprint on launchpad.

Just another comment about R. In fact, we often have to pull out certain indicators, and aggregate them using a combination of SQL and R. What we have not tried to do yet (although we will shortly), is put them back in the database, However, the last two sections of the appendix on R integration with DHIS sets the stage for this. Using PL/R, more complex aggregation operators could be programmed in R, and applied to result sets using custom aggregation operators. This might be useful for building custom SQL queries for report tables. The last section on the use of the WebAPI with R, shows how to retrieve data and metadata into R. This section outlines how to use the WebAPI to inject data back into DHIS2 through the WebAPI. So, the basic building blocks are there. We just need to do a lot more work to create usable methods in R or DHIS2 to make this more easily doable. I think custom analyses should not be too difficult, but the offloading the entire datamart process to R would obviously incur a big overhead and require a lot of work.

Best regards,

Jason

···

On Fri, Jun 1, 2012 at 10:36 AM, hiattt@wpro.who.int wrote:

Hi Jason and Lars,

Thank you for the thoughtful responses. I agree with you this is similar to
the NULL issue, but I think it’s helpful to separate them into sub-scenarios
as you’ve done.

Does this summarize?

NULL issues

NAs when summing variables (indicators or variable dimensions to make the
variable(total))

Treat NAs as 0s: 1+2+3+NA=6
Treat NAs as NA: 1+2+3+NA=NA
Treat NAs as 0s unless all are NA: 0+NA+NA+NA=0 and NA+NA+NA+NA=NA
Treat NAs as 0s sometimes and NAs sometimes: IFNULL(var1,0)+IFNULL(var2,NA)

Treatment of NAs in evaluating rules

NAs are considered 0s in rules
NAs always make rules fail
NAs are ignored in rules

Treatment of NAs when calculating numerator/denominator indicators (also

depends on settings in 1.)

Drop numerator and denominator for an orgunit-timeperiod when calculating if
either are NA
Drop numerator and denominator for an orgunit-timeperiod when calculating

only if denominator is NA
Drop numerator and denominator for an orgunit-timeperiod when calculating
only if both are NA

I don’t pretend to understand all the implications of this on the backend,

but it certainly makes sense to me as Jason said that wrong indicators are
as useful as no indicators (and even detrimental if misleading).

I think the current DHIS behavior (1a, 2b, 3c) is justifiable (except for 3c

which seems the wrong option of the three). I suspect that fairly granular
control would be required not only to accommodate different users of DHIS,
but also to accommodate different needs within one DHIS instance, but

perhaps default behaviour is as far as you want to take it in the system.

I was speaking with Knut the other day about the possibilities of
incorporating R into DHIS. I don’t know if this would this be a way to take

this heavy lifting off the database, but it would be great (for me at
least!) to have that kind of control that R provides. Or maybe the best
scenario is simply a way for R to put output back into DHIS and then any

tailored aggregating can be done outside of the system.

For me, at least for the time being, I will just make sure to not calculate
indicators unless rules are in place to clean the data first before

calculating.

Thanks again,

Tom


From: Jason Pickering [mailto:jason.p.pickering@gmail.com]

Sent: 01 June 2012 14:31

To: Hiatt, Mr Tom (WPRO)
Cc: dhis2-users@lists.launchpad.net
Subject: Re: [Dhis2-users] Missing values in calculating indicators

Hi Lars and Tom,

As Lars mentions, this is a very common problem with aggregation in DHIS2.
In the case you highlight here, you want to exlcude the value which has NA

as a denominator. We started having big problems with this with PMTCT
indicators, when we saw huge values. This was because of systematic
under-reporting of the denominator value, leading to many numerators with no

corresponding denominator. (This is further complicated by the fact that
the validation rules do not respond as you might expect and makes this
problem hard to identify.

Lets say we were going to have a validation rule like

Numerator <= Denominator

This is fairly typical for many indicators in DHIS2

Now, in the case of the validations rules

50 <= NA evaluates to FALSE/NULL

In this case, the NA is not treated as a zero, although in the case of

aggregation which you highlight, the NA is treated effectively as a zero.

The case which you and Lars both describe is quite simple. It becomes
complicated with composite denominators, which may consist of multiple

operands.

So, if you have an indicator like

DE1 + DE2 / DE3 + DE4

Now, lets say the values are as follows

10+20 / 30 + NA

What does that NA really mean? Is it something which has not been reported?
Is it a zero which was entered as a zero but never recorded in the database?

There are not really any clear cut answers, and you are going to have to try
and solve this problem through a number of approaches, such as use of
compulsory data elements, storing zeros, validation rules, and likely,

external aggregation in tools such as R which will allow you a lot more
control over how the aggregation takes place. This is the only solution I
have come up with, but it is certainly far from ideal.

I think this comes down, once again, to the problem with handling NULLs. In
certain cases we need to be able to exlcude NULLS entirely from the
calculation. But with indicators it is tricky, as we take the sum of the

numerators and then the sum of the denominators and divide them. With what
you are proposing Tom, we would need to exclude certain denominator values
(or a set of operands) entirely based on whether or not a numerator (or

member of set of operands which compose a numerator value) is
NULL. Definitely doable, and probably a good idea, but as Lars points out,
it is going to most certainly affect performance. However, there is little

point in calculating values quickly, if they are not correct. I certainly
have come up against these same problems before Tom, and I do think we need
to do some careful thinking around this issue.

Best regards,

Jason

On Thu, May 31, 2012 at 8:56 AM, hiattt@wpro.who.int wrote:

Hi Jason,

Sorry I’m in the dark on some of these debates. Please let me know if I’m
getting annoying on any of my questions!

My problem is slightly different. It’s more like this example for a specific

period:

Org unit Numerator Denominator

O1 50 100

O2 40 50

O3 80 NA

For indicator X = (numerator / denominator) * 100

What I would like to see is (50 + 40) / (100 + 50) = 60%

What I’m seeing is (50 + 40 + 80) / (100 + 50) = 113%

Does that make sense? Again I could see a debate on whether to include rows
with missing numerator data, but I can’t think of a scenario where you would

want rows with missing denominator data.

Thanks again, and please do let me know if I’m abusing the mailing list. You
all have been so responsive and I appreciate it, but I don’t want to wear

out my welcome!

Tom


From: Jason Pickering [mailto:jason.p.pickering@gmail.com]

Sent: 31 May 2012 14:39
To: Hiatt, Mr Tom (WPRO)
Cc: dhis2-users@lists.launchpad.net
Subject: Re: [Dhis2-users] Missing values in calculating indicators

Hi Tom,

You can consult the DHIS2 mailing list archives for long and torturous email
threads on this issue and it has come up many times.

What would the desired behavior be in your case? Do you mean, that something
like

NA + 1 + 2 +3 = NA

This is not the default behavior at all in DHIS2, and most of the time

NAs/NULLS will be coerced to zero. So, the default behavior in DHIS2 is

NA + 1 + 2+ 3 = 6

In cases when the NAs are important, we have always had to resort to

SQL/R/Stata to perform the aggregation. But maybe there is a better solution
from someone else?

Best regards,

Jason

On Thu, May 31, 2012 at 8:13 AM, hiattt@wpro.who.int wrote:

I see that when indicators are calculated, records where either the

numerator or denominator is missing are not excluded. For me this would be
ideal default behavior so that the indicator isn’t thrown wildly off by
assuming no reporting is the same as reporting 0. I could imagine some times

when one would want an unreported numerator to be treated as 0 (if it’s
likely that it isn’t reported, because it doesn’t exist), so maybe an option
to override default in the indicator definition screen would be good. I

can’t think of a scenario where one would want missing denominators to be
treated as 0.

I don’t know if others are running into this, or if it’s only me. If there
is a way to set this already let me know.

Thanks,

Tom


This message was scanned for viruses with Trend Micro ScanMail, GFI

MailSecurity and GFI MailEssentials by the World Health Organization
Regional Office for the Western Pacific. However, the recipient is advised
to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named
recipients only and is confidential. It may also be privileged or otherwise
protected by law.

If you have received it in error, please notify the sender immediately by
reply e-mail and delete it and any attachments from your system. You may not
copy or disclose its contents to anyone.


Mailing list: https://launchpad.net/~dhis2-users
Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users
More help : https://help.launchpad.net/ListHelp


This message was scanned for viruses with Trend Micro ScanMail, GFI
MailSecurity and GFI MailEssentials by the World Health Organization

Regional Office for the Western Pacific. However, the recipient is advised
to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named

recipients only and is confidential. It may also be privileged or otherwise
protected by law.

If you have received it in error, please notify the sender immediately by
reply e-mail and delete it and any attachments from your system. You may not

copy or disclose its contents to anyone.


This message was scanned for viruses with Trend Micro ScanMail, GFI

MailSecurity and GFI MailEssentials by the World Health Organization
Regional Office for the Western Pacific. However, the recipient is advised
to scan this e-mail and any attached files for viruses.

Disclaimer:

This e-mail, together with any attachments, is intended for the named
recipients only and is confidential. It may also be privileged or otherwise
protected by law.

If you have received it in error, please notify the sender immediately by
reply e-mail and delete it and any attachments from your system. You may not
copy or disclose its contents to anyone.