Statistical grounds of DHIS2 data quality toolkit

Tammy_Kim · 17 August 2023 08:09

Hello!

I am currently assessing the internal consistency of DHIS2 maternal and child health data using R and an internal data quality check toolkit that references the WHO data quality toolkit documents. However, the documents are not fully functional, so I would like to ask a few questions.

How should I get the threshold? Some pages indicate a threshold of 20%, while others suggest 33%. Which standard should I follow for both consistency over time and consistency between indicators?
Is there any statistical evidence of internal consistency?

Scott · 21 August 2023 09:10

Hi Tammy,

I suggest you review our YouTube series on data quality and our online data quality academy.

To answer your questions,

A threshold should be calculated as a number of deviations from the median (the modified Z-score, which measures spread with the median absolute deviation) rather than as a percentage. The cut-off is usually 2 or 3 deviations, depending on how severe an outlier you want to find. Another method is to use interquartile ranges, which are available in the scatter plot chart in the Data Visualizer application. If you’re looking for outliers in seasonal data, these methods will not be appropriate and you will need a time-series approach, for example one based on the Mean Absolute Scaled Error (MASE). You can see an example of how this is done in R using data from DHIS2 in this presentation: https://www.youtube.com/watch?v=65GKAC64qIg
For a detailed overview of internal consistency, I suggest you review the WHO Data Quality Assurance (DQA) guides (https://www.who.int/data/data-collection-tools/health-service-data/data-quality-assurance-dqa). Different metrics of internal consistency have different methodologies. For example, negative drop-out rates indicate data quality issues, as does statistically significant year-over-year variation.
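
To make the checks above concrete, here is a minimal sketch in Python (the same logic ports directly to R). It is an illustration under stated assumptions, not the DHIS2 or WHO implementation: the function names, the 0.6745 scaling constant for the modified Z-score, the 1.5 multiplier for the interquartile fences, and the sample numbers are all choices made for this example. The drop-out rate is computed as (first dose - last dose) / first dose, which is the usual definition; the dose counts are made up.

```python
from statistics import median, quantiles


def modified_z_scores(values):
    """Modified Z-score per value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Assumption for this sketch: with no spread around the median,
        # report zero for every value instead of dividing by zero.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def modified_z_outliers(values, threshold=3.0):
    """Indices of values whose absolute modified Z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def dropout_rate(first_dose, last_dose):
    """Drop-out rate between two doses; a negative result is a quality flag."""
    return (first_dose - last_dose) / first_dose


# Hypothetical monthly counts for one facility (not real DHIS2 data).
monthly_counts = [120, 118, 125, 122, 119, 121, 310, 117, 123, 120, 124, 118]

print("Modified Z-score outliers at months:", modified_z_outliers(monthly_counts))
print("IQR outliers at months:", iqr_outliers(monthly_counts))

# Hypothetical annual totals: more third doses than first doses reported.
rate = dropout_rate(first_dose=1000, last_dose=1100)
print("Drop-out rate:", rate, "-> flag" if rate < 0 else "-> ok")
```

Both outlier functions return zero-based positions in the input list, so you can map them back to reporting periods. Neither accounts for seasonality, so for seasonal indicators they would over-flag peak months.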

Rebecca · 21 August 2023 09:15

Hi Tammy,
I’m not sure if there is a straightforward answer to your question, unfortunately. DHIS2 has developed a number of data quality tools within DHIS2 to align with the guidelines and recommendations of the WHO Data Quality Review toolkit and to facilitate analysis of DHIS2 data, but I see you are using an external R tool. The resources from Scott are excellent for exploring further the types of analyses that can be performed within DHIS2.

I believe your questions are related to the WHO’s guidelines: the thresholds themselves and the statistical recommendations come from WHO, not from the DHIS2 team. The guidelines were updated in 2022, and the module most relevant to you is Module 2 for Desk Review (Data quality assurance: module 2: discrete desk review of data quality).

We hope this helps in your DQ journey!