DAC2021 presentation - Program evaluation in resource-limited environments: a demonstration of a novel machine learning approach to deriving actionable insights from DHIS2-housed data for healthcare intervention management

Part of the Use of DHIS2 data in research DAC2021 Session: Wednesday 23rd June 14:00

If you are attending the DHIS2 Annual Conference, more details on this session are available here.

If you have not yet registered to the DHIS2 Annual Conference, do not hesitate to visit the DHIS2 Annual Conference website and secure your spot!

As countries work to meet Global Goal 3: Ensure healthy lives and promote well-being for all at all ages, health interventions at all levels are vital for meeting targets for universal health coverage. However, these intervention programs often lack funding for the so-called "gold standard" of intervention assessment: a randomized controlled trial (RCT). Furthermore, over the past decade, researchers worldwide have identified the need for more flexible, pragmatic program evaluation suited to real-world contexts. These researchers have highlighted that typical evaluation approaches are often poorly defined, lack control groups, and are repeatedly dismissed as insufficient or unsatisfactory.

In this session, we propose that DHIS2-housed data, long used in conjunction with programmatic data for routine program management across health interventions, could be used for flexible program evaluation in situations where an RCT may not be feasible or even ethical.

Advances in computational efficiency mean that complex models can now run on local machines, making data science approaches to program evaluation far more attainable. We suggest that a more computationally intensive method, the synthetic control method, can yield better and more actionable program evaluation results than alternatives such as difference-in-differences models or natural experiments. The fundamental difference lies in how the synthetic control approach handles confounders: their effects are not held constant over time but are instead reweighted throughout the modeling process. This allows for simulated program evaluation in situations where the validity of the parallel trends assumption is questionable.
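To make the reweighting idea concrete, here is a minimal sketch of the core synthetic control fit: choosing non-negative donor weights that sum to one so the weighted mix of comparison units tracks the treated unit's pre-intervention outcomes. This is an illustrative toy with made-up data, not the code from the session; the function and variable names are our own.

```python
import numpy as np
from scipy.optimize import minimize

def fit_synthetic_control(donor_outcomes, treated_outcomes):
    """Find non-negative donor weights summing to 1 that best
    reproduce the treated unit's pre-intervention outcomes.

    donor_outcomes:   (n_periods, n_donors) pre-intervention matrix
    treated_outcomes: (n_periods,) pre-intervention vector
    """
    n_donors = donor_outcomes.shape[1]

    def loss(w):
        # Squared gap between the treated series and the weighted donor mix.
        return np.sum((treated_outcomes - donor_outcomes @ w) ** 2)

    result = minimize(
        loss,
        x0=np.full(n_donors, 1.0 / n_donors),   # start from equal weights
        bounds=[(0.0, 1.0)] * n_donors,         # weights are non-negative
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
        method="SLSQP",
    )
    return result.x

# Toy example: 12 pre-intervention months, 4 comparison facilities.
rng = np.random.default_rng(0)
donors = rng.uniform(50, 150, size=(12, 4))
true_w = np.array([0.6, 0.3, 0.1, 0.0])
treated = donors @ true_w                       # treated unit is a known mix
weights = fit_synthetic_control(donors, treated)
synthetic = donors @ weights                    # synthetic control trajectory
```

After the intervention date, the gap between the treated unit's observed outcomes and this synthetic trajectory is the estimated program effect.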

Using publicly available DHIS2 demonstration data, we demonstrate the synthetic control method at a granular geographic level as a means of generating insights for program evaluation via site comparison. Specifically, we will showcase the data pipeline necessary for assessing programmatic health data alongside DHIS2 data, the extract-load-transform process, the modeling approach, and the pathway to sharing regular, actionable information. We discuss the implications of this approach and conclude with recommendations for deploying it as a repeatable modeling solution, particularly in resource-limited environments where implementers are looking to conduct rigorous program evaluation in situations where a randomized controlled trial might not be ethical or feasible. We believe this novel approach is the first example of deploying machine learning for health program evaluation using DHIS2 and expect it to be a valuable contribution to the DHIS2 community of practice.
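As a taste of the transform step in such a pipeline, the sketch below reshapes a DHIS2 analytics API response into a periods-by-sites outcome matrix ready for site-comparison modeling. It assumes the standard analytics JSON shape (a `headers` list naming the columns and string-valued `rows`); the sample response, site IDs, and values are hypothetical, not taken from the demo server.

```python
import numpy as np

def analytics_to_matrix(response):
    """Transform a DHIS2 analytics JSON response (one data element)
    into a (periods x org-units) outcome matrix for modeling.

    Assumes the standard analytics shape: `headers` names the
    columns (dx, pe, ou, value) and `rows` holds string cells.
    """
    cols = [h["name"] for h in response["headers"]]
    pe_i, ou_i, val_i = cols.index("pe"), cols.index("ou"), cols.index("value")

    periods = sorted({r[pe_i] for r in response["rows"]})
    org_units = sorted({r[ou_i] for r in response["rows"]})
    matrix = np.full((len(periods), len(org_units)), np.nan)  # NaN marks gaps
    for row in response["rows"]:
        matrix[periods.index(row[pe_i]), org_units.index(row[ou_i])] = float(row[val_i])
    return periods, org_units, matrix

# Hypothetical response resembling analytics output for one indicator.
sample = {
    "headers": [{"name": "dx"}, {"name": "pe"}, {"name": "ou"}, {"name": "value"}],
    "rows": [
        ["dx1", "202101", "siteA", "120"],
        ["dx1", "202101", "siteB", "95"],
        ["dx1", "202102", "siteA", "131"],
        ["dx1", "202102", "siteB", "88"],
    ],
}
periods, sites, outcomes = analytics_to_matrix(sample)
```

Each column of the resulting matrix is one facility's time series, which is exactly the donor-pool layout a synthetic control model consumes.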


Sharing the code repository here: if you are interested in recreating the sample analysis, or trying it with different parameters, Emily has made the sample code available on GitHub: GitHub - eyelverton/dhis2-2021-sc: contains demo code for DataKind's DHIS2 2021 conference presentation "A machine learning approach to deriving insights from DHIS2 data for program evaluation of healthcare interventions"

DataKind - Key Takeaways - DAC2021.pdf (30.8 KB)