Maximizing DHIS2 Uptime, Scaling Beyond Hardware

This community innovation has been accepted at the 2026 DHIS2 Annual Conference as a digital poster.



Ethiopia operates one of the world’s largest DHIS2 implementations, managing 1.4 TB of data with 912 million data values and 44K org units, including 10 years of legacy analytics. In 2025, as more public and private health facilities began using DHIS2 and new programs were added, a 14% surge in annual data entry (170 million new records) exhausted the infrastructure. Despite high-performance hosting, the system faced critical bottlenecks.

System logs revealed three critical issues: “query exhaustion”, where about 25,000 users at a time triggered massive 10-year analytics queries on startup; unmanaged API crawlers from external integrations silently flooding the system with connections; and widespread metadata hygiene problems and improper optimizations. We discovered that hardware expansion alone is not sufficient without equally aggressive application governance, traffic governance, and metadata cleaning.

We therefore followed a holistic approach. We redesigned the deployment architecture as a 7-server cluster to offload analytics, and fine-tuned configuration parameters to handle the load. We implemented strict domain separation, routing data entry and analytics traffic separately to ensure real-time availability, with synchronization from the primary database. We replaced the default dashboard with a newly designed lightweight “Landing Page” app, neutralizing the massive query load spikes caused by dashboards. We identified and regulated heavy API consumers, enforcing strict management of “crawler” system users. We cleaned up 23,373 inactive users, 4,293 unused dashboards, and 2,447 org-unit issues to further reduce query load.

Results: This balanced approach succeeded. Monthly incidents dropped from 900 to 108 (an 89% reduction), and uptime stabilized at 98.8%. This session demonstrates that national scale requires powerful hardware, application and metadata cleaning, and strict traffic governance.
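The abstract does not say how the heavy API consumers were regulated. As a purely illustrative sketch of one common technique, the Python below applies a per-user token bucket: each account may send a short burst of requests, after which it is held to a steady rate. The class names, the `integration_bot` username, and all limits are hypothetical and are not taken from the Ethiopian deployment. In practice this kind of limit is usually enforced at a reverse proxy or API gateway rather than in application code.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start with a full bucket
        self.last = now         # timestamp of the previous check, in seconds

    def allow(self, now: float) -> bool:
        # Refill according to the time elapsed since the last check.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ApiGovernor:
    """Keep one bucket per username; known crawlers get stricter limits."""

    def __init__(self, default_limit, crawler_limits):
        # Each limit is a (capacity, rate) tuple. All values are assumptions.
        self.default_limit = default_limit
        self.crawler_limits = crawler_limits
        self.buckets = {}

    def allow(self, username: str, now: float) -> bool:
        if username not in self.buckets:
            capacity, rate = self.crawler_limits.get(username, self.default_limit)
            self.buckets[username] = TokenBucket(capacity, rate, now)
        return self.buckets[username].allow(now)


# Hypothetical policy: ordinary users may burst to 20 requests,
# while a known integration account is held to a burst of 2 and 1 request/s.
governor = ApiGovernor(
    default_limit=(20.0, 5.0),
    crawler_limits={"integration_bot": (2.0, 1.0)},
)

decisions = [governor.allow("integration_bot", now=0.0) for _ in range(3)]
after_wait = governor.allow("integration_bot", now=1.0)
```

With this policy, the third request from `integration_bot` in the same instant is rejected, and one second later a single request is allowed again. Ordinary users keep their larger allowance.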

Primary Author: Abinet Zergaw


Keywords:
High Availability, Deployment Architecture, API Traffic Management, PostgreSQL Read-Replica, Query Optimization, Landing Page, Traffic Segmentation, DHIS2 Landing Page, Server Clustering, Database Tuning, System Governance, Scaling, Metadata Cleaning, Performance, Query
