We have a few API calls that pull data from DHIS2 into our custom web portal. We normally use:
https://[dhis2]/api/dataValueSets.csv
with a 3-hourly incremental pull. At times, however, we need to fetch the full dataset for a month, and most of the time the API call times out after 600 seconds, which puts the interoperability at stake. As far as I know, dataValueSets does not have a pagination option.
Where can we increase this timeout? Or is there an alternative? If the data could be zipped on the server side before being transferred, it would be smaller and the transfer time would also decrease.
Our DHIS2 instance has 20,000+ health facilities! Any guidance would be highly appreciated.
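On the client side, a minimal sketch of asking for a compressed response and streaming it straight to disk, assuming the reverse proxy or server in front of DHIS2 honours `Accept-Encoding: gzip` (whether it does depends on your own configuration). `download_csv` and `open_body` are hypothetical helper names, authentication is omitted, and `timeout_s` only bounds how long the client waits on the socket; it does not lift the server-side 600-second limit.

```python
import gzip
import shutil
import urllib.request


def open_body(response):
    """Return a file-like object that yields the decoded response body.

    If the server answered with Content-Encoding: gzip, wrap the raw
    stream in a GzipFile so it is decompressed on the fly.
    """
    encoding = (response.headers.get("Content-Encoding") or "").lower()
    if encoding == "gzip":
        return gzip.GzipFile(fileobj=response, mode="rb")
    return response


def download_csv(url, out_path, timeout_s=900):
    """Stream a (possibly gzip-compressed) CSV export to out_path."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    # The timeout applies to each blocking socket operation, not the whole download.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        with open(out_path, "wb") as out:
            # Copy in chunks so a month of data never has to sit in memory at once.
            shutil.copyfileobj(open_body(response), out)
```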
Hi @Gassim,
We use 3 org unit groups as filters in the call (the first has 2,000 org units, the second 5,000 and the third 8,000, around 15,000 in total), and the call for the third one has started failing. If we call for a single org unit, it works like a charm.
If we change the API call to fetch one org unit’s data for one month at a time (which means 15,000 API calls), would that be a practical approach?
(Just my personal thought) When we make the call using orgUnitGroup, I think resolving which org units the data should be returned for may be the time-consuming part inside the API, and the part most prone to breaking. Also, I have seen the Import/Export app fail to return data in the same case.
@Mahmud I think your idea of calling each org unit one by one might be faster than the org unit group call. I agree; even in the user interface, assignments per OU group are very slow (likely because every OU in the selected OU tree is being checked). I’m not sure whether your server could handle 15,000 API calls back-to-back, but it sounds like a start. Adding another filter, such as org unit level, might also help (to avoid checking every OU in the system).
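If the calls are run one after another, pacing them and retrying failures keeps the load on the server predictable. A minimal sketch: `run_paced` is a hypothetical helper, `fetch` stands for whatever function performs a single request, and the pause, retry and back-off values are illustrative, not DHIS2 recommendations.

```python
import time


def run_paced(urls, fetch, pause_s=1.0, retries=3, backoff_s=5.0):
    """Call fetch(url) for each URL, one at a time.

    Sleeps pause_s between requests so the server is not hit back-to-back,
    and retries a failing request up to `retries` times, waiting a little
    longer after each failed attempt. Returns (results, failed_urls).
    """
    results, failed = {}, []
    for url in urls:
        for attempt in range(1, retries + 1):
            try:
                results[url] = fetch(url)
                break
            except Exception:  # e.g. a timeout or HTTP error raised by fetch
                if attempt == retries:
                    failed.append(url)
                else:
                    time.sleep(backoff_s * attempt)
        time.sleep(pause_s)
    return results, failed
```

Keeping the failed URLs separate means a long run does not abort on one bad org unit; they can be retried later.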
Hi @lnunez,
Yes, we are rewriting the bulk import code to call the API repeatedly. We found that instead of calling one facility at a time, putting, say, 20-30 org units in one call also returns results pretty quickly, so we will cut the number of API calls by a factor of at least 20.
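The batching described above can be sketched as follows, assuming the DHIS2 version in use accepts a repeated `orgUnit` parameter on `dataValueSets` (as putting 20-30 org units in one call suggests). `batch_urls`, `chunked`, the batch size of 25 and the IDs are illustrative; `dataSet`, `period` and `orgUnit` are query parameters of the endpoint.

```python
from urllib.parse import urlencode

BASE_URL = "https://[dhis2]/api/dataValueSets.csv"  # placeholder host from the post


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_urls(data_set, period, org_units, batch_size=25):
    """Build one dataValueSets URL per batch of org units.

    doseq=True expands the list into orgUnit=a&orgUnit=b&..., i.e. the
    orgUnit parameter is repeated once per org unit in the batch.
    """
    urls = []
    for batch in chunked(org_units, batch_size):
        params = {"dataSet": data_set, "period": period, "orgUnit": batch}
        urls.append(f"{BASE_URL}?{urlencode(params, doseq=True)}")
    return urls
```

With 15,000 org units and a batch size of 25 this produces 600 requests instead of 15,000.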
Given the scale we are talking about here, I would be very tempted (I know some people would shoot me for this!) to just run a simple service alongside DHIS2 that pulls the data directly from the database in whatever format you need, and to secure it well.
It is also possible to increase the timeouts on your reverse proxy and/or Tomcat to work around the immediate problem but, to be honest, having HTTP connections running for more than 5 minutes on your server brings its own problems.
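The direct-database route could look roughly like this, assuming read-only access to the DHIS2 PostgreSQL database through a psycopg2 connection. The table and column names (`datavalue`, `period`, `organisationunit`, `dataelement`, `categoryoptioncombo`) reflect the DHIS2 schema as we understand it and may differ between versions, so check them against your instance; `MONTHLY_EXPORT_SQL` and `export_month_csv` are hypothetical names. Filtering on the org unit `path` prefix covers every org unit below the given root.

```python
# Assumed DHIS2 schema names; verify against your database version.
# %(name)s placeholders are bound by psycopg2.
MONTHLY_EXPORT_SQL = """
SELECT de.uid AS dataelement,
       pe.startdate AS period_start,
       ou.uid AS orgunit,
       coc.uid AS categoryoptioncombo,
       aoc.uid AS attributeoptioncombo,
       dv.value,
       dv.lastupdated
FROM datavalue dv
JOIN dataelement de ON de.dataelementid = dv.dataelementid
JOIN period pe ON pe.periodid = dv.periodid
JOIN organisationunit ou ON ou.organisationunitid = dv.sourceid
JOIN categoryoptioncombo coc ON coc.categoryoptioncomboid = dv.categoryoptioncomboid
JOIN categoryoptioncombo aoc ON aoc.categoryoptioncomboid = dv.attributeoptioncomboid
WHERE dv.deleted = false
  AND pe.startdate >= %(start)s
  AND pe.enddate <= %(end)s
  AND ou.path LIKE %(path_prefix)s
"""


def export_month_csv(conn, out_file, start, end, root_path):
    """Stream data values for periods inside [start, end] to out_file as CSV.

    conn is an open psycopg2 connection (ideally a read-only role);
    root_path is the hierarchy path of the top org unit, e.g. "/<root-uid>".
    """
    params = {"start": start, "end": end, "path_prefix": root_path + "%"}
    with conn.cursor() as cur:
        # COPY does not accept bind parameters, so let psycopg2 render them safely.
        query = cur.mogrify(MONTHLY_EXPORT_SQL, params).decode()
        # COPY ... TO STDOUT streams rows into out_file instead of building a result set.
        cur.copy_expert(f"COPY ({query}) TO STDOUT WITH CSV HEADER", out_file)
```

Running this inside a small, well-secured service keeps the heavy export off the DHIS2 web tier entirely.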