We recently upgraded our DHIS2 instance from version 2.33.9 to a newer release.
As far as I can tell, the upgrade went fine, but the dashboards now take far too long to display data (I timed around 11 minutes for a single dashboard), and most of the time they time out. On the previous version (2.33.9), the same dashboards displayed data in under 4 minutes.
I ran some tests in Data Visualizer, selecting a few program indicators, one organisation unit and one week, and it takes around 9 minutes to return the data. The same test on 2.33.9 returns data in under a minute.
This is happening on the production server.
Before the upgrade, we used a test server (a virtual machine with almost the same specifications as the production server: 8 CPUs and 20GB RAM). Running the new DHIS2 version, the test server was only slightly slower than the production server had been on 2.33.9, and it is faster than the production server now that production also runs the new version. On the test server I was able to get data on all the dashboards in under 10 minutes.
I’m trying to understand what could be going wrong with the production server.
The production server specifications are: 8 CPUs, 64GB RAM, 200GB / 100GB SSD, DHIS2 (new version), Java 11.0.16, Ubuntu 18.04.4.
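One way to tell whether the slowness lies in the analytics queries themselves rather than in the browser or a proxy is to send the same kind of request the Data Visualizer test makes straight to the DHIS2 Web API and time it. The sketch below assumes the table was built with the program indicators on `dx`, the week on `pe` and the organisation unit as a filter; the base URL, credentials, UIDs and the `2022W1` week are hypothetical placeholders. If Tomcat is reachable directly on the server, pointing `base_url` at it instead of at the public proxy address helps separate query time from proxy behaviour.

```python
import base64
import time
import urllib.parse
import urllib.request


def build_analytics_url(base_url, indicator_uids, period, org_unit_uid):
    """Build an /api/analytics URL for a simple table: indicators (dx) by period (pe),
    filtered to one organisation unit (ou)."""
    params = [
        ("dimension", "dx:" + ";".join(indicator_uids)),
        ("dimension", "pe:" + period),
        ("filter", "ou:" + org_unit_uid),
    ]
    return base_url.rstrip("/") + "/api/analytics?" + urllib.parse.urlencode(params)


def time_request(url, username, password, socket_timeout_s=3600):
    """GET `url` with HTTP Basic auth and return (HTTP status, elapsed seconds).

    `socket_timeout_s` applies to each blocking socket operation, not to the
    request as a whole."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=socket_timeout_s) as response:
        response.read()  # include the response transfer in the measurement
        status = response.status
    return status, time.monotonic() - start
```

For example, `time_request(build_analytics_url("http://localhost:8080", ["PI_UID_1", "PI_UID_2"], "2022W1", "OU_UID"), "USERNAME", "PASSWORD")` would return the status code and the wall-clock time for that single query. If DHIS2 is deployed under a context path, it needs to be part of `base_url`.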
Just to make sure it’s not a cache issue: are you accessing the upgraded instance from the same browser you used to access it before the upgrade? Please try a new browser (or browser profile) that you have never used to access the instance. If the issue doesn’t appear there, it’s a cache issue, which can be easily cleared.
I usually run my tests in incognito (private) mode precisely to rule out browser cache issues, so I don’t think this is related to the browser cache.
The message displayed when opening the Data Visualizer item corresponding to the dashboard is “Something went wrong. An unknown network error occurred.”
Any ideas on how to debug this one? I’ve been doing some performance tuning on Postgres, but the problem remains.
Did you check which request failed in the Network tab of the browser’s developer tools? Are there any console logs? You can also check the DHIS2 and Tomcat logs to debug this.
Let me double-check one thing on this issue: did you also try clearing the server cache?
- Data Administration app → Maintenance → Clear Cache, reload apps
I found out that the timeouts were happening because we have Nginx configured as a proxy server, with an 800-second timeout on connections. Since the queries take more than 14 minutes, the proxy gave up first and the network error was returned. Thanks for that tip.
I’ve increased the timeout to give the queries more time to return data to DHIS2.
However, the queries themselves are still taking too long (some pivot tables take 103 minutes to return data). Any ideas on how to fix this?
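Whatever proxy timeout is chosen has to cover the slowest query, and Nginx time values can be written in several forms (`800`, `800s`, `1h 30m`). The sketch below converts such a value to seconds and compares it with an observed query duration. It assumes the relevant directive is something like `proxy_read_timeout` (whose default is 60s); the thread does not say which directive was changed. The function names and the 20% safety margin are hypothetical choices for illustration.

```python
import re

# Seconds per nginx time unit; a value without a suffix is in seconds.
# "M" is a month (30 days) and "y" a year (365 days).
NGINX_TIME_UNITS = {
    "ms": 0.001, "s": 1, "m": 60, "h": 3600,
    "d": 86400, "w": 604800, "M": 2592000, "y": 31536000,
}
# "ms" is tried before "s" and "m" so that "5ms" is read as milliseconds.
_TIME_TOKEN = re.compile(r"(\d+)(ms|s|m|h|d|w|M|y)?")


def nginx_time_to_seconds(value):
    """Convert an nginx time value such as '800', '800s' or '1h 30m' to seconds."""
    compact = value.replace(" ", "")
    if not compact:
        raise ValueError("empty nginx time value")
    total, pos = 0.0, 0
    while pos < len(compact):
        match = _TIME_TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"invalid nginx time value: {value!r}")
        number, unit = match.groups()
        total += int(number) * NGINX_TIME_UNITS[unit or "s"]
        pos = match.end()
    return total


def timeout_is_sufficient(timeout_value, longest_query_minutes, margin=1.2):
    """True if the proxy timeout covers the longest observed query plus a safety margin."""
    return nginx_time_to_seconds(timeout_value) >= longest_query_minutes * 60 * margin
```

With the numbers from this thread, `800s` is 800 seconds, which is shorter than a 14-minute (840-second) query, and even `2h` falls short of 103 minutes plus the margin, while `2h 10m` would cover it. Raising the timeout only hides the symptom, though; the query time itself still needs investigating.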
Are these pivot tables using complex indicators or program indicators? Or is the period range quite large?
The pivot tables use quite complex program indicators. As for the period range, I can’t tell whether it counts as large: our monthly calendar runs from the 21st of the previous month to the 20th of the current month, so we had to manually add each of those roughly 30 days as an individual period in the analysis.
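Instead of selecting each day by hand, the daily periods for a 21st-to-20th month can be generated when building requests programmatically. DHIS2 daily period identifiers use the `yyyyMMdd` format, and the analytics API accepts several periods in one `pe` dimension separated by semicolons. The function below is a hypothetical helper, a sketch rather than anything DHIS2 provides.

```python
from datetime import date, timedelta


def custom_month_daily_periods(year, month, start_day=21, end_day=20):
    """List DHIS2 daily period IDs (yyyyMMdd) from `start_day` of the previous
    month up to and including `end_day` of the given month."""
    first_of_month = date(year, month, 1)
    last_of_previous = first_of_month - timedelta(days=1)  # handles January -> December
    start = last_of_previous.replace(day=start_day)
    end = first_of_month.replace(day=end_day)
    periods = []
    current = start
    while current <= end:
        periods.append(current.strftime("%Y%m%d"))
        current += timedelta(days=1)
    return periods
```

For March 2022 this yields 28 periods, from `20220221` to `20220320`, and `"pe:" + ";".join(periods)` gives the dimension value for an analytics request. Depending on the month, such a range contains between 28 and 31 days.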
We need to narrow down the root cause before we can get to a solution.
Here are some suggestions:
- Monitor the instance’s memory usage while these requests are being made, and check whether RAM usage is hitting the maximum capacity.
- Check which process is consuming the most memory.
- If RAM utilisation is low but responses are still slow, check the JVM heap size arguments.
- If you suspect Postgres is the cause, you can fine-tune Postgres performance by following Installation - DHIS2 Documentation.
- If Postgres performance is the issue, you can also look at the option of a read replica (see Installation - DHIS2 Documentation).
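The memory and heap checks suggested above can be scripted on the server. The sketch below assumes a Linux host (it parses `/proc/meminfo`, using `MemTotal` and `MemAvailable`) and that the heap is set with `-Xmx` in a `JAVA_OPTS`-style string, for example in Tomcat’s `setenv.sh`. The function names and the `-Xmx7000m` value are only examples, not recommended settings. If Postgres runs on the same machine, it shares the same RAM with the JVM.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_used_fraction(meminfo):
    """Fraction of RAM in use, computed from MemTotal and MemAvailable."""
    return 1 - meminfo["MemAvailable"] / meminfo["MemTotal"]


_XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)")
_SIZE_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def max_heap_bytes(java_opts):
    """Return the -Xmx value in bytes from a JAVA_OPTS string, or None if unset.

    If -Xmx appears more than once, the last occurrence is used."""
    matches = _XMX.findall(java_opts)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _SIZE_UNITS[unit.lower()]
```

On the server, `parse_meminfo(open("/proc/meminfo").read())` can be sampled while a slow pivot table is loading, and `max_heap_bytes("-Xms4000m -Xmx7000m")` returns the configured heap ceiling in bytes for comparison with `MemTotal` (which is reported in kB). Finding which process uses the most memory is easier with standard tools such as `top` or `ps`.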