We are experiencing DHIS2 downtime on DHIS2 v2.33.5 (and previously on v2.33.4).
Each time this happens, we see the CPU on our separate AWS RDS instance spike to 100%, leaving the instance unresponsive for several hours. This has happened five times since July 2020. Twice the system recovered on its own; the other times we had to restart the RDS instance. We cannot find a pattern shared by the downtime occurrences, or understand why the system went down.
During the most recent occurrence, we saw the following in the log right before DHIS2 went down:
It seems that the job ‘Remove Expired Reserved Values’ could be the culprit, but this job runs every hour, and we do not have any tracked entity attributes configured to generate reserved values.
Has anyone else experienced DHIS2 downtime like this? What was the issue you faced, and was there a solution or workaround? @Scott, has this come up before?
We had a similar issue on one of our instances. As a workaround, we increased the CPU cores from 2 to 4 on that server, which also runs three other medium-sized instances. We don’t know the exact cause yet and are still digging through the logs.
We also recommend that our users run data integrity checks, imports/exports, large pivot table generation and other resource-heavy activities either early in the morning or at the end of the day.
Are you able to SSH into the server and run the top command to see which process is responsible? I am guessing it’s the java executable, as I’ve often seen this behavior in Java apps. If it is, what is the output of java -version? (java --version only works on Java 9 and later.)
@Mohammad_Ullah we have a separate AWS RDS instance with PostgreSQL, which should be sufficient for our activities.
@Edward_Robinson we are not able to SSH into our server, but we do have monitoring tools that report on its activity. None of them has identified the cause of the CPU spike to 100%.
Apologies for the late response; I’ve been out of action after eye surgery and am back at work today. If the issue persists, check your JVM environment settings, including:
-Xms (initial heap size)
-Xmx (maximum heap size)
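One way to confirm which heap settings a JVM actually received is to ask the JVM itself. Below is a minimal Java sketch, assuming Java 8 or later; the class name HeapFlags and its helper methods are hypothetical. It reads the JVM’s own input arguments through the standard java.lang.management API, extracts the last -Xms and -Xmx values, and prints the maximum heap reported by Runtime.maxMemory(), which is usually close to, but can be slightly below, -Xmx.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports the -Xms/-Xmx flags this JVM was started with
// and the maximum heap size the runtime reports.
public class HeapFlags {

    // Convert a JVM size string such as "512m", "4g" or "1048576" to bytes.
    static long parseSize(String value) {
        String v = value.trim().toLowerCase();
        long multiplier = 1L;
        char unit = v.charAt(v.length() - 1);
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024; break;
            case 'g': multiplier = 1024L * 1024 * 1024; break;
            case 't': multiplier = 1024L * 1024 * 1024 * 1024; break;
            default: break; // no suffix: the value is already in bytes
        }
        String digits = multiplier == 1L ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    // Return the value (in bytes) of the last argument starting with the given
    // prefix, e.g. "-Xmx", or -1 if absent. When a flag is repeated, the JVM
    // uses the last occurrence, so later values overwrite earlier ones here.
    static long lastFlag(List<String> args, String prefix) {
        long result = -1L;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                result = parseSize(arg.substring(prefix.length()));
            }
        }
        return result;
    }

    public static void main(String[] argv) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long xms = lastFlag(jvmArgs, "-Xms");
        long xmx = lastFlag(jvmArgs, "-Xmx");
        long mb = 1024L * 1024;
        System.out.println("-Xms: " + (xms < 0 ? "not set (JVM default)" : xms / mb + " MB"));
        System.out.println("-Xmx: " + (xmx < 0 ? "not set (JVM default)" : xmx / mb + " MB"));
        // maxMemory() is usually close to, but can be slightly below, -Xmx.
        System.out.println("Runtime.maxMemory(): " + Runtime.getRuntime().maxMemory() / mb + " MB");
    }
}
```

Run as-is, this only describes its own JVM (for example java -Xms512m -Xmx2g HeapFlags). In Tomcat these flags are normally passed through JAVA_OPTS or CATALINA_OPTS, and since lastFlag works on any list of arguments, the same parsing can be applied to Tomcat’s command line copied from ps or a monitoring tool.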
Also, how much RAM is available on the machine? How much of it is allocated to services such as PostgreSQL and Tomcat?
Are there other Java apps running on this server (e.g. monitoring tools)?
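To check for other Java apps on the server, the standard ProcessHandle API can list running processes whose executable is a Java launcher. This is a minimal sketch, assuming Java 9 or later (ProcessHandle was added in Java 9), so it may need a newer JDK than the one Tomcat runs on; the class name ListJavaProcesses is hypothetical. The output includes the helper’s own JVM, and details of other users’ processes may be hidden unless it runs with sufficient privileges.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: lists running processes whose executable looks like a
// Java launcher, to spot other JVMs (e.g. monitoring agents) that compete
// with Tomcat for CPU and RAM.
public class ListJavaProcesses {

    // True if the executable's file name is "java" or "java.exe".
    static boolean isJavaCommand(String command) {
        String name = command.replace('\\', '/');
        name = name.substring(name.lastIndexOf('/') + 1);
        return name.equals("java") || name.equals("java.exe");
    }

    // One line per Java process: PID followed by its full command line,
    // falling back to the executable path when the command line is hidden.
    static List<String> describeJavaProcesses() {
        return ProcessHandle.allProcesses()
                .filter(p -> p.info().command().map(ListJavaProcesses::isJavaCommand).orElse(false))
                .map(p -> p.pid() + " " + p.info().commandLine().orElse(p.info().command().orElse("?")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describeJavaProcesses().forEach(System.out::println);
    }
}
```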