Hello All,
We are experiencing DHIS2 downtime on v2.33.5 (and previously on v2.33.4).
Each time this happens, we see a CPU spike on our separate AWS RDS instance, which maxes out at 100% and becomes unresponsive for several hours. This has happened five times since July 2020. Twice the system recovered on its own; the other three times we had to restart the RDS instance. We cannot see a pattern shared across the downtime occurrences, or understand why the system went down.
During the most recent downtime occurrence, we saw the following in the log right before DHIS2 went down:
- INFO 2021-01-29 12:00:00,000 ‘system-process’ update org.hisp.dhis.scheduling.JobConfiguration, name: Remove expired reserved values, uid: uwWCT2BMmlq (AuditLogUtil.java [taskScheduler-10])
- INFO 2021-01-29 12:00:00,010 ‘system-process’ update org.hisp.dhis.scheduling.JobConfiguration, name: Remove expired reserved values, uid: uwWCT2BMmlq (AuditLogUtil.java [taskScheduler-10])
It seems that the job ‘Remove Expired Reserved Values’ could be the issue, but this job runs every hour, and we do not have tracked entity attributes configured to generate reserved values.
Has anyone else experienced DHIS2 downtime like this? What was the issue you faced? Was there a solution or workaround? @Scott has this come up before?
Many Thanks,
Lauren