Dhis2 deadlock it self

We have regular dead lock of dhis2 instances.

    "http-nio-8080-exec-357" #3226 daemon prio=5 os_prio=0 tid=0x00007f9cc0043000 nid=0xcad waiting for monitor entry [0x00007f9c80cd9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1868)
	- waiting to lock <0x00000006032d1bc8> (a java.util.concurrent.ConcurrentHashMap$Node)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2029)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2012)
	at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:113)
	at com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:54)
	at org.hisp.dhis.cache.LocalCache.get(LocalCache.java:101)
	at org.hisp.dhis.setting.DefaultSystemSettingManager.getSystemSetting(DefaultSystemSettingManager.java:207)
	at org.hisp.dhis.setting.DefaultSystemSettingManager.getSystemSettings(DefaultSystemSettingManager.java:348)
	at sun.reflect.GeneratedMethodAccessor1764.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe

strangely the db do nothing, all connections are idle

datid,datname,pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start,xact_start,query_start,state_change,wait_event_type,wait_event,state,backend_xid,backend_xmin,query
16396,dhis2,14679,16395,dhis2,PostgreSQL JDBC Driver,172.31.19.128,,43942,2019-06-29 06:29:11.552247+00,,2019-06-29 09:38:51.691524+00,2019-06-29 09:38:51.69154+00,,,idle,,,SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
16396,dhis2,14678,16395,dhis2,PostgreSQL JDBC Driver,172.31.19.128,,43940,2019-06-29 06:29:11.551823+00,,2019-06-29 09:38:51.679668+00,2019-06-29 09:38:51.679689+00,,,idle,,,SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
16396,dhis2,12653,16395,dhis2,PostgreSQL JDBC Driver,172.31.19.128,,59430,2019-06-27 12:27:00.339296+00,,2019-06-29 09:38:51.689919+00,2019-06-29 09:38:51.689933+00,,,idle,,,SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
16396,dhis2,14680,16395,dhis2,PostgreSQL JDBC Driver,172.31.19.128,,43944,2019-06-29 06:29:11.552594+00,,2019-06-29 09:38:46.623479+00,2019-06-29 09:38:46.623511+00,,,idle,,,SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
16396,dhis2,14855,16395,dhis2,PostgreSQL JDBC Driver,172.31.19.128,,34296,2019-06-27 12:27:59.67825+00,,2019-06-29 09:38:51.707286+00,2019-06-29 09:38:51.7073+00,,,idle,,,SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE

some threads seem go a bit further in the “compute” part

"http-nio-8080-exec-510" #3512 daemon prio=5 os_prio=0 tid=0x00007f9cbc0cc800 nid=0xdcb in Object.wait() [0x00007f9c6d0a0000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1465)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:644)
- locked <0x00000004c26b6220> (a com.mchange.v2.resourcepool.BasicResourcePool)
at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:554)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:758)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:685)
at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:146)

most threads are involved in user and system settings access

org.hisp.dhis.setting.DefaultSystemSettingManager.getSystemSetting
org.hisp.dhis.interceptor.SystemSettingInterceptor.intercept(SystemSettingInterceptor.java:104)
org.hisp.dhis.user.DefaultUserSettingService.getUserSettingOptional

Is there anyway to specify a timeout on the connection pool in order to let this resolve (with a few errors) by it self instead of requiring a tomcat restart ?
Is this a known bug ? (dhis2 2.30)

Thanks,

for developers : a full thread dump is available here : threaddump.txt · GitHub

2 Likes

Is it possible that it’s related to this :

1 Like

https://jira.dhis2.org/browse/DHIS2-7197

1 Like

Hi @Stephan_Mestach,

Our team from @dhis2-backend will have a look and advise on this. Kindly be patient and available in case they need more information from you.

Thanks, no problem.

1 Like

Could potentially be JDK level bug:

https://bugs.openjdk.java.net/browse/JDK-8161372

1 Like

Thanks for pointing this.

What are my options ?

  • morning/evening restart to prevent at least this during the day (12h cache from what I’ve seen in the codebase) ?
  • switching cache implementation ? Is redis cache advised ? (don’t see a lot of references in the doc except in cluster support)
  • is there a plan to support latest java versions ?
1 Like

The live locking seem easy to reproduce if your computeIfAbsent call another cache entry

is this possible to run dhis2 on newer jdk ?

1 Like

Hi @Stephan_Mestach,

just to keep the thread up to date, I want to report that we are able to reproduce the issue. We have come up with several ways to fix or mitigate the issue and are in the process of testing out the most reasonable way to fix this issue for existing versions of DHIS2.

1 Like

Some follow-up: [DHIS2-7197] - Jira

Which resulted in: feat: Cache2k implementation by ameenhere · Pull Request #3664 · dhis2/dhis2-core · GitHub (switch to cache2k)

So this will probably get resolved in a future DHIS version. Thanks all involved!

1 Like