Hosting dhis.conf changes around analytics.connection.*

Stephan_Mestach · 2 October 2024 12:22

I noticed something “new” in the dhis.conf

A set of new properties

# ----------------------------------------------------------------------
# Analytics [Optional]
# ----------------------------------------------------------------------

# Analytics database JDBC driver class
analytics.connection.driver_class = org.postgresql.Driver

# Analytics database connection URL
analytics.connection.url = jdbc:postgresql:analytics

# Analytics database username
analytics.connection.username = analytics

# Analytics database password
analytics.connection.password = xxxx

# Analytics unlogged tables. Can be 'on' (default), 'off'. On will improve analytics geeneration performance at the cost of no replication.
analytics.table.unlogged = on

a few questions :

Is this something that is encouraged to have separate db instance for analytics tables ?
- should we expect performance/stability gains ? (except the fact the analytics table are no more logged in postgres)
- is it ok to have the instance in a different location/vm then the normal connection ?
Does it means that the we can remove some access right for the “normal” connection or it’s still required to have the upgrade/flyway to work ?

connection.url

Gassim · 3 October 2024 10:51

Thank you for sharing about this with the community @Stephan_Mestach ! When we get a response, it will be helpful for everyone especially server administrators.

For the moment, is it okay if I ask how did you happen to find these changes?

Stephan_Mestach · 4 October 2024 13:13

by luck, I opened the doc and scrolled down and noticed a section I never saw before

I read about analytics.table.unlogged in the release notes but not the others

bobj · 7 October 2024 09:26

Hi @Stephan_Mestach

The analytics.table.unlogged is a very useful feature (potentially more useful than the inventors realized). It was originally introduced as a measure to give a slight speedup to the analytics table generation (15-20% depending on context). That in itself is a good enough reason to have it enabled at all times. In fact it is now set as default so you will actually have to unset it if you want wal logs for analytics.

But the two potentially beneficial side effects are:
(i) not logging analytics tables makes it feasible to do incremental backup of your transactional database. ie. take a periodic base backup and archive the wal logs in between. That way you can restore back to a point in time up to the last transaction committed, which is a much better backup strategy than the current common nightly logical backup
(ii) making replicas of the transaction database is also now much more feasible than before, because you eliminate the churn of tonnes of wal log related to analytics. This has implications for high availability and potentially for performance if you can redirect read only queries for things like TEI search and line listing.

So (i) above is already possible now if you know you way around postgresql. I am hoping to provide a little guidance documentation soon. (ii) is also possible, but still some work required to nicely deal with redirecting read queries. See [DHIS2-17541] - Jira

The other question you ask is about the analytics.connection. This has been internally tested for some performance related research, but I don’t know yet of anyone using it in the wild. AFAIK it does work and I hope to do some testing myself. There are potentially a lot of benefits. It would be great if you are interested in trying it out and let us know how you get on. There are a few initiatives going on internally to find ways of scaling analytics better and this is one possible route which has some potential.

Regards
Bob