DuckDB as an alternative to PostgreSQL for analytics?

Dear @Stephan_Mestach,

The DHIS2 team also consider these technologies to be almost a necessity for large instances, due to the performance advantages you outline. Even for smaller instances, separating the analytics from the “reporting” database has advantages for stability and availability.

In DHIS2 v42 the features were, to some extent, experimental and limited, but they have allowed a few implementations to test out the potential. With v43, to be released in early May, we will bring support for Tracker-related data too, making this a much more complete solution.

In terms of our infrastructure, we have been including Apache Doris environments systematically in our core test infrastructure for the last six months.

In terms of production use, I think it’s probably too early to say, and you are right to be a bit cautious at the moment. However, another advantage of this being purely for analytics is that the data in the “Analytics database” (Apache Doris or ClickHouse) is ephemeral; that is to say, it is not the important raw data. So, if there are issues with the analytics database, it should be simple to reconfigure DHIS2 to remove that connection and regenerate analytics on the base PostgreSQL DB (you may need to take that into account when provisioning the infrastructure, of course).

@Lars has been championing these developments and may be able to give more of a detailed outlook.

Kind regards,
Phil
