Duckdb as an alternative to postgres for analytics?

Stephan_Mestach · 10 June 2025 15:34

Does some developer looked at duckdb and parquet files as an alternative to the current postgres implementation ?

exemple

I’ve extracted as is the analytics tables to parquet

INSTALL postgres;
LOAD postgres;

ATTACH 'postgresql://user:password@dbhost:5432/dhis' AS dhis_source (TYPE postgres);


COPY (SELECT * FROM dhis_source.analytics_2016) TO 'analytics_2016.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2017) TO 'analytics_2017.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2018) TO 'analytics_2018.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2019) TO 'analytics_2019.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2020) TO 'analytics_2020.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2021) TO 'analytics_2021.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2022) TO 'analytics_2022.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2023) TO 'analytics_2023.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2024) TO 'analytics_2024.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2025) TO 'analytics_2025.parquet' (FORMAT PARQUET);


COPY (SELECT * FROM dhis_source._periodstructure) TO 'periodstructure.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source._orgunitstructure) TO 'orgunitstructure.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source._dataelementstructure) TO 'dataelementstructure.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source._categorystructure) TO 'categorystructure.parquet' (FORMAT PARQUET);

then create

CREATE VIEW analytics AS SELECT * FROM 'analytics_*.parquet';

then run count/sum/… queries on it
the performance are really impressive and much more constant then postgres (not paying price of transaction handling, fetching only what is needed)
there’s a jdbc driver for it.

Gassim · 12 June 2025 08:54

Hi!

Thank you for testing and sharing with the community. I feel like this could be a conversation in the weekly server-admin group calls with @bobj, @tkipkurgat, and the other server admins?

Thanks!

Stephan_Mestach · 16 June 2025 12:16

well it’s more about the core analytics and so development team.

Stephan_Mestach · 9 July 2025 05:36

Apparently there are some development around clickhouse or doris.
Is this an upcoming replacement or just for the built in analytics usage ?

Gassim · 10 July 2025 10:43

Hi @Stephan_Mestach

So is ClickHouse the same suggestion or is Duckdb better? I see that the feature was created in November 2024: Jira

In the related ticket you can read, “To allow for other database platforms than PostgreSQL, a new abstraction layer for managing tables, columns and indexes should be introduced in the analytics backend. This abstraction layer should include a data model for database entities like tables, columns and indexes, and a SQL builder for creating compatible SQL statements for each supported database platform.” by @Lars .

Stephan_Mestach · 11 July 2025 06:18

Clickhouse and duckdb are a bit similar to make queries “count/sum/avg/group by/…” much faster.

So from the jira it’s really a new “implementation” for the analytics generation/queries.

Do you know if it’s already partially released/used in production ?
My guess it’s not yet since support is not full for events/trackers.

For version 42, support includes data elements, aggregate indicators and reporting rates.
Support for event data elements and program indicators is targeted for version 43.
Jira