Duckdb as an alternative to postgres for analytics?

Does some developer looked at duckdb and parquet files as an alternative to the current postgres implementation ?

exemple

I’ve extracted as is the analytics tables to parquet

INSTALL postgres;
LOAD postgres;

ATTACH 'postgresql://user:password@dbhost:5432/dhis' AS dhis_source (TYPE postgres);


COPY (SELECT * FROM dhis_source.analytics_2016) TO 'analytics_2016.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2017) TO 'analytics_2017.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2018) TO 'analytics_2018.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2019) TO 'analytics_2019.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2020) TO 'analytics_2020.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2021) TO 'analytics_2021.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2022) TO 'analytics_2022.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2023) TO 'analytics_2023.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2024) TO 'analytics_2024.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source.analytics_2025) TO 'analytics_2025.parquet' (FORMAT PARQUET);


COPY (SELECT * FROM dhis_source._periodstructure) TO 'periodstructure.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source._orgunitstructure) TO 'orgunitstructure.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source._dataelementstructure) TO 'dataelementstructure.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM dhis_source._categorystructure) TO 'categorystructure.parquet' (FORMAT PARQUET);

then create

CREATE VIEW analytics AS SELECT * FROM 'analytics_*.parquet';

then run count/sum/… queries on it
the performance are really impressive and much more constant then postgres (not paying price of transaction handling, fetching only what is needed)
there’s a jdbc driver for it.

2 Likes

Hi!

Thank you for testing and sharing with the community. I feel like this could be a conversation in the weekly server-admin group calls with @bobj, @tkipkurgat, and the other server admins? :slight_smile:

Thanks!

well it’s more about the core analytics and so development team.

Apparently there are some development around clickhouse or doris.
Is this an upcoming replacement or just for the built in analytics usage ?

1 Like

Hi @Stephan_Mestach

So is ClickHouse the same suggestion or is Duckdb better? I see that the feature was created in November 2024: Jira

In the related ticket you can read, “To allow for other database platforms than PostgreSQL, a new abstraction layer for managing tables, columns and indexes should be introduced in the analytics backend. This abstraction layer should include a data model for database entities like tables, columns and indexes, and a SQL builder for creating compatible SQL statements for each supported database platform.” by @Lars . :slight_smile:

Clickhouse and duckdb are a bit similar to make queries “count/sum/avg/group by/…” much faster.

So from the jira it’s really a new “implementation” for the analytics generation/queries.

Do you know if it’s already partially released/used in production ?
My guess it’s not yet since support is not full for events/trackers.

For version 42, support includes data elements, aggregate indicators and reporting rates.
Support for event data elements and program indicators is targeted for version 43.
Jira