How does Data Statistics work?

Hi - I’ve been watching data statistics each day, specifically the count of events. However, the total isn’t going up even though new events are being entered. The data statistics section itself shows how many were entered today, this week, etc.

I remember there was a recent (2.37?) performance improvement for data statistics that created an index or a cache of some sort. Could this be what I am experiencing? Is there a way to reset it? I’d like an updated view of records in the DB.

Hi

I assume you’ve already tried running the Analytics Tables export, correct? And clearing the server cache and reloading applications in Data Administration → Maintenance? If so, did those not work? Did you also try viewing the statistics in Guest mode after the above steps, to rule out a cache issue?

So even after running the analytics tables export, the values are not appearing in the DB?

Right - it makes no difference. I’m trying to understand how it works - it doesn’t seem to be a direct query to the DB.

@chase.freeman the docs literally say that the “data statistics module provides an overview of the number of objects stored in the DHIS2 database,” so if it’s not showing this, there is probably either a bug in the module or an issue with the database?

Do you think we can reproduce the discrepancy on any of the play.dhis2.org instances? If so, would you like to create a Jira bug issue using your account?

Thanks!

This ticket explains the concept behind Data Statistics, which was changed in 2.37:
https://dhis2.atlassian.net/browse/DHIS2-10865

The “data statistics” function provides information about the number of rows in the database for several objects. Some of them are often very large fact tables like data values and events. Counting the number of rows using SELECT count(*) is no longer feasible for large databases. Instead the system should use approximate counting using the pg_class table.

In PostgreSQL, the reltuples field from the pg_class system catalog table provides an estimated number of rows in a table. The reason this number is considered “approximate” rather than “exact” lies in how and when this statistic is updated by PostgreSQL.

The reltuples value is generally updated when:

  1. You run the ANALYZE command explicitly.
  2. VACUUM or VACUUM ANALYZE is run (manually or by the autovacuum daemon).
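
For anyone who wants to check this directly in the database, here is a minimal sketch that compares the pg_class estimate with an exact count and then refreshes the estimate with ANALYZE. It assumes you have direct PostgreSQL access through a DB-API connection (for example psycopg2) and enough privileges to run ANALYZE. The function names are made up for illustration, and the table name in the usage note is an assumption that you should check against your own schema.

```python
import re

# Only plain identifiers are accepted, because table names cannot be
# passed as bound parameters and must be placed in the SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

ESTIMATE_SQL = "SELECT reltuples::bigint FROM pg_class WHERE oid = to_regclass(%s)"


def _check(table):
    if not _IDENT.match(table):
        raise ValueError("unexpected table name: %r" % (table,))
    return table


def estimated_rows(conn, table):
    """Return the reltuples estimate for a table, or None if it does not exist."""
    cur = conn.cursor()
    cur.execute(ESTIMATE_SQL, (_check(table),))
    row = cur.fetchone()
    return None if row is None else row[0]


def exact_rows(conn, table):
    """Return the exact row count (can be slow on very large tables)."""
    cur = conn.cursor()
    cur.execute("SELECT count(*) FROM " + _check(table))
    return cur.fetchone()[0]


def refresh_estimate(conn, table):
    """Run ANALYZE on the table so that reltuples is recalculated."""
    cur = conn.cursor()
    cur.execute("ANALYZE " + _check(table))
    conn.commit()
```

With a real connection, something like `estimated_rows(conn, "datavalue")` before and after `refresh_estimate(conn, "datavalue")` should show whether a stale estimate explains the numbers in Data Statistics. The table name `datavalue` here is only an example.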

However, we don’t have a way of triggering either of those via the UI or API. There is “Analyze Analytics” in Data Admin App > Maintenance but that is only for Analytics tables, not the raw data.

Analytics table analyze collects statistics about the contents of analytics tables in the database.

POST or PUT /api/maintenance/analyticsTablesAnalyze
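
As a rough sketch, the endpoint above could be called from a script like this. The base URL, username and password are placeholders, basic authentication is an assumption about how your instance is set up, and `build_analyze_request` is a made-up helper name. Keep in mind this only touches the analytics tables, not the raw data tables.

```python
import base64
import urllib.request


def build_analyze_request(base_url, username, password):
    """Build a POST request for the analytics tables analyze endpoint."""
    url = base_url.rstrip("/") + "/api/maintenance/analyticsTablesAnalyze"
    # Basic auth is an assumption; use whatever your instance requires.
    token = base64.b64encode(
        ("%s:%s" % (username, password)).encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": "Basic " + token},
    )


def run_analyze(base_url, username, password):
    """Send the request and return the HTTP status code."""
    req = build_analyze_request(base_url, username, password)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `run_analyze("https://dhis2.example.org", "someuser", "somepassword")` would send the request to a hypothetical server.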

So, it looks like I’m going to have to reproduce the old Data Stats using a SQL view and query that for my needs.
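
In case it helps someone else, here is a minimal sketch of how the query for such a SQL view could be generated. It builds one SELECT with an exact count(*) per table, joined with UNION ALL. The table names in the usage note are assumptions (they differ between DHIS2 versions, so check your own schema), and `exact_count_query` is just an illustrative helper name. Exact counts can be slow on very large tables, which is the reason the built-in statistics moved to estimates in the first place.

```python
import re

# Accept only simple lowercase identifiers, since the names are
# placed directly into the SQL text.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def exact_count_query(tables):
    """Return a single SELECT statement with one exact row count per table."""
    parts = []
    for table in tables:
        if not _IDENT.match(table):
            raise ValueError("unexpected table name: %r" % (table,))
        parts.append(
            "SELECT '%s' AS table_name, count(*) AS row_count FROM %s"
            % (table, table)
        )
    return "\nUNION ALL\n".join(parts)
```

Calling `print(exact_count_query(["datavalue", "programstageinstance"]))` prints a statement that can be pasted into a new SQL view, assuming those two tables exist in your database.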