Getting to understand the DHIS2 tables. Someone please help with an explanation of the Analytics tables that end with either enrollment_ or event_ followed by a program uid. A look at one of the tables shows the data that’s collected from the associated program uid. Question, can this data be used to represent all data entered in the associated program accurately, what else is contained there? Please shed a bit of light on this.
Thank you in advance.
Hi @WaluQ! The team working on @dhis2-analytics will be in touch to assist you with this.
@Stian, @Jim_Grace, or @Markus would you be able to get back to @WaluQ regarding his question?
Still waiting for help on this…
The enrolment_* tables will contain information about a specific enrolment and the attribute values tied to that specific program. That means it will not contain any information about the events/program stages completed within that enrolment.
The event_* tables contains information about the program stage (That includes “events”/program stages completed in tracker capture, but also events completed in event programs using event capture). Each row will represent specific event and data values tied to that event (Note that for events we have data values from data elements, while enrolments have attribute values from tracked entity attributes)
In addition to having the same data you submit (Accurate to the point of which the tables was generated), the analytics tables have a few extra columns for improving queries. These extra columns include which orgunit the information was reported for, as well as all parent orgunits; The period it was reported in, as well as any other period it would fit in; etc.
Using the analytics tables (enrolment_* and event_*) will give you accurate data, but it might be stale, depending on whether or not data has been changed or added since the tables where lastly generated.
Depending on your usecase, working directly with the raw data might be better, since it will always be accurate.
In addition, analytic tables are partitioned by year in the database, so you might have several similar tables for a single program: enrolment_2017, enrolment_2018, etc.
Thank you @Stian for the response. If am getting you correct, you are saying “its okay to use these tables provided they are updated at time of querying (analytics updated before querying)” and that “for any time querying, the raw tables are better since you don’t have to mind when analytics last updated?”
@WaluQ, yes you are correct. However I would avoid generating analytical tables frequently, as they could consume a lot of server resources. For most use-cases, generating these tables every night is sufficient, so worst case your data could be 24 hours old in that case.