Pivot views and resource tables - problem when updating/dropping resource tables referenced by pivot views

olatitle · 26 January 2010 15:35

Hi,

Some background:

Recently we’ve added quite a few so called resource tables to make it easier to analyse DHIS data with various dimensions in external tools.
These include tables with group sets/groups/elements for data elements, indicator and orgunits as well as a table with all categories and options for every catoptioncombo.

(you can generate them from the DHIS UI under Data Adminstration->resource table)
For excel pivot tables I use predefined views to extract data, and in these views I e.g. join the aggregateddatavalue table with the resource tables to add more columns with dimensional information.

The data load process in Excel then typically becomes a select * from , which makes it a lot easier for non-techies to create pivot tables.
These views are manually created in postgres and need to be in the database when you create or refresh the pivot tables.

The problem:
A problem I just noticed is related to updates of the resource tables. I created a few new group sets and groups and needed to re-generate a resource table in order to get the new information into the pivot table. This triggered a drop table statement on the resource table that was not allowed since I had created a view referencing that table. Would it be possible to update the resource tables without having to drop them first, e.g. how we do with datamart table like aggregateddatavalue?

If not (and perhaps anyway) could we maybe look into automatically generating these pivot views from the DHIS as part of the resource tables?
That would mean we could drop the views and re-create them from the code if necessary.

To give you an example of how I use the resource table here is a view to get OU3 level data from the datamart together with categories (on columns) and categoryoptions (on rows), as well as group sets (on columns) and groups (on rows). In excel I can then filter out any category or groupset I don’t need for the specific table or report.

(The two resource tables used here are _categorystructure and _dataelementgroupsetstructure )

···

CREATE OR REPLACE VIEW pivotsource_routinedata_ou3_dim AS
SELECT
organisationunit_1.name AS orgunit1,

organisationunit_1.shortname AS ou1,
organisationunit_2.name AS orgunit2,
organisationunit_2.shortname AS ou2,
organisationunit_3.name AS orgunit3,

organisationunit_3.shortname AS ou3,
dataelementgroup.name AS degroup,
dataelement.name AS dataelement,
dataelement.shortname AS deshort,

periodtype.name AS periodtype,
to_char(period.startdate::timestamp with time zone, ‘YYYY’::text) AS year,
to_char(period.startdate::timestamp with time zone, ‘Mon’::text) AS month,

(rtrim(to_char(period.startdate::timestamp with time zone, ‘Mon’::text)) || ‘-’::text) || to_char
(period.startdate::timestamp with time zone, ‘YY’::text) AS period,
aggregateddatavalue.value, aggregateddatavalue.level,

_categorystructure.,
_dataelementgroupsetstructure.

FROM
dataelementgroupmembers dataelementgroupmembers, _categorystructure, dataelement dataelement, dataelementgroup, _dataelementgroupsetstructure,dataelementgroup,

period period, periodtype periodtype,
orgunitstructure orgunitstructure, organisationunit organisationunit_1, organisationunit organisationunit_2, organisationunit organisationunit_3,
aggregateddatavalue aggregateddatavalue

WHERE
dataelement.dataelementid = aggregateddatavalue.dataelementid AND
dataelementgroupmembers.dataelementid = dataelement.dataelementid AND
dataelementgroupmembers.dataelementgroupid = dataelementgroup.dataelementgroupid AND

aggregateddatavalue.categoryoptioncomboid = _categorystructure.categoryoptioncomboid AND
_dataelementgroupsetstructure.dataelementid = aggregateddatavalue.dataelementid AND
period.periodtypeid = periodtype.periodtypeid AND

periodtype.name::text = ‘Monthly’::text AND
period.periodid = aggregateddatavalue.periodid AND
aggregateddatavalue.organisationunitid = orgunitstructure.organisationunitid AND
orgunitstructure.idlevel1 = organisationunit_1.organisationunitid AND

orgunitstructure.idlevel2 = organisationunit_2.organisationunitid AND
orgunitstructure.idlevel3 = organisationunit_3.organisationunitid AND
aggregateddatavalue.level = 3
ORDER BY period.startdate;
ALTER TABLE pivotsource_routinedata_ou3_dim OWNER TO dhis;

Website: www.healthmetricsnetwork.org

Better Information. Better Decisions. Better Health.

Lars · 26 January 2010 17:51

Hi,

Some background:

Recently we’ve added quite a few so called resource tables to make it easier to analyse DHIS data with various dimensions in external tools.
These include tables with group sets/groups/elements for data elements, indicator and orgunits as well as a table with all categories and options for every catoptioncombo.

(you can generate them from the DHIS UI under Data Adminstration->resource table)
For excel pivot tables I use predefined views to extract data, and in these views I e.g. join the aggregateddatavalue table with the resource tables to add more columns with dimensional information.

The data load process in Excel then typically becomes a select * from , which makes it a lot easier for non-techies to create pivot tables.
These views are manually created in postgres and need to be in the database when you create or refresh the pivot tables.

The problem:
A problem I just noticed is related to updates of the resource tables. I created a few new group sets and groups and needed to re-generate a resource table in order to get the new information into the pivot table. This triggered a drop table statement on the resource table that was not allowed since I had created a view referencing that table. Would it be possible to update the resource tables without having to drop them first, e.g. how we do with datamart table like aggregateddatavalue?

What gets inconvenient with not dropping them is that the columns might change, e.g. when adding more categories the category_structure table will get more columns.

If not (and perhaps anyway) could we maybe look into automatically generating these pivot views from the DHIS as part of the resource tables?

That would mean we could drop the views and re-create them from the code if necessary.

This could clearly be done.

Lars

···

On Tue, Jan 26, 2010 at 4:35 PM, Ola Hodne Titlestad olatitle@gmail.com wrote:

Lars · 26 January 2010 18:02

Wouldn’t some of the views you are developing be fairly general, like the one combining aggregateddatavalue/indicatorvalue with dimensional data? Would it be a good idea to provide certain “built-in” views so that other people could use them straight out of the box (being re-created and everything)?

···

2010/1/26 Lars Helge Øverland larshelge@gmail.com

On Tue, Jan 26, 2010 at 4:35 PM, Ola Hodne Titlestad olatitle@gmail.com wrote:

Hi,

Some background:

Recently we’ve added quite a few so called resource tables to make it easier to analyse DHIS data with various dimensions in external tools.
These include tables with group sets/groups/elements for data elements, indicator and orgunits as well as a table with all categories and options for every catoptioncombo.

(you can generate them from the DHIS UI under Data Adminstration->resource table)
For excel pivot tables I use predefined views to extract data, and in these views I e.g. join the aggregateddatavalue table with the resource tables to add more columns with dimensional information.

The data load process in Excel then typically becomes a select * from , which makes it a lot easier for non-techies to create pivot tables.
These views are manually created in postgres and need to be in the database when you create or refresh the pivot tables.

The problem:
A problem I just noticed is related to updates of the resource tables. I created a few new group sets and groups and needed to re-generate a resource table in order to get the new information into the pivot table. This triggered a drop table statement on the resource table that was not allowed since I had created a view referencing that table. Would it be possible to update the resource tables without having to drop them first, e.g. how we do with datamart table like aggregateddatavalue?

What gets inconvenient with not dropping them is that the columns might change, e.g. when adding more categories the category_structure table will get more columns.

If not (and perhaps anyway) could we maybe look into automatically generating these pivot views from the DHIS as part of the resource tables?

That would mean we could drop the views and re-create them from the code if necessary.

This could clearly be done.

Lars

jason · 26 January 2010 18:55

Just TRUNCATE the table or "DELETE FROM foo WHERE 0" should work as
well. If columns need to be added, just do it. No need to drop the
table just because of this.

The question for me is, what about other views that are really not
part of the main DHIS2 distribution? Even if we managed to include the
"stock views" there may be people who want to build analysis outside
of the DHIS2 box. Dropping tables really does complicate matters.
Otherwise, we would need to determine, what if any impact dropping a
table will have, and then figure out how to resolve it.

I did not realize this actually happened, but it is definitely an
issue for us as well here in Zambia.

Lars · 26 January 2010 19:09

Just TRUNCATE the table or “DELETE FROM foo WHERE 0” should work as

well. If columns need to be added, just do it. No need to drop the

table just because of this.

Yes but one can theoretically remove and/or change all categories, meaning that all columns in the categories resource table will have to be re-generated.

Not saying it can’t be done but it is a bit inconvenient. Would it be feasible to drop all columns in a table, then add the required columns to it? Would the view allow that?

The question for me is, what about other views that are really not

part of the main DHIS2 distribution? Even if we managed to include the

“stock views” there may be people who want to build analysis outside

of the DHIS2 box. Dropping tables really does complicate matters.

Otherwise, we would need to determine, what if any impact dropping a

table will have, and then figure out how to resolve it.

(Just to clarify, we would have those “standard” views in addition to a function for adding custom views)

Just curious, in what cases would one not be able to persist views through DHIS2 ?

···

On Tue, Jan 26, 2010 at 7:55 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

I did not realize this actually happened, but it is definitely an

issue for us as well here in Zambia.

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

jason · 26 January 2010 19:25

Yes but one can theoretically remove and/or change all categories, meaning
that all columns in the categories resource table will have to be
re-generated.

Yes, but maybe there are other ways to do this, see below.

Not saying it can't be done but it is a bit inconvenient. Would it be
feasible to drop all columns in a table, then add the required columns to
it? Would the view allow that?

This might be database dependent. If if the DB allowed it, it might
break the view anyway.

(Just to clarify, we would have those "standard" views in addition to a
function for adding custom views)

Just what the doctor ordered!

Just curious, in what cases would one not be able to persist views through
DHIS2 ?

It should not be a problem, as long as DHIS first tested that there
are views that depend on a table being dropped, saved that somehow,
dropped the table and recreated the view. Of course, this operation
could fail depending on the definition of the view. Seems equally
inconvenient.

What about leaving the data model as it is and not dropping the table,
but using an EAV representation of the data instead? SQL/stored
procedures/Java could then be used to pivot the EAV data into a
crosstab view, dynamically. I am thinking back to the SQL I sent a
while back on this (although this was not dynamic and was postgres
specific). Once we know all the possible attributes of a dataset, a
dynamic query could be used to generate the appropriate pivoted data
table.

Not sure I am making sense at this hour. I will think about it some
more and offer up some more blah blah tomorrow. It seems important
though.

Nighty night.