Managing metadata in a distributed setting

Hi

I have been helping out in Rwanda with their dhis2 setups, and I want to share a couple of issues to see whether others have experienced them and whether we can put our collective heads together on the trickier ones.

The setting is relatively complex in that they have a number of dhis2 instances. The busiest one is the HMIS, which is essentially the national routine system collecting data from each facility (2466 dataelements in total). There are a few other systems which collect special data - such as the PBF system - also using dhis2.

Then they have a central datawarehouse/dashboard DHIS2 instance which accumulates some data from each of the others. In general this is a restricted subset: for example, there is a dataelement group in HMIS which defines 268 dataelements to be exported to the datawarehouse each month. In addition, the warehouse collects data from some external (non-dhis) systems.

So there are a number of challenges, some of which are solved and some of which remain unsolved. The first is to synchronize orgunits between them all, which we have had largely working for some time (if a bit clumsily).

The bigger one is keeping the dataelement+categoryXXX structures compatible between them all. I have written some simple scripts to extract/filter all the relevant structural metadata for a dataelement group (its dataelements and related categoryXXXs). This set can then be exported from one system (eg HMIS) and imported into another (eg the datawarehouse). That solves the problem of routinely exporting datavalues from a dataelementgroup in HMIS and reliably pushing them to the datawarehouse without weird conflicts. But …

When you have another system (eg PBF) doing the same thing, it gets hairier. One problem we see is that the categoryoption lists on different systems can have common elements (eg “Male”) which are independently defined on both HMIS and PBF. So if I export PBF metadata to the datawarehouse I will end up with two “Male” categoryOptions with different uids. It seems clear that within this small universe of dhis2 systems they will all have to harmonize these things. So it seems there are two options …
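As a rough illustration of the clash, a small script can pull the categoryOption name/uid pairs from two instances and list the names defined on both with different uids. This is only a sketch - the instance URLs and credentials would be placeholders, and the /api/categoryOptions field parameters should be checked against your dhis2 version:

```python
import json
import urllib.request

def fetch_category_options(base_url, auth_header):
    """Fetch {name: uid} for every categoryOption on one instance.
    base_url and auth_header are placeholders for a real instance."""
    req = urllib.request.Request(
        base_url + "/api/categoryOptions.json?fields=id,name&paging=false",
        headers={"Authorization": auth_header},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return {co["name"]: co["id"] for co in payload["categoryOptions"]}

def uid_clashes(options_a, options_b):
    """Names defined on both instances but under different uids -
    e.g. two independently created "Male" categoryOptions."""
    return {
        name: (options_a[name], options_b[name])
        for name in options_a.keys() & options_b.keys()
        if options_a[name] != options_b[name]
    }
```

Running uid_clashes over the HMIS and PBF option lists would surface every “Male”-style duplicate before it reaches the datawarehouse.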

(i) to abandon the project of distributed systems and to bring everything together on one uber-instance with associated usergroups and access controls, or

(ii) to setup a metadata repository instance from which all systems derive their metadata and disable editing on the client systems.

(i) is tempting, but I think it escapes the problem rather than solving it. There are valid reasons to maintain these systems separately, and there will always be similar cases. So we will probably do (ii). If dhis2 cannot interoperate with itself, it augurs badly for its interoperability in general.

Has anybody done something similar? Large-scale rationalization of metadata will also involve massaging the datavalue tables to keep everything true as the catoptcombos get reorganized. Given that some of the systems have been in full operation for over 3 years now, the usual messes have accumulated over time amongst the categoryXXX structures, and these will need to be addressed anyway to arrive at a pristine state. Maybe there are some nice scripts for this?
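For the datavalue massaging, one approach is to compute a mapping from each duplicate catoptcombo to its surviving counterpart and generate the SQL from that. A sketch, assuming the usual datavalue/categoryoptioncombo table and column names - check these against your schema version:

```python
def remap_sql(coc_mapping):
    """Generate SQL to repoint datavalue rows from duplicate
    categoryoptioncombos onto the surviving ones, then drop the
    duplicates. coc_mapping: {duplicate_id: surviving_id}, using
    the internal integer ids, not uids."""
    statements = []
    for dup, keep in sorted(coc_mapping.items()):
        statements.append(
            f"UPDATE datavalue SET categoryoptioncomboid = {keep} "
            f"WHERE categoryoptioncomboid = {dup};"
        )
        statements.append(
            f"DELETE FROM categoryoptioncombo "
            f"WHERE categoryoptioncomboid = {dup};"
        )
    return statements
```

One caveat: if the duplicate and the survivor both already hold values for the same dataelement/period/orgunit, the UPDATE will hit the unique constraint, so those rows need to be resolved (summed or deduplicated) by hand first.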

Things like sharing settings on categoryoptions provide a bit of a puzzle, but also possibly something useful. Presumably we can set them to be completely read-only on the metadata client systems.

Bob

Hi Bob

This is an interesting challenge. I can think of different methods to address the current alignment problem, but you obviously have a long-term (master/meta) data management strategy to work out. Is your current data warehouse/dashboard instance purely a consolidator of edge-system data (i.e. an accumulator of datavalues and metadata)?

It sounds as if the warehouse exists as a consolidator, but how often is meta/master data cleaned/reviewed for integrity - is that taking place on a regular basis? If so, what actions are usually taken?

The intention of our national data dictionary (DD) in SA is to ensure the integrity of master/meta data across all DHIS2 systems within our province-wide architecture (both for organisation units and for dx resources). Our DD does not exist to ensure the integrity of our national data warehouse (NHIRD). For the most part we try to identify new master/meta data within edge-databases and load these records into the warehouse before importing data. That way we are prepared before the problem occurs (at least that is our best intention).

We are in the process of developing a synchronization tool for this data dictionary with major coding support coming from the HISP India developers. Maybe they have scripts that can assist?

Best,

Greg

p.s. The Gartner research group sometimes makes sense (see attachment)



Health Information Systems Program

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Mobile : 073 246 2992
Landline: 021 554 3130
Fax: 086 733 8432
Skype: gregory_rowles

Thanks Greg for the heads-up from SA. Elmarie also just let me know of your plans, which, as you know, I have been dimly aware of for some time :-) I was kind of hoping you guys would respond …

From your (and Gartner’s) classification, the datawarehouse would be a consolidator of datavalues.

From what you describe, your DD collects metadata from edge systems and populates the national DW with what it finds. So, getting to specifics, what do you do about categoryoptions (I think DEs and orgUnits are both conceptually simpler cases)? Taking my example as a starting point: two edge systems have both defined “Male”, with different UIDs. When they report datavalues they will make use of categoryoptioncombos which envelop these different encodings of “Male”. What do you put in your DD, and what do you tell the DW about “Male”?

Do you create aliases, which would be one approach?

Or do you try and push standardized terminology downwards towards the edge systems? Your case might be politically more complex because of the need to respect autonomy in the provinces. The Rwanda case is slightly different, and potentially simpler, in that all these systems are central, national systems.

Bob

PS. My process of constructing filtered metadata is a very simple xslt pipeline. Starting with all the structural metadata related to dataelements (like you would get from a GET to /api/metaData.xml?assumeTrue=false&dataElements=true&dataElementGroups=true&categories=true&categoryCombos=true&categoryOptions=true&categoryOptionCombos=true ), I pass it through a very simple 3-stage pipeline using xslt. First, isolate the dataelements belonging to the group. Second, keep only the categoryCombos and categoryOptionCombos referred to by those dataelements. Finally, keep the categoryOptions and categories referenced by those categoryCombos. Just a few lines of code in all. Maybe this would work for you too.
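For anyone who prefers scripting over xslt, the same 3-stage filter is easy to express over the JSON form of the metadata export. A sketch - the key names follow the metaData payload as I understand it, so double-check them against your version:

```python
def filter_metadata(meta, group_uid):
    """Three-stage filter over a parsed DHIS2 metadata export,
    mirroring the xslt pipeline: keep only the structural metadata
    needed by one dataelement group."""
    # Stage 1: dataelements belonging to the group
    group = next(g for g in meta["dataElementGroups"] if g["id"] == group_uid)
    de_ids = {de["id"] for de in group["dataElements"]}
    des = [de for de in meta["dataElements"] if de["id"] in de_ids]

    # Stage 2: categoryCombos and categoryOptionCombos used by those dataelements
    cc_ids = {de["categoryCombo"]["id"] for de in des}
    ccs = [cc for cc in meta["categoryCombos"] if cc["id"] in cc_ids]
    cocs = [coc for coc in meta["categoryOptionCombos"]
            if coc["categoryCombo"]["id"] in cc_ids]

    # Stage 3: categories and categoryOptions referenced by those combos
    cat_ids = {c["id"] for cc in ccs for c in cc["categories"]}
    cats = [c for c in meta["categories"] if c["id"] in cat_ids]
    co_ids = {o["id"] for c in cats for o in c["categoryOptions"]}
    cos = [o for o in meta["categoryOptions"] if o["id"] in co_ids]

    return {"dataElements": des, "categoryCombos": ccs,
            "categoryOptionCombos": cocs, "categories": cats,
            "categoryOptions": cos}
```

The output is itself a valid (partial) metadata payload, so it can be posted straight to the receiving instance.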


Bob,

Firstly, we need your option (ii) - yes, having nearly all country data in one DHIS2 instance is theoretically an option in a few smaller countries with centralised systems, but it won’t be the norm, in particular because DHIS2 at the moment doesn’t support multiple Organisational Hierarchies. So the most common scenario is that (a) a country has multiple DHIS2 instances; and (b) the country has a range of other systems they want to run independently BUT for which they also want aligned core data dimensions like the Organisational Hierarchy (via UIDs or other OrgUnit coding), data elements, indicators, data reporting periods, etc. A range of commonly used reference data sets should/must also be aligned across all systems, like

  • Mid-year population & housing estimates

  • Census- or Survey-based socio-economic determinants of health

  • Standard GIS base layers (admin boundaries, line layers like roads and rivers, basic land use, key point layers like health facilities, schools, police stations)

Secondly, in addition to what Greg has already outlined: we need synch functionality for existing databases, BUT our objective is also to make the NDD the starting point for any NEW health facility or reporting unit across systems. Or in other words, at some future point the idea is to FIRST add new reporting units to the NDD and then push those out to all linked systems (which will ensure alignment of UIDs and similar).

Thirdly, while I’m not sure exactly how far the Indian developers we have working on the NDD synch functionality have progressed, it seems obvious to me that at least the basic categories and options should be contained in the NDD (and pushed onto all systems using them). I’m less sure about categoryoptionCOMBOS - that might be a construct exclusive to DHIS2. TBD…

Regards

Calle




Calle Hedberg

46D Alma Road, 7700 Rosebank, SOUTH AFRICA

Tel/fax (home): +27-21-685-6472

Cell: +27-82-853-5352

Iridium SatPhone: +8816-315-19274

Email: calle.hedberg@gmail.com

Skype: calle_hedberg


Hi Calle

Thanks for your input, which I don’t have time to respond to fully, but I think you have started to outline a useful set of requirements. It might be useful to gather these somewhere, in a google doc or the like.

On your second point, I think the interesting challenge is how to migrate existing systems - which might already have different and slightly messy structural metadata from natural historical processes - towards using a standardized set. Starting in a green field is easy by comparison.

The categoryXXX (McDonalds) model is certainly the most difficult challenge. For example, for the 5 or 6 systems in Rwanda we would probably need to:

(i) clean all the metadata on all systems of any aberrations (orphaned catoptcombos, orphaned categories, etc - the typical messiness) so that each system has a different but at least internally consistent set;

(ii) merge existing duplicates of category options - “<1”, “under 1”, “< 1”, etc - which have again arisen historically in each system;

(iii) bring these reduced and cleaned sets together, side by side, and try to harmonize uids and codes (including categories, combos, etc);

(iv) push back to the source systems, being careful not to disturb the existing datavalue integrity.
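For step (ii), a crude way to find the merge candidates is to normalise option names and group together the options whose normalised forms coincide. The rewrite rules below are only illustrative - each system would need its own list:

```python
import re
from collections import defaultdict

def normalise(name):
    """Canonicalise an option name so historical spelling variants
    collapse together: lower-case, squeeze whitespace, and rewrite
    common age-band spellings. Rules here are illustrative only."""
    n = re.sub(r"\s+", " ", name.strip().lower())
    n = n.replace("under 1", "<1").replace("< 1", "<1")
    return n

def duplicate_groups(options):
    """Group categoryOptions (dicts with 'id' and 'name') whose
    normalised names coincide; each group is a merge candidate."""
    groups = defaultdict(list)
    for opt in options:
        groups[normalise(opt["name"])].append(opt["id"])
    return {k: v for k, v in groups.items() if len(v) > 1}
```

The resulting groups would still need human review before deciding which uid survives each merge.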

From there we would be in a position to launch into a normalized routine of managing all this metadata centrally. I’d be very interested to know whether HISP SA or HISP India have developed any tooling (scripts?) to assist with the above tasks.

The catoptcombos are an interesting quandary. We have a highly idiosyncratic model here which external systems find both difficult and unpleasant to consume. For very simple datasets and simple client systems they work fine - like mobile apps pushing data off a simple datavalueset template. But when the external system is also a dictionary-driven system with its own codelists and the like, it will be much more interested in matching codelists (with our categories) than in reconstructing our lattice. (Some of these questions will be very relevant to the IHE discussion around standardizing this stuff.)

Bob

···

On 11 December 2014 at 09:27, Calle Hedberg calle.hedberg@gmail.com wrote:

Bob,

Firstly, we need your option II - yes, having nearly all country data in one DHIS2 instance are theoretically an option in a few smaller countries with centralised systems, but it won’t be the norm. In particular because DHIS2 at the moment don’t support multiple Organisational Hierarchies. So the most common scenario is that (a) a country has multiple DHIS2 instance; and (b) the country has a range of other systems they want to run independently BUT which they also want aligned core data dimensions like Organisational Hierarchy (via UIDs or other OrgUnitCoding), data elements, indicators, data reporting periods, etc. A range of commonly used reference data sets should/must also be aligned across all systems, like

  • Mid-year population & housing estimates
  • Census- or Survey-based socio-economic determinants of health
  • Standard GIS base layers (admin boundaries, line layers like roads and rivers, basic land use, key point layers like health facilities, schools, police stations)

Secondly, in addition to what Greg has already outlined: we need synch functionality for existing databases, BUT our objective is also to make the NDD the starting point for any NEW health facility or reporting unit across systems. Or in other words, at some future point the idea is to FIRST add new reporting units to the NDD and then push those out to all linked systems (which will ensure alignment of UIDs and similar).

Thirdly, while I’m not sure exactly how far the Indian developers we have working on the NDD synch functionality have progressed, it seems obvious to me that at least the basic categories and options should be contained in the NDD (and pushed onto all systems using them). I’m less sure about categoryoptionCOMBOS - that’s might be a construct exclusive to DHIS2. TBD…

Regards

Calle

Secondly,

On 11 December 2014 at 00:34, Bob Jolliffe bobjolliffe@gmail.com wrote:

Thanks Greg for the heads up from SA. Elmarie also just let me know of your plans. Which, as you know, I have been dimly aware of for some time :slight_smile: I was kind of hoping you guys would respond …

From yours (and Gartners) classification the datawarehouse would be a consolidator of datavalues.

From what you describe, your DD collects metadata from edge systems and populates nation DW with what it finds. So getting to specifics, what do you do about categoryoptions (I think DEs and orgUnits are both conceptually simpler cases)? Taking my example as a starting point. Two edge systems have both defined “Male” with particular different UIDs. When they report datavalues they will make use of categoryoptioncombos which envelop these different encodings for “Male”. What do you put in your DD and what do you tell the DW about “Male”?

Do you create aliases, which would be one approach?

Or do you try and push standardized terminology downwards towards the edge system? Your case might be politically more complex bvecause of the need to respect autonomy in the provinces. The Rwanda case is slightly different and potentially simpler in that all these systems are central, national systems.

Bob

PS. my process of constructingfiltered metadata is a very simple xslt pipeline. Starting with all the structural metadata related to dataelements (like you would get from GET to /api/metaData.xml?assumeTrue=false&dataElements=true&dataElementGroups=true&categories=true&categoryCombos=true&categoryOptions=true&categoryOptionCombos=true" ) I pass through a very simple 3 stage pipeline using xslt. First isolate the dataelements belonging to the group. Second filter out only the categorycombos and catregoryOptionCombos referred to by those dataelements. Finally filter the categotryoptions and categories referenced in those categorycombos. Just a few lines of code in all. Maybe this would work for you too.


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Calle Hedberg

46D Alma Road, 7700 Rosebank, SOUTH AFRICA

Tel/fax (home): +27-21-685-6472

Cell: +27-82-853-5352

Iridium SatPhone: +8816-315-19274

Email: calle.hedberg@gmail.com

Skype: calle_hedberg


On 10 December 2014 at 21:50, Greg Rowles greg.rowles@gmail.com wrote:

Hi Bob

This is an interesting challenge. I can think of different methods to address the current alignment problem, but you obviously have a longer-term (master/meta) data management strategy to work out. Is your current data warehouse/dashboard instance purely a consolidator of edge-system data (i.e. an accumulator of datavalues and metadata)?

It sounds as if the warehouse exists as a consolidator, but how often is meta/master data cleaned and reviewed for integrity? Is that taking place on a regular basis, and if so, what actions are usually taken?

The intention of our national data dictionary (DD) in SA is to ensure the integrity of master/meta data across all DHIS2 systems within our province-wide architecture (both for organisation units and for dx resources). Our DD does not exist to ensure integrity of our national data warehouse (NHIRD). For the most part we try to identify new master/meta data within edge databases and load these records into the warehouse before importing data. That way we are prepared before the problem occurs (at least, that is our best intention).

We are in the process of developing a synchronization tool for this data dictionary with major coding support coming from the HISP India developers. Maybe they have scripts that can assist?

Best,

Greg

p.s. The Gartner research group sometimes makes sense (see attachment)

On Wed, Dec 10, 2014 at 7:41 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi

I have been helping out in Rwanda with their dhis2 setup(s) and I want to share a couple of issues to see whether some of these have been experienced by others and whether we can put our collective heads together on the trickier ones.

The setting is relatively complex in that they have a number of dhis2 instances. The busiest one is the HMIS which is essentially the national routine system which is collecting data from each facility (2466 dataelements in total). There are a few other systems which collect special data - such as the PBF system - also using dhis2.

Then they have a central datawarehouse/dashboard DHIS2 instance which accumulates some data from each of these others. In general it is a restricted subset. So for example there is a dataelement group in HMIS which defines 268 dataelements to be exported to datawarehouse each month. In addition it collects data from some external (non-dhis) systems.

So there are a number of challenges, some which are solved and some which remain unsolved. The first is to synchronize orgunits between them all which we have had largely working for some time (if a bit clumsily).

The bigger one is keeping the dataelement+categoryXXX structures compatible between them all. I have written some simple scripts to extract/filter all the relevant structural metadata for a dataelement group (its dataelements and related categoryXXXX’s). So this set can be exported from one system (eg HMIS) and imported into another (eg Datawarehouse). This solves the problem of routinely exporting datavalues from a dataelementgroup in HMIS and reliably pushing to the datawarehouse without weird conflicts. But …

When you have another system (eg PBF) doing the same thing, it gets hairier. One problem we see is that the categoryoption lists on different systems can have common elements (eg “Male”) but they are independently defined on both HMIS and PBF. So if I export PBF metadata to datawarehouse I will end up with two “Male” categoryOptions with different uids. It seems clear that within this small universe of dhis2 systems they will all have to harmonize these things. So it seems there are two options …

(i) to abandon the project of distributed systems and to bring everything together on one uber-instance with associated usergroups and access controls, or

(ii) to setup a metadata repository instance from which all systems derive their metadata and disable editing on the client systems.

(i) is tempting but I think it escapes the problem rather than solves it. There are valid reasons to maintain these systems separately and there will always be similar cases. So we will probably do (ii). If dhis2 cannot interoperate with itself it augurs badly for its interoperability in general.

Has anybody done something similar? Large-scale rationalization of metadata will also involve massaging the datavalue tables to keep everything true as the catoptcombos get reorganized. Given that some of the systems have been in full operation for over 3 years now, there are the usual messes which have accumulated over time amongst the categoryXXX structures, and these will need to be addressed anyway to arrive at a pristine state. Maybe there are some nice scripts for this?

Things like sharing settings on categoryoptions provide a bit of a puzzle, but also possibly something useful. Presumably we can set them to be completely readonly on the metadata client systems.

Bob



Health Information Systems Program


Mobile : 073 246 2992
Landline: 021 554 3130
Fax: 086 733 8432
Skype: gregory_rowles

Bob,

The meta-data synch functionality we have developed (currently ready for testing) covers most meta-data, including categoryOptionCombos, but not meta-data related to data sets. Synching data sets is relevant in cases where a country uses a distributed model for some databases (e.g. South Africa is using centralised instances for nearly all databases except the main HMIS databases, where each province has its own instance).

After thorough testing and a final review by Lars, who did an initial review of it after the DHIS2 Academy in Oslo in August, it will be included in trunk and then presumably become part of 2.18.

Our lattice, as you call it, might be complex - but that usually reflects the complex reality it tries to model. Which generally shows the potential dangers of standardising/simplifying TOO MUCH - after which any framework (like IHE) becomes a straitjacket that developers/users try to circumvent…

Regards

Calle


On 11 December 2014 at 12:23, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Calle

Thanks for your input, which I don't have time to respond to fully, but I think you are starting to outline a useful set of requirements. It might be useful to gather these somewhere, in a Google doc or the like.

I think, in the move towards your second point, the interesting challenge is how to migrate existing systems (which might already have accumulated different and slightly messy structural metadata through natural historical processes) towards using a standardized set. Starting in a green field is easy by comparison.

The categoryXXX (McDonalds) model is certainly the most difficult challenge. For example, for the 5 or 6 systems in Rwanda we would probably need to:

(i) clean all the metadata on all systems of any aberrations (orphaned catoptcombos, orphaned categories etc - the typical messiness) so that each system has a different but at least internally consistent set;

(ii) merge the duplicate category options (“<1”, “under 1”, “< 1” etc) which have arisen historically in each system;

(iii) bring these reduced and cleaned sets together, side by side, and try to harmonize uids and codes (including categories, combos etc);

(iv) push the result back to the source systems, being careful not to disturb the existing datavalue integrity.

From there we would be in a position to launch into a normalized routine of managing all this metadata centrally. I'd be very interested to know whether HISP SA or HISP India have developed any tooling (scripts?) to assist with the above tasks.
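For step (ii), a first pass might simply normalise option names to flag likely duplicates for human review. A minimal sketch (the normalisation rules here are purely illustrative, not a proposed standard):

```python
import re

def normalise(name):
    """Crude canonical form for a category option name: lower-case, rewrite
    'under N' as '<N', and strip all whitespace. Rules are illustrative only."""
    n = name.strip().lower()
    n = re.sub(r"\bunder\s+(\d+)", r"<\1", n)
    n = re.sub(r"\s+", "", n)
    return n

def find_duplicates(options):
    """Group category options (dicts with 'id' and 'name') whose normalised
    names collide, returning {normalised name: [uids]} for the collisions."""
    groups = {}
    for opt in options:
        groups.setdefault(normalise(opt["name"]), []).append(opt["id"])
    return {k: v for k, v in groups.items() if len(v) > 1}
```

The actual merge (picking a surviving uid and rewriting references to the rest) would still be a careful manual decision per group, but a report like this narrows the search.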

The catoptcombos are an interesting quandary. We have a highly idiosyncratic model here which external systems find both difficult and unpleasant to consume. For very simple datasets and simple client systems they work fine - like mobile apps pushing data off a simple datavalueset template. But when the external system is also a dictionary-driven system with its own codelists and the like, it will be much more interested in matching codelists (with our categories) than in reconstructing our lattice. (Some of these questions will be very relevant to the IHE discussion around standardizing this stuff.)

Bob

On 11 December 2014 at 09:27, Calle Hedberg calle.hedberg@gmail.com wrote:

Bob,

Firstly, we need your option (ii) - yes, having nearly all country data in one DHIS2 instance is theoretically an option in a few smaller countries with centralised systems, but it won't be the norm, in particular because DHIS2 at the moment doesn't support multiple organisational hierarchies. So the most common scenario is that (a) a country has multiple DHIS2 instances; and (b) the country has a range of other systems they want to run independently BUT which they also want aligned on core data dimensions like the organisational hierarchy (via UIDs or other orgunit coding), data elements, indicators, data reporting periods, etc. A range of commonly used reference data sets should/must also be aligned across all systems, like

  • Mid-year population & housing estimates
  • Census- or Survey-based socio-economic determinants of health
  • Standard GIS base layers (admin boundaries, line layers like roads and rivers, basic land use, key point layers like health facilities, schools, police stations)

Secondly, in addition to what Greg has already outlined: we need synch functionality for existing databases, BUT our objective is also to make the NDD the starting point for any NEW health facility or reporting unit across systems. In other words, at some future point the idea is to FIRST add new reporting units to the NDD and then push those out to all linked systems (which will ensure alignment of UIDs and similar).

Thirdly, while I’m not sure exactly how far the Indian developers we have working on the NDD synch functionality have progressed, it seems obvious to me that at least the basic categories and options should be contained in the NDD (and pushed onto all systems using them). I’m less sure about categoryOptionCOMBOS - that might be a construct exclusive to DHIS2. TBD…

Regards

Calle


On 11 December 2014 at 00:34, Bob Jolliffe bobjolliffe@gmail.com wrote:

Thanks Greg for the heads up from SA. Elmarie also just let me know of your plans, which, as you know, I have been dimly aware of for some time :-) I was kind of hoping you guys would respond …
