Hi Bob,
Thanks for this.
This will be very useful. What do you think about openmrs generating dxf like ihris?
Hi Ime
That’s a pertinent question and one that has preoccupied me for some time. If you have been following the discussion on the openmrs devs list some weeks back you might have picked up that:

- openmrs can already generate all manner of flexible renderings of what it calls indicator reports using its reporting framework

- one of the renderings which already exists for openmrs is the SDMX-HD module developed by Jembi. SDMX-HD cross sectional data is very similar to this dxf2 data - to the extent that it is easily transformed from one to the other (see the sketch just after this list). Having had something of a hand in both, I can assure you this convergence is not entirely accidental. This is what we have done with iHRIS in Kenya, for example. iHRIS is actually producing (well, it can produce) SDMX-HD data and we simply transform it to dxf2 during import. The process is very efficient and we can comfortably import 300 000 datavalues in 1 minute.

- we have similarly imported SDMX-HD data from openmrs in Sierra Leone, though this was very much a proof of concept. An important detail which escaped me then, and which I have only come to understand fairly recently, is that the type of “indicators” which can be generated through the openmrs reporting module framework and rendered with the Jembi module are something called “Cohort Indicators”, and these, while being very powerful, are also quite restricted in what they can measure.

- a cohort indicator is an aggregation or calculation based upon a cohort of patients - so you can easily generate dataelements like “number of patients with reduced CD4 count this month”. These are really useful indicators from a practitioner or clinical research perspective. But many (perhaps most) of our HMIS indicators are not cohort based - they are often based on a measure of service delivery, like for example “how many malaria cases were treated this month”.

- so we are a bit back to the drawing board here.
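To give a flavour of just how close the two formats are, here is a rough sketch. I am writing the SDMX-HD names from memory, so treat them as approximate rather than exact; the dxf2 line follows the examples later in this thread, with an invented dataelement code. An SDMX-HD cross sectional observation looks roughly like:

  <Obs DATAELEMENT="DE_MAL_CASES" OBS_VALUE="118" />

and its dxf2 equivalent:

  <dataValue dataElement="DE_MAL_CASES" value="118" />

The transform is essentially a renaming of elements and attributes, which is part of why the import can run as fast as it does.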
The SDMX-HD capability in openmrs is restricted to cohort indicators, but in general these form only a subset of what a facility might typically have to routinely report. So we can currently read data from OpenMRS, but only a small subset of what is realistically required for a typical facility.
Having said that, all is far from lost. But it does mean that further customisation of openmrs is required in order to produce typical dataelements. Developers in Rwanda, for example, have created a solution to produce more flexible reports. There is work underway at present to define a new SQL Indicator type in the openmrs reporting module which will allow these to be mainstreamed into the core reporting framework. Though I am not optimistic this will easily tie in to the existing openmrs sdmx integration module, the critical thing is to be able to produce the right data. The data format, as you have observed, is fairly trivial to render. Chuyen, a Vietnamese developer working with HISP India, is also working on a (hopefully) simple openmrs module to create a basic aggregate reporting capability.

The format here is not the major issue. Once we have the means to generate the dataelement values required, they can easily enough be rendered in this dxf2 format or in sdmx-hd. In fact, as long as it’s got a dataelement code, a period and a value, I think we can (and happily will) swallow any xml representation.
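To make that minimal requirement concrete: about the simplest document we could swallow would look something like this (the codes are invented, and the exact attribute names should be taken from the pdf examples rather than from here):

<dataValueSet orgUnit="OU_EXAMPLE_CLINIC" period="201109">
  <dataValue dataElement="DE_MAL_CASES" value="118" />
</dataValueSet>

A dataelement code, a period and a value - everything else is negotiable.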
There you go … long answer to short question.
Regards
Bob
PS. Short answer to another point you raised right at the start, re streamlining of metadata export: I think the requirement is actually even simpler than you imagine. Whereas a system can determine the codes it needs from a “complete” dhis metadata export, it’s actually much simpler for both parties for dhis to simply export codelists for dataelements and orgunits, rather than the kitchen sink of shortnames, alternative names, geo-coordinates etc etc.
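Something as skeletal as this would do - the element names below are only meant to illustrate the shape, not to propose a schema, and the codes are invented:

<codeLists>
  <dataElements>
    <dataElement code="DE_MAL_CASES" name="Malaria cases treated" />
  </dataElements>
  <orgUnits>
    <orgUnit code="MFL001" name="Example District Hospital" />
  </orgUnits>
</codeLists>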
On 16 September 2011 02:36, Ime Asangansi <asangansi@yahoo.com> wrote:
Ime
— On Thu, 9/15/11, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
On 15 September 2011 10:01, Ime Asangansi <asangansi@yahoo.com> wrote:
Thanks Bob, the pdf is useful.
When you say codes, do you mean the unique id for the record?
I wish life were so simple. There are quite a few ways that an identifiable object (eg Orgunit) can be judged unique:
1. the primary database key
2. the name
3. the uuid
4. the code
1 and 2 are not good for a number of reasons.
3 is quite ok except that (a) it’s a bit long and (b) we might have to map to data from elsewhere which doesn’t use a uuid.
This latter case is quite common - if dhis was the central authority in the world for assigning metadata (sometimes it feels like it is designed as if it is) life might be better - but the reality is that sometimes there are other authorities, and it is good that there are. Take the case we have been dealing with in Kenya, for example, where there is an official Master Facility List which is responsible for registering facilities and assigning codes. In that case we use these official codes in the code field of the orgunit.
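So a Kenyan orgunit would carry its officially assigned code in the code field, something like this (the name and code here are invented for illustration):

<orgUnit name="Example District Hospital" code="MFL001" />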
Secondly, +1 for an internal routine to assign ids
I’m in two minds about this. For sure it might be better to have generated codes like ou23, de456 etc rather than leave the field blank. But codes generally work best when they are assigned by an authority, as in the MFL case above.
Thirdly, please how are you generating the dxf for ihris?
iHRIS is generating the dxf for us, ie. they are generating HR dataelement values (number of docs, nurses etc)
Bob
Thanks
Ime
— On Thu, 9/15/11, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
This will also streamline metadata generation, because one will only need to generate and pass metadata for de/ou/period.
But I wonder: what’s the difference between orgunitId and orgunit in your example?
That’s a typo. Please take a look at the pdf file I sent out earlier this week as that is more correct.
Also, some elements don’t use any categories, but the model references a default categorycombo. How will that look in your proposed schema?
The default categorycombo is just that - default. So in the absence of any categories the categorycombo is automatically set to default when saving datavalues.
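So, for a dataelement with no categories, the datavalue is simply written without any category attributes and the default is applied on save - along these lines (the dataelement code is invented):

<!-- no category attributes: the default categorycombo is applied when saving -->
<dataValue dataElement="DE_OPD_VISITS" value="250" />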
Would you branch Jo’s code in a way we could easily test yours as a module? Or…
The reading of this format is already implemented in the import/export module. It is tightly coupled with Jo’s code in the sense of making use of the same element/attribute name strings defined in his beans. So you can already use it by just importing the xml file. To test, you should ideally set up some codes in your database. We should try and do this in the demo instance so people can try it there. Meanwhile I would suggest the following to test:
(i) pick an orgunit and assign it a code if it does not already have one (eg ou1)
(ii) pick a small dataset and assign it a code (eg dataset1)
(iii) assign codes to the dataelements within the dataset
(iv) assign the dataset to the orgunit

Then you should be able to import datavaluesets according to the examples given.
Alternatively you can use the existing uuids instead of the codes.
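Putting steps (i)-(iv) together, a test file would look something along these lines - the exact attribute names should follow the pdf examples, and de1/de2 stand in for whatever codes you assigned in step (iii):

<dataValueSet dataSet="dataset1" orgUnit="ou1" period="201109">
  <dataValue dataElement="de1" value="10" />
  <dataValue dataElement="de2" value="20" />
</dataValueSet>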
(It might be worth having a startup routine which automatically assigns codes based on the existing internal ids where they do not already exist.)
Regards
Bob
Thanks
Ime
— On Thu, 9/1/11, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
The implication of adding all the above will be that whereas the datavalueset above will remain valid (except perhaps shifting to storedBy), the following would also be valid:

<dataValueSets orgUnitId="code" dataElementId="internal">
  <dataValue dataElement="2" value="4" Sex="1" />
  <dataValue dataElement="2" value="5" Sex="2" />
  <dataValue dataElement="4" value="43" Sex="1" Age="3" />
  <dataValue dataElement="5" value="44" Sex="1" Age="3" />
</dataValueSets>
I am pretty sure I can implement the above without breaking what is currently there. One possible but minor breaking change I would suggest, to improve parsing of very large datasets, might be to abbreviate some well known element names to dv, de and v for compactness.
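Purely to illustrate the abbreviation idea (taking dv for the dataValue element, and de and v as the corresponding attribute abbreviations), the example above might shrink to something like:

<dataValueSets orgUnitId="code" dataElementId="internal">
  <dv de="2" v="4" Sex="1" />
  <dv de="2" v="5" Sex="2" />
  <dv de="4" v="43" Sex="1" Age="3" />
  <dv de="5" v="44" Sex="1" Age="3" />
</dataValueSets>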