On categories and dimensions and zooks

jason · 29 September 2009 05:40

I think Ola is going to write up something on this as well, but I
wanted to pre-empt him and offer some explanation and perhaps an
apology, as I started all of this. Ola and I had a long and productive
chat yesterday afternoon, where we went through in detail the issue we
have been discussing over the past few days. We have somewhat of a
hybrid set-up here in Zambia, with DHIS 1.4 being used as the primary
means of data collection in the districts. Data is then imported on a
regular basis into DHIS 2, which contains additional data sets that
are not part of the routine data collected with DHIS 1.4. As Ola has
explained in detail to me now, the category options/combos
(multidimensional data elements) that have been implemented in DHIS 2
are for a specific case when there is a "master" data element, such as
Malaria cases, and multiple categories and options (Age, Patient
status, etc) that make the data element multi-dimensional. I did not
realize this when this discussion began, thinking that somehow, I
could assign dimensionality to a plain-old data element (PODE??) with
the categories tables.

Our data elements have been created and imported directly from 1.4.
DHIS 1.4 has a very flat model. Each data element must be defined
separately, for each level of disaggregation (which have been included
as examples I have provided through my mails). The important thing
that I realized today during our conversation is that I need to assign
some dimensionality to my data elements that is completely
independent of categories/category options. I have been trying to use
the category combos/options to do this, but it was apparently not the
intended use.

I am going to continue to experiment with the categorycombos/options
as there are still a few issues I am not comfortable with, but in the
meantime, I guess we need to find another solution of how to assign
dimensionality to non-multidimensional data elements (i.e. those
without categories).

Regards,
Jason

On Mon, Sep 28, 2009 at 3:00 PM, Jason Pickering <jason.p.pickering@gmail.com> wrote:

The link that you sent seems reasonable, but this is not what has been
implemented in DHIS 2 at this point as far as I can tell. In the
current model (at least through the UI), I have created two categories
(Age and Patient status). I then created a categorycombo
Age_Patientstatus (Age and Patient status). I do this because
conceptually, Age is a dimension, and each member of the
category should correspond to a dimensional element (Under 1, 1-5 and
Over 5 for Age). This is necessary because in the SQL view, I need to
have a single column for each dimension, populated with the
appropriate dimensional elements.

The query did not work for me, but looking at its source, I
assume this is what is supposed to happen.

If I then assign these categorycombos to my data elements, I already
know it is not going to work, because I have no idea which one of the
categorycombooptions is applicable to a particular data element. I
suppose this is why I would need to create a categorycombo with
exactly one option in each category, which again, is not desired. Each
category should be able to have multiple options.

Now, I do see some light.

Now, looking at the current database, when I generate the resource
table, this is what I get back in categoryoptioncombo name:
25270;25260;"(Over 5,IPD,)"
25271;25261;"(Over 5,Deaths,)"
25272;25262;"(Over 5,OPD,)"
25273;25263;"(Under 1,IPD,)"
25274;25264;"(Under 1,Deaths,)"
25275;25265;"(Under 1,OPD,)"
25277;25267;"(Age 1-5,Deaths,)"
25278;25268;"(Age 1-5,OPD,)"
;;""
25276;25266;"(Age 1-5,IPD,)"

Now, this looks very much like what I need in my PivotTable source
query, which I think is what the query that Johan just sent is
supposed to provide. (The query did not work, but I assume this is
what is meant to happen.)

The problem now is that I have no way of telling (at least directly)
that the first set of values corresponds to Age and the second set of
values corresponds to Patient status.

If I could assign a data element a categoryoptioncomboid (the second
number in that result set above) instead of a categorycomboid (as is
the case now) I think I would be able to produce the result set that I
actually want. However, by assigning a data element the
categorycomboid , I can only tell which dimensions the data element
has, but not which particular dimensional elements it possesses.
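
A minimal Python sketch of the labelling problem described above: the
names in the resource table can only be split into one column per
dimension if the order of the categories inside the combo is known from
somewhere else. The order ("Age", "Patient status") and the helper name
`split_combo_name` are assumptions made for illustration; they are not
part of DHIS 2.

```python
# Assumed order of the categories inside the categorycombo (hypothetical,
# not read from the DHIS 2 database).
CATEGORY_ORDER = ("Age", "Patient status")


def split_combo_name(name, categories=CATEGORY_ORDER):
    """Turn '(Over 5,IPD,)' into {'Age': 'Over 5', 'Patient status': 'IPD'}."""
    inner = name.strip().strip("()")
    # The generated names carry a trailing comma, so drop empty parts.
    options = [part.strip() for part in inner.split(",") if part.strip()]
    if len(options) != len(categories):
        raise ValueError(f"expected {len(categories)} options, got {options!r}")
    return dict(zip(categories, options))


# A few rows copied from the result set above: (first id, second id, name).
rows = [
    (25270, 25260, "(Over 5,IPD,)"),
    (25273, 25263, "(Under 1,IPD,)"),
    (25277, 25267, "(Age 1-5,Deaths,)"),
]

# Key the labelled options by the second number in each row.
labelled = {second_id: split_combo_name(name) for _, second_id, name in rows}
```

Without a recorded category order, nothing in the name itself says which
position is Age and which is Patient status, which is the gap described
above.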

So, perhaps you are right that there is no need for any changes to the
data model, but rather the assignment that I mention above.

Johan, thanks for the query. I will see if I can get it to work.

Best regards,
Jason

On Fri, Sep 25, 2009 at 12:07 PM, Knut Staring <knutst@gmail.com> wrote:

http://208.76.222.114/confluence/display/RandD/General+multi-dimensional+model

On Fri, Sep 25, 2009 at 11:55 AM, Abyot Gizaw <abyota@gmail.com> wrote:

The one-to-one relationship mentioned between dataelement and
categorycombo is not correct!

The relationship is one-to-many. A categorycombo can be assigned to many
dataelements. But a dataelement can have only one categorycombo.
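
A small sketch of that cardinality, run through Python's built-in
sqlite3 so that it is self-contained. The two tables are simplified
stand-ins with assumed column names, not the actual DHIS 2 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE categorycombo (
    categorycomboid INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- A single foreign-key column: each data element points at exactly one
-- combo, while any number of data elements may point at the same combo.
CREATE TABLE dataelement (
    dataelementid INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    categorycomboid INTEGER NOT NULL
        REFERENCES categorycombo (categorycomboid)
);
INSERT INTO categorycombo VALUES (1, 'Age_Patientstatus');
INSERT INTO dataelement VALUES (10, 'Malaria cases', 1);
INSERT INTO dataelement VALUES (11, 'Typhoid cases', 1);
""")

# Number of data elements sharing the one combo.
sharing = conn.execute(
    "SELECT COUNT(*) FROM dataelement WHERE categorycomboid = 1"
).fetchone()[0]
```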

Thank you
Abyot.

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering <jason.p.pickering@gmail.com> wrote:

Hi there.

My basic issue with the category/category combo is that it appears to be
a one-to-one relationship with data elements. If I look at the data model,
there is a one-to-one relationship between dataelement and categorycomboid.
For a given category combo, you can have multiple options. So, you can
establish a relationship for a given data element and a group of category
options.

Let me try and describe the issue. We have a set of data elements related
to malaria for this example. We would like to be able to pivot the data on
other dimensions (Data element, age, disease, patient status).
Obviously there are other dimensions that are pivotable (orgunit, period,
dataset).

The data elements look like this. I have put the dimensions in square
brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}
Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, IP}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements
(Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I
have had for all diseases, or how many OPD cases I have had for each
disease. How can I do this with the existing data model? It is not obvious
to me because there is no relationship between dimensional elements
(categoryoptions) to each other. Category options can be related through a
category combination, but since data elements can only be assigned a single
category option, the dimensionality is broken once it gets time to pull the
data into a pivot table.
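
To make the question concrete, here is a rough Python sketch of the kind
of roll-up I am after, assuming each flat data element has already been
tagged with its dimensional elements. The values, the Typhoid element
names and the helper `total_by` are invented for illustration only.

```python
from collections import defaultdict

# (data element name, age, disease, patient status, value)
# All values are made up; the Typhoid names are hypothetical.
tagged_values = [
    ("Deaths Confirmed Malaria Under 1 Year", "Under 1", "Malaria", "Deaths", 3),
    ("Deaths Confirmed Typhoid Under 1 Year", "Under 1", "Typhoid", "Deaths", 1),
    ("OPD 1st Attendance Confirmed Malaria Under 1 Year", "Under 1", "Malaria", "OPD", 40),
    ("OPD 1st Attendance Confirmed Typhoid Over 5 Years", "Over 5", "Typhoid", "OPD", 12),
]

# Position of each dimension inside a row.
DIMENSION_INDEX = {"age": 1, "disease": 2, "status": 3}


def total_by(rows, *dims):
    """Sum the values, grouped by the requested dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSION_INDEX[d]] for d in dims)
        totals[key] += row[4]
    return dict(totals)


# Under 1 deaths across all diseases.
under1_deaths = total_by(tagged_values, "age", "status")[("Under 1", "Deaths")]

# OPD cases for each disease.
opd_rows = [row for row in tagged_values if row[3] == "OPD"]
opd_by_disease = {key[0]: value for key, value in total_by(opd_rows, "disease").items()}
```

The grouping itself is trivial; the missing piece in the database is the
tagging of each data element with its dimensional elements.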

In the incomplete example that I gave yesterday, I established a
one-to-many relationship between a data element and a dimension. If I
understand the current data model, I would have to create a separate
categorycombo for each of these data elements, and assign this
categorycombo to the data element. Now, I might be able to unfold the
dimensions using the categories and categorycombos. It is not apparent how
the dimensional elements correspond to a particular dimension, as
there is no relation for this in the database as far as I can see.

As Johan pointed out a few mails ago (if I understand him correctly),
different categorycombos can be created for individual data elements
and assigned to these elements. However, this 1) seems incredibly
inefficient and 2) does not establish any relationship between dimensional
elements and dimensions. Perhaps it is there, and maybe it has been done in
SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a
one-to-many relationship between a data element, a dimension (category) and
dimensional element (category combo) would be much more efficient, and
highly usable from an SQL perspective. As I mentioned in my mail, I am not
sure how easy this would be to implement in a procedural language like Java,
but I assume it should be possible to either do it this way, or rewrite my
Postgres proprietary query in standard SQL (which there are ways to do with
ANSI SQL). This would require modification to the data model (similar to
the table I provided yesterday) and modification to the UI to allow users to
1) select a dimension (category) 2) Select a dimensional element for the
given dimension. This would populate the table with a dataelementid, a
dimensionid (categoryid) and a dimensional element (categoryoptionid).
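
A minimal sketch of the standard SQL route mentioned above, run through
Python's built-in sqlite3 so that it is self-contained. The table
`dataelement_dimension` is a hypothetical long-format table (names
instead of ids, for readability, with shortened data element names), and
conditional aggregation stands in for the Postgres crosstab function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per (data element, dimension) pair.
CREATE TABLE dataelement_dimension (
    dataelement TEXT NOT NULL,
    dimension TEXT NOT NULL,
    dimension_element TEXT NOT NULL,
    PRIMARY KEY (dataelement, dimension)
);
INSERT INTO dataelement_dimension VALUES
    ('OPD Malaria Under 1', 'Age', 'Under 1'),
    ('OPD Malaria Under 1', 'Patient status', 'OPD'),
    ('OPD Malaria Over 5', 'Age', 'Over 5'),
    ('OPD Malaria Over 5', 'Patient status', 'OPD');
""")

# One MAX(CASE ...) per dimension turns rows into columns without crosstab.
pivot_sql = """
SELECT dataelement,
       MAX(CASE WHEN dimension = 'Age' THEN dimension_element END) AS age,
       MAX(CASE WHEN dimension = 'Patient status' THEN dimension_element END) AS status
FROM dataelement_dimension
GROUP BY dataelement
ORDER BY dataelement
"""
pivoted = conn.execute(pivot_sql).fetchall()
```

The primary key on (dataelement, dimension) is what guarantees a single
dimensional element per dimension, so each MAX picks up exactly one value.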

My gut feeling is that this is exactly the same functionality as has currently
been implemented for organizational units. Users can define a hierarchy for
organizational units, and then assign them to
categories/dimension/organizational group sets, decide whether the groups
are compulsory and exclusive, and then assign a particular organizational
unit to a particular group (which is analogous to a dimensional element).
Organizational group sets define the dimension, and one-to-one assignment of
an organizational unit to a particular organizational group defines which
dimensional element the organizational unit is a member of. These dimensions
can then be used in PivotTable analyses, where the orgunitgroupsets become
dimensions, and orgunitgroups become dimensional elements.
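
As a rough illustration of what compulsory and exclusive would mean if
the same idea were applied to data elements, here is a small Python
check. The group sets, the ids DE1 to DE3 and the helper
`check_group_set` are hypothetical.

```python
# Hypothetical group sets: group set -> group -> member data elements.
group_sets = {
    "Age": {"Under 1": {"DE1", "DE2"}, "Over 5": {"DE2"}},
    "Patient status": {"OPD": {"DE1", "DE2"}, "Deaths": set()},
}


def check_group_set(groups, data_elements):
    """Return (missing, duplicated) data elements for one group set.

    missing    -> member of no group, which breaks "compulsory"
    duplicated -> member of several groups, which breaks "exclusive"
    """
    counts = {de: 0 for de in data_elements}
    for members in groups.values():
        for de in members:
            if de in counts:
                counts[de] += 1
    missing = sorted(de for de, n in counts.items() if n == 0)
    duplicated = sorted(de for de, n in counts.items() if n > 1)
    return missing, duplicated


all_elements = {"DE1", "DE2", "DE3"}
report = {name: check_group_set(groups, all_elements)
          for name, groups in group_sets.items()}
```

Only a group set that passes both checks gives every data element
exactly one dimensional element, which is what a pivot column needs.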

I believe that data elements are no different than organizational units.
They should be able to be grouped into some sort of hierarchy and pivoted on
any dimension. Data element groups establish a one-to-many relationship
between data elements and a data element group, but there is no concept of
how data element groups relate to each other.
I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of
multidimensionality if we really wanted to get it right. It would seem to me
that the DHIS datamodel and associated aggregation methods have been
hardwired into aggregation across time (period) and geography (orgunit).
What we can do with PivotTables (and OLAP) is to aggregate across any
possible dimension, slicing, as you mention, on any dimension. I am not sure
this will be so simple to implement but I think there is a way to do it,
without major modifications.

I am not sure it solves the SDMX issue. There are potential issues
related to "ragged" dimensions and how these get handled. Some data elements
might have three dimensions, while others may have more. I have not thought
about this in detail, but know it is an issue with cross-tab queries in SQL.
You normally have to know how many dimensions you are working with in order
to perform a cross-tab, but there are dynamic solutions. Perhaps this could
be dealt with somehow in SDMX.
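
One possible shape of such a dynamic solution, sketched in Python with
sqlite3: read the list of dimensions from the data first, then generate
one pivot column per dimension. The table and its contents are
hypothetical; data element 'B' deliberately lacks a patient status to
show a ragged row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataelement_dimension (
    dataelement TEXT,
    dimension TEXT,
    dimension_element TEXT
);
INSERT INTO dataelement_dimension VALUES
    ('A', 'Age', 'Under 1'),
    ('A', 'Patient status', 'OPD'),
    ('B', 'Age', 'Over 5');
""")

# Step 1: discover which dimensions exist instead of hard-coding them.
dimensions = [row[0] for row in conn.execute(
    "SELECT DISTINCT dimension FROM dataelement_dimension ORDER BY dimension")]

# Step 2: one bound parameter and one output column per dimension found.
columns = ", ".join(
    f"MAX(CASE WHEN dimension = ? THEN dimension_element END) AS dim_{i}"
    for i in range(len(dimensions)))
sql = (f"SELECT dataelement, {columns} FROM dataelement_dimension "
       "GROUP BY dataelement ORDER BY dataelement")

# Ragged rows simply come back with NULL (None) in the missing dimension.
pivoted = conn.execute(sql, dimensions).fetchall()
```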

Anyway, I am rambling. I hope this mail helps to push my point
further. Once I get the SQL from SL, I will see if perhaps it has been done
already, and that I am just writing long emails for nothing.

Regards,
Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
> Hi Jason and Johan
>
> I'm really pleased to see you having this discussion as I have been
> grappling with a similar issue which involves unravelling categories,
> category options and combos into something more familiar. I have
> reached similar conclusions regarding nomenclature:
>
> category = dimension
> categorycombo - I have been calling a dimension set (it bears a strong,
> and useful, resemblance to xslt:attribute-set)
> category option - I like your suggestion of DimensionalElement. I am
> going to start calling it that too.
>
> In my case I need to export (and import) data in a standard format
> called sdmx. So whereas in the DHIS2 native DXF we export datavalues
> with effectively three dimensions (source, period, categorycombooption),
> the last dimension is a sort of uber-dimension. Like a peppercorn or a
> cardamom seed, when you break it open it explodes its rich complexity
> of dimensions.
>
> In sdmx we need the dimensions exploded. So data values look like:
>
> <dataset>
> <datavalue name="TB test given" uid="44344 ...44" gender="Male"
> age="0-5" value="32" />
> <datavalue name="TB test given" uid="44344 ...44" gender="Female"
> age="0-5" value="38" />
> ..
> </dataset>
>
> My approach to unpicking the dimensions from the dxf file is to
> transform it with an xslt transformation which is still incomplete but
> seems to work well.
>
> One other nomenclature issue which has surfaced as a result is what we
> call a "dataset". In DHIS2, if I understand correctly, a dataset
> corresponds roughly to all the dataelements which might occur on a
> datacollection form. If we view all dataelements as having just the
> three "dimensions" then all is well, but if we explode the actual
> dimensions then we have an issue. In the sdmx model a dataset consists
> only of dataelements with the same dimensionset. After discussing this
> with Ola we have reached the conclusion that we need another level of
> grouping, primarily for the UI - eg FormSet or ReportSet - which allows
> us to group related datasets. But that is an aside from what you are
> talking about.
>
> I know that you guys can do magic with sql, but it seems that we should
> try to capture some of this and place it down in the datamodel API. It
> occurs to me that for a multidimensional dataelement we might benefit
> from some utility methods to retrieve slices and dices which might
> assist in constructing the pivot tables around dimensions. Does this
> sound like the right thing to do?
>
> Regards
> Bob
>
> 2009/9/24 <johansa@ifi.uio.no>
>>
>> Jason,
>> I will leave to others to comment the code, but I have a few comments...
>>
>> > I have done a bit more thinking on this, and would like to offer
>> > some more examples up for discussion.
>> >
>> > Basically, we have a lot of data elements that are somehow related
>> > to each other, similar to my kooky example in my original mail. I
>> > assume this is fairly common throughout other HMIS systems. Here,
>> > malaria attendance is broken down into various dimensions/category
>> > by patient type (outpatient, inpatient, and deaths) and by age
>> > (under 1, 1-5 and over 5). But say you want to be able to pivot to
>> > look at outpatient, inpatient and deaths totals (i.e. summed up by
>> > age). Well, you could create a separate data element for this, but
>> > it sure would be nice to be able to Pivot the data somehow.
>>
>> In the Sierra Leone db, Edem and Romain set up views that pulled the
>> categories through into a "Category" pivot field, which you can then
>> use to get what you want. Simply tick the categories (see below) you
>> want to see, and group them together in Excel. Maybe Edem and Romain
>> can help further here.
>>
>>
>> > Dimension ? Category
>> > Dimensional element ? Category option ? Category combo ( I think)
>>
>> The right symbol disappeared from my reply-mail here, but some
>> clarification:
>>
>> Crosstab Dimension (age AND gender) = Category combo
>> Dimension (age, gender) = Category
>> Dimensional element (inpatient, outpatient, death, under1, 1-5, and
>> over 5) = Category option
>>
>> So by assigning a DE the category combo of "gender_age", you get 9
>> dimensional elements, 3 category options (in category age) by 3
>> category options (in category gender)
>>
>> Johan
>>
>>
>>
>>
>> > Anyway, here is the helper table I created.
>> >
>> > CREATE TABLE test_dataelementcategorycombo
>> > (
>> >   test_dataelementid integer NOT NULL,
>> >   test_dataelementcategoryid integer NOT NULL,
>> >   test_dataelementcategorycomboid integer NOT NULL,
>> >   CONSTRAINT pk_testdataelementcategory PRIMARY KEY
>> >     (test_dataelementid, test_dataelementcategoryid,
>> >      test_dataelementcategorycomboid)
>> > )
>> > WITH (OIDS=FALSE);
>> >
>> > So this is a real simple table which references a data element, a
>> > data element category, and a data element combo. The reference to a
>> > data element category may be redundant, but anyway, lets leave it in
>> > for now.
>> >
>> > I populated the table with some data, which will be used to assign
>> > dimensions to data elements. It looks like this in my DB.
>> >
>> > 309;25250;25251
>> > 309;25257;25255
>> > 348;25250;25252
>> > 348;25257;25255
>> > 455;25250;25253
>> > 455;25257;25255
>> >
>> > but of course this is meaningless to you. What do these values
>> > correspond to?
>> >
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Age";"Under 1"
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age";"Age 1-5"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over 5"
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient status";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Patient status";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient status";"OPD"
>> >
>> > which can be produced by the following view.
>> >
>> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
>> > SELECT dataelement.name,
>> >        dataelementcategory.name AS dimension,
>> >        dataelementcategoryoption.name AS dimension_element
>> > FROM dataelement
>> > JOIN test_dataelementcategorycombo
>> >   ON test_dataelementcategorycombo.test_dataelementid =
>> >      dataelement.dataelementid
>> > JOIN dataelementcategory
>> >   ON dataelementcategory.categoryid =
>> >      test_dataelementcategorycombo.test_dataelementcategoryid
>> > JOIN dataelementcategoryoption
>> >   ON test_dataelementcategorycombo.test_dataelementcategorycomboid =
>> >      dataelementcategoryoption.categoryoptionid;
>> >
>> > So, that view just provides a human readable view of those integers
>> > that I populated in the test_dataelementcategorycombo table I
>> > created above. This table just assigns particular data elements to
>> > different category options (dimensional elements).
>> >
>> > OK, so far so good, but the problem now is, how to use this with the
>> > aggregateddatavalue table? If we try and join this table directly, we
>> > will have issues with duplicates in the pivot table, so we need to
>> > transform the data slightly.
>> >
>> > This should do the trick.
>> >
>> > -- crosstab() is provided by the tablefunc contrib module
>> > SELECT * FROM crosstab
>> > (
>> >   'SELECT name, dimension, dimension_element
>> >    FROM vw_dataelements_dimensions ORDER BY 1,2,3',
>> >   'SELECT DISTINCT dimension FROM vw_dataelements_dimensions
>> >    ORDER BY 1 ASC'
>> > )
>> > AS
>> > (
>> >   name character varying(230),
>> >   age character varying(160),
>> >   status character varying(160)
>> > );
>> >
>> >
>> > which returns this record set
>> >
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age 1-5";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over 5";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under 1";"OPD"
>> >
>> >
>> > OK, admittedly, I cheated a bit and used the crosstab function of
>> > Postgresql, but I assume that this query could be rewritten with a
>> > few more lines of code in standard SQL or some procedural language
>> > like Java. Now, this record set looks like something that I can
>> > almost use with the aggregateddatavalue table simply by joining up
>> > the table on the appropriate dataelementid and pulling everything
>> > into a pivot table. I would not have any duplicated values and would
>> > have columns like data element name, period, orgunit, age, patient
>> > status and of course the value of the data element. I hope that part
>> > is pretty clear. Just join up that table to the aggregateddatavalue
>> > table, and you have pretty much what is needed to pull the data
>> > directly into a PivotTable for further analysis.
>> >
>> > This is not a complete example, but it is very close to what I need
>> > here, and I think this type of functionality would be much more
>> > useful than the current data element categories functionality.
>> > Basically, all that would be required, at least initially, would be
>> > another user interface screen to allow the definition of which
>> > category(ies) and category options a data element is a member of.
>> > The rest could, in the first instance, be executed with custom SQL
>> > (obviously, I am partial to this language and hobbled by the fact
>> > that I do not know Java), but eventually this would need to be
>> > implemented somehow in Java.
>> >
>> > I am not sure if this really solves all of the issues surrounding
>> > multidimensional analysis of data elements, but it seems to solve
>> > the issues that I am having by trying to assign some sort of
>> > dimensional hierarchy to data elements (similar to the
>> > exclusive/compulsory functionality of orgunits). Any thoughts on
>> > this?
>> >
>> > Best regards,
>> > Jason
>> >
>> >
>> >
>> >
>> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering <jason.p.pickering@gmail.com> wrote:
>> >>
>> >>
>> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@ifi.uio.no> wrote:
>> >>>
>> >>> >> However, there does seem to be the ability to assign dimensions,
>> >>> >> there does not seem to be the ability to assign particular
>> >>> >> elements within those dimensions to a particular DHIS data
>> >>> >> element.
>> >>>
>> >>>
>> >>> Just some more clarification here: you can make category combos
>> >>> which you assign to data elements. However, it is not possible to
>> >>> assign just specific parts of a category combo (only some of the
>> >>> category options) to a data element.
>> >>
>> >> Yes, this was exactly what I wanted. Assigning different categories
>> >> would seem to break the dimensionality.
>> >>
>> >>>
>> >>> Then you must make a specific category (as the only one in
>> >>> or part of a new category combo) with just those options. It can
>> >>> be hell; in Tajikistan there were way over 20 categories I think,
>> >>> at least 10 just on various age groups.
>> >>>
>> >>> Johan
>> >>>
>> >>
>> >> This was my fear.
>> >>
>> >> I will need to do some testing and see. I still fear it is not
>> >> exactly the intended functionality.
>> >>
>> >> Basically, I think I need something akin to the exclusive/compulsory
>> >> groups that are in place for organizational units, but instead, for
>> >> arbitrary dimensions. I will give it a try and see what happens.
>> >>
>> >> Thanks,
>> >> Jason
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to : dhis2-devs@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help : https://help.launchpad.net/ListHelp
>
>

--
Cheers,
Knut Staring

olatitle · 29 September 2009 08:22

Hi,

Thanks for the explanations Jason. The multidimensional model is quite complicated, is poorly documented, and as you say is DHIS-centric in the way that it is built around the DHIS notion of a Data Element.

After the discussion with Jason yesterday, and also from the discussions we all had here in this thread it became clear to me that we need to provide multi-dimensionality to data elements (and also to indicators I think) that are completely independent of data entry and data value storage.

Our current model of data element categories is really designed for simplifying data entry and storage of large amounts of data elements with the same dimensions; it grew out of the ICD-based forms in Ethiopia, remember. I still think there is a lot to gain by this model when it comes to simplifying creation of large datasets, generating grid based data entry forms, and in more effective persistence of data elements and values, but now I am also quite sure that this model alone is not enough to provide flexible multidimensional data analysis. There are lots of use cases where you would want to analyse data on dimensions that cut across data entry forms and the typical dimensional data sets used for data collection, and instead of complicating the data entry and storage by adding more flexibility on this model, I think we need to have this functionality independent of how we store data values.

Data element groups provide some of this functionality and are meant to help group, filter, and to some extent provide dimensionality to your data during data analysis. Groups are completely independent of the data values and can be modified at any time without changing the raw data in DataValue. A name change to a category option or regrouping of options within a category, however, directly modifies the data values as well, so that model is more fixed and targets data collection and storage.

Data Element Group Sets as dimensions and Data Element Groups as dimensional elements

Data element groups are currently a flat grouping structure, à la assigning a category option to a data element (without having the category). Just like we have for orgunit groups (as Jason pointed out some days ago) we need group sets for data elements as well. It is not a revolutionary thought, and it has been on the wish list for many years, but the usage and real need for it has not been clear (which is why it never was introduced in 1.4 either, although also there it was on the design table at some point). See below for an example of how this would play out.

Jason’s requirements from Zambia, and these are not unique in any way, show a real need for this, and with a growing number of 1.4 + 2 hybrid set-ups I think this functionality will become more and more important. When applying DHIS 2 as a national or provincial web based system on top of many stand-alone 1.4 installations, the category functionality, which is not supported in 1.4, becomes useless, as data elements and data will be imported from 1.4. In such hybrid set-ups the role of DHIS 2 as a web based data warehouse + data analysis tool (incl. GIS) is even stronger than in DHIS 2 only set-ups, and we need to provide multidimensional data analysis to this scenario.

That said, and I think Jason already has made a strong case for this, also in a 100% DHIS2 scenario you will need more flexibility in defining dimensions to your data than what categories can provide. Being able to define data dimensions independent of data collection is powerful and should be supported in a better way than what data element groups provide today. Given that we already have the orgunit group set code in place I would assume that adding group sets to data elements could be a relatively straight forward thing to do (but then again, I am not the programmer…).

To add to my argument of separating data entry from analysis: the data elements listed in my example below, although they look very streamlined, would come from at least 2 different data entry forms, as outpatients (OPD) and inpatients (IP) are treated in separate locations. The IP data elements would most likely exist in a data set with other dimensions to it (like admission, discharge, death) not applicable to OPD elements, and the OPD data elements would most likely have dimensions (1st attendance, repeated visit) that are not used in IP. The patient status dimension is not used for data entry (as all data elements in one form would be either IP or OPD), and cuts across data elements from different data sets and forms, while Age is reused in data collection across different datasets.

In a DHIS 2 only setup where data element categories are used, a pivot table could then pull the Age dimension from a data element category, but the patient status would have to be a data element group set as it is not part of any data set in particular (but reflects types of data across many of them). The disease dimension is even more tricky as it would normally be part of the data element name, but since DHIS collects and uses data elements for more than just disease names, a separation out into a data element group set called “Disease” would simplify the data analysis.

This shows how complex the use cases are and, I believe, builds an argument for providing both categories and data element group sets.

I will put this into a blueprint (I believe the old request for this got lost when Trac sunk a year ago…).

Sorry for the very long email, but to summarise:

DHIS 2 needs data element group set functionality as soon as possible, as this will

- add more flexibility to dimensional data analysis in DHIS by separating it from data entry and data value storage
- help all 1.4 + 2 set-ups
- make Jason very happy

Ola

**And here is the example:**

The flat data element names:
“Malaria death <5 year”
“Malaria death >5 year”
“Malaria in OPD 1st attendance <5 year”
“Malaria in OPD 1st attendance >5 year”
“Malaria IP discharge <5 year”
“Malaria IP discharge >5 year”
“Typhoid death <5 year”
“Typhoid death >5 year”
etc.
(OPD is outpatient, meaning patients treated at the clinic; IP is inpatient, meaning patients who were admitted to a hospital.)

There are three dimensions in the data elements above, so I define three data element group sets: Disease, Patient Status, and Age.
I also define 7 new data element groups (Malaria, Typhoid, <5, >5, Death, OPD, IP) and assign these groups to the group set they belong to:
Disease (Malaria, Typhoid)
Patient Status (Death, OPD, IP)
Age (<5, >5)

I then assign the data element groups to the data elements
“Malaria death <5 year” assigned to “Malaria”, “Death”, and “<5”.
etc.

All these groupings can exist completely independent of data entry and be changed at any time.

From this I can generate a new resource table for my data analysis (similar to the one we already have for orgunit group sets) that provides:
Data Element Group Set, Data Element Group, Data Element
“Disease”, “Malaria”, “Malaria death <5 year”
“Disease”, “Typhoid”, “Typhoid death <5 year”
“Patient Status”, “Death”, “Malaria death <5 year”
etc.

When joining the above table with an aggregated data value table you can define a pivot table with your three data element group sets as columns (pivot fields) and analyse the data across these three dimensions. The data element name dimension can then be completely hidden in the analysis.
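
A minimal sketch of that join, run through Python's built-in sqlite3 so that it is self-contained. The table names `degroupset_structure` and `aggregatedvalue`, their columns and the values are all assumptions for illustration; they are not existing DHIS 2 tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical resource table: group set, group, data element.
CREATE TABLE degroupset_structure (groupset TEXT, degroup TEXT, dataelement TEXT);
-- Hypothetical stand-in for an aggregated data value table.
CREATE TABLE aggregatedvalue (dataelement TEXT, period TEXT, value INTEGER);
INSERT INTO degroupset_structure VALUES
    ('Disease', 'Malaria', 'Malaria death <5 year'),
    ('Patient Status', 'Death', 'Malaria death <5 year'),
    ('Age', '<5', 'Malaria death <5 year'),
    ('Disease', 'Typhoid', 'Typhoid death <5 year'),
    ('Patient Status', 'Death', 'Typhoid death <5 year'),
    ('Age', '<5', 'Typhoid death <5 year');
INSERT INTO aggregatedvalue VALUES
    ('Malaria death <5 year', '2009-08', 4),
    ('Typhoid death <5 year', '2009-08', 1);
""")

# Pivot the group sets into columns first, then join, so that each data
# value is counted once rather than once per group set.
rows = conn.execute("""
WITH dims AS (
    SELECT dataelement,
           MAX(CASE WHEN groupset = 'Disease' THEN degroup END) AS disease,
           MAX(CASE WHEN groupset = 'Patient Status' THEN degroup END) AS status,
           MAX(CASE WHEN groupset = 'Age' THEN degroup END) AS age
    FROM degroupset_structure
    GROUP BY dataelement
)
SELECT d.status, d.age, SUM(v.value)
FROM aggregatedvalue v
JOIN dims d ON d.dataelement = v.dataelement
GROUP BY d.status, d.age
""").fetchall()
```

Here the result is the number of deaths under 5 across both diseases, with the data element names no longer appearing in the output at all.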

Ola


On Mon, Sep 28, 2009 at 3:00 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

The link that you sent seems reasonable, but as far as I can tell this is not what has been implemented in DHIS 2 at this point. In the current model (at least through the UI), I have created two categories (Age and Patient status). I then created a categorycombo Age_Patientstatus (Age and Patient status). I do this because, conceptually, Age is a dimension, and each member of the category should correspond to a dimensional element (Under 1, 1-5 and Over 5 for Age). This is necessary because in the SQL view I need a single column for each dimension, populated with the appropriate dimensional elements.

The query did not work for me, but looking at the source of the query, I assume this is what is supposed to happen.

If I then assign these categorycombos to my data elements, I already know it is not going to work, because I have no idea which one of the categorycombooptions is applicable to a particular data element. I suppose this is why I would need to create a categorycombo with exactly one option in each category, which again is not desired. Each category should be able to have multiple options.

Now, I do see some light.

Looking at the current database, when I generate the resource table, this is what I get back in categoryoptioncombo name:

25270;25260;“(Over 5,IPD,)”
25271;25261;“(Over 5,Deaths,)”
25272;25262;“(Over 5,OPD,)”
25273;25263;“(Under 1,IPD,)”
25274;25264;“(Under 1,Deaths,)”
25275;25265;“(Under 1,OPD,)”
25277;25267;“(Age 1-5,Deaths,)”
25278;25268;“(Age 1-5,OPD,)”
;;“”
25276;25266;“(Age 1-5,IPD,)”

Now, this looks very much like what I need in my PivotTable source query, which I think is what the query that Johan just sent is supposed to provide. (The query did not work, but I assume this is what is meant to happen.)

The problem now is that I have no idea (at least directly) that the first set of values corresponds to Age and the second set of values corresponds to Patient status.

If I could assign a data element a categoryoptioncomboid (the second number in the result set above) instead of a categorycomboid (as is the case now), I think I would be able to produce the result set that I actually want. However, by assigning a data element the categorycomboid, I can only tell which dimensions the data element has, but not which particular dimensional elements it possesses.

So, perhaps you are right that there is no need for any changes to the data model, but rather the assignment that I mention above.
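The missing piece described above is the link from each option back to its category. As a rough sketch in Python: if such a lookup were available, a combo name like “(Over 5,IPD,)” could be unfolded into one value per dimension. The `OPTION_TO_CATEGORY` mapping and the function name are hypothetical; nothing in the result set above provides them.

```python
# Hypothetical lookup from category option to category. The resource table
# output shown above does not carry this information.
OPTION_TO_CATEGORY = {
    "Under 1": "Age",
    "Age 1-5": "Age",
    "Over 5": "Age",
    "OPD": "Patient status",
    "IPD": "Patient status",
    "Deaths": "Patient status",
}


def explode_combo_name(name):
    """Split a name such as '(Over 5,IPD,)' into a {category: option} mapping."""
    options = [part.strip() for part in name.strip("()").split(",") if part.strip()]
    return {OPTION_TO_CATEGORY[option]: option for option in options}
```

With the lookup in place, each category becomes a column that can be used directly as a pivot field.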

Johan, thanks for the query. I will see if I can get it to work.

Best regards,

Jason

On Fri, Sep 25, 2009 at 12:07 PM, Knut Staring knutst@gmail.com wrote:

http://208.76.222.114/confluence/display/RandD/General+multi-dimensional+model

On Fri, Sep 25, 2009 at 11:55 AM, Abyot Gizaw abyota@gmail.com wrote:

The one-to-one relationship mentioned between dataelement and categorycombo is not correct! The relationship is one-to-many: a categorycombo can be assigned to many dataelements, but a dataelement can have only one categorycombo.

Thank you
Abyot.

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset).

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements in curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}
Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, IP}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me, because there is no relationship between the dimensional elements (categoryoptions). Category options can be related through a category combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it comes time to pull the data into a pivot table.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. It is not apparent how the dimensional elements correspond to a particular dimension, as there is no relation for this in the database as far as I can see.

As Johan pointed out a few mails ago (if I understand him correctly), different categorycombos can be created for individual data elements and assigned to these elements. However, this seems to 1) be incredibly inefficient and 2) not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more efficient, and highly usable from an SQL perspective. As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) and 2) select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (categoryoptionid).

My gut feeling is that this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimensions/organizational group sets, decide whether the groups are compulsory and exclusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…
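As an illustration of what compulsory and exclusive would mean if the same idea were applied to data elements, here is a small Python sketch. The function name, its parameters, and the data are assumptions for illustration only; this is not how DHIS 2 implements the check for organizational units.

```python
def check_group_set(groups, data_elements, compulsory=True, exclusive=True):
    """Return the data elements that violate the group set rules.

    groups maps a group name to the set of data elements assigned to it.
    compulsory: every data element must be in at least one group.
    exclusive: no data element may be in more than one group.
    """
    problems = {}
    for element in data_elements:
        memberships = sorted(g for g, members in groups.items() if element in members)
        if compulsory and not memberships:
            problems[element] = "not assigned to any group"
        elif exclusive and len(memberships) > 1:
            problems[element] = "assigned to several groups: " + ", ".join(memberships)
    return problems


# Made-up "Age" group set with one double assignment and one missing assignment.
age_groups = {
    "<5": {"Malaria death <5 year"},
    ">5": {"Malaria death >5 year", "Malaria death <5 year"},
}
elements = ["Malaria death <5 year", "Malaria death >5 year", "Typhoid death <5 year"]
```

A group set that passes both checks gives each data element exactly one dimensional element, which is what a pivot column needs.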

I believe that data elements are no different from organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data element groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

I suspect we would potentially need to rethink the entire concept of multidimensionality if we really wanted to get it right. It would seem to me that the DHIS data model and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables (and OLAP) is aggregate across any possible dimension, slicing, as you mention, on any dimension. I am not sure this will be so simple to implement, but I think there is a way to do it without major modifications.

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but I know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.
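One possible shape of a dynamic solution, sketched in Python rather than SQL: discover the dimension columns from the data itself, so that data elements lacking a dimension simply get an empty cell. The function name and the sample rows are made up for illustration.

```python
def pivot_dimensions(rows):
    """Pivot (data element, dimension, dimensional element) rows.

    The set of columns is discovered from the data, so a data element with
    fewer dimensions gets None in the columns it lacks.
    """
    columns = sorted({dimension for _, dimension, _ in rows})
    records = {}
    for name, dimension, element in rows:
        records.setdefault(name, dict.fromkeys(columns))[dimension] = element
    return columns, records


# Made-up ragged input: the second data element has only one dimension.
ragged_rows = [
    ("Malaria death <5 year", "Age", "<5"),
    ("Malaria death <5 year", "Disease", "Malaria"),
    ("Total OPD attendance", "Patient Status", "OPD"),
]
```

The trade-off is that the column list is only known at run time, which is exactly what a fixed cross-tab column definition cannot express.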

Anyway, I am rambling. I hope this mail helps to push my point further, though. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,
Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been grappling with a similar issue which involves unravelling categories, category options and combos into something more familiar. I have reached similar conclusions regarding nomenclature:

category = dimension
categorycombo - I have been calling a dimension set (it bears a strong, and useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going to start calling it that too.

In my case I need to export (and import) data in a standard format called sdmx. In the DHIS2 native DXF we export datavalues with effectively three dimensions (source, period, categorycombooption), and the last dimension is a sort of uber-dimension. Like a peppercorn or a cardamom seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 …44" gender="Male" age="0-5" value="32" />
<datavalue name="TB test given" uid="44344 …44" gender="Female" age="0-5" value="38" />
…

My approach to unpicking the dimensions from the dxf file is to transform it with an xslt transformation, which is still incomplete but seems to work well.
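The xslt itself is not shown in the thread. Purely as an illustration of the target shape, and not of Bob's transformation, the following Python sketch builds one exploded datavalue element with one attribute per dimension. The function name is hypothetical, and the uid attribute is left out because its value is elided above.

```python
import xml.etree.ElementTree as ET


def exploded_datavalue(name, dimensions, value):
    """Serialize a <datavalue> element with one attribute per dimension."""
    attributes = {"name": name}
    attributes.update(dimensions)       # e.g. {"gender": "Male", "age": "0-5"}
    attributes["value"] = str(value)    # XML attribute values must be strings
    element = ET.Element("datavalue", attributes)
    return ET.tostring(element, encoding="unicode")
```

For example, `exploded_datavalue("TB test given", {"gender": "Male", "age": "0-5"}, 32)` yields a single datavalue element carrying the gender and age attributes alongside the value.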

One other nomenclature issue which has surfaced as a result is what we call a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds roughly to all the dataelements which might occur on a datacollection form. If we view all dataelements as having just the three “dimensions” then all is well, but if we explode the actual dimensions then we have an issue. In the sdmx model a dataset consists only of dataelements with the same dimensionset. After discussing this with Ola we have reached the conclusion that we need another level of grouping, primarily for the UI (eg FormSet or ReportSet), which allows us to group related datasets. But that is an aside from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try to capture some of this and place it down in the datamodel API. It occurs to me that for a multidimensional dataelement we might benefit from some utility methods to retrieve slices and dices which might assist in constructing the pivot tables around dimensions. Does this sound like the right thing to do?

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,

I will leave it to others to comment on the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to each other, similar to my kooky example in my original mail. I assume this is fairly common throughout other HMIS systems. Here, malaria attendance is broken down into various dimensions/categories by patient type (outpatient, inpatient, and deaths) and by age (under 1, 1-5 and over 5). But say you want to be able to pivot to look at outpatient, inpatient and deaths totals (i.e. summed up by age). Well, you could create a separate data element for this, but it sure would be nice to be able to pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the categories through into a “Category” pivot field, which you can then use to get what you want. Simply tick the categories (see below) you want to see, and group them together in Excel. Maybe Edem and Romain can help further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over 5) = Category option

So by assigning a DE the category combo of “gender_age”, you get 9 dimensional elements: 3 category options (in category age) by 3 category options (in category gender).
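The 3 by 3 arithmetic is a plain Cartesian product of the options in each category. A short Python sketch, where the specific option names are assumptions made up for illustration:

```python
from itertools import product


def option_combos(categories):
    """Return every combination of one option per category."""
    return list(product(*categories.values()))


# Assumed options for the two categories in the "gender_age" category combo.
gender_age = {
    "gender": ["Male", "Female", "Unknown"],
    "age": ["Under 1", "1-5", "Over 5"],
}
```

`len(option_combos(gender_age))` is 9, and adding a third category with 3 options would raise it to 27, which is why the number of combinations grows quickly.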

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(
  test_dataelementid integer NOT NULL,
  test_dataelementcategoryid integer NOT NULL,
  test_dataelementcategorycomboid integer NOT NULL,
  CONSTRAINT pk_testdataelementcategory PRIMARY KEY
    (test_dataelementid, test_dataelementcategoryid, test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a data element category, and a data element combo. The reference to a data element category may be redundant, but anyway, let's leave it in for now.

I populated the table with some data, which will be used to assign dimensions to data elements. It looks like this in my DB:

309;25250;25251
309;25257;25255
348;25250;25252
348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values correspond to?

"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Age";"Under 1"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age";"Age 1-5"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over 5"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Patient status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient status";"OPD"

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,
       dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON
     test_dataelementcategorycombo.test_dataelementid = dataelement.dataelementid
JOIN dataelementcategory ON
     dataelementcategory.categoryid = test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON
     test_dataelementcategorycombo.test_dataelementcategorycomboid = dataelementcategoryoption.categoryoptionid;

So, that view just provides a human-readable view of those integers that I populated in the test_dataelementcategorycombo table I created above. This table just assigns particular data elements to different category options (dimensional elements).

OK, so far so good, but the problem now is how to use this with the aggregateddatavalue table. If we try to join this table directly, we will have issues with duplicates in the pivot table, so we need to transform the data slightly.

This should do the trick.

-- crosstab() is provided by the PostgreSQL tablefunc contrib module
SELECT * FROM crosstab
  (
    'SELECT name, dimension, dimension_element FROM vw_dataelements_dimensions ORDER BY 1,2,3',
    'SELECT DISTINCT dimension FROM vw_dataelements_dimensions ORDER BY 1 ASC'
  )
AS
  (
    name character varying(230),
    age character varying(160),
    status character varying(160)
  );

which returns this record set

"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age 1-5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over 5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under 1";"OPD"

OK, admittedly, I cheated a bit and used the crosstab function of PostgreSQL, but I assume that this query could be rewritten with a few more lines of code in standard SQL or some procedural language like Java. Now, this record set looks like something that I can almost use with the aggregateddatavalue table, simply by joining up the table on the appropriate dataelementid and pulling everything into a pivot table. I would not have any duplicated values and would have columns like data element name, period, orgunit, age, patient status and of course the value of the data element. I hope that part is pretty clear. Just join up that table to the aggregateddatavalue table, and you have pretty much what is needed to pull the data directly into a PivotTable for further analysis.
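One way the crosstab could be written without the PostgreSQL-specific function is conditional aggregation (one MAX(CASE …) per dimension). The sketch below runs that SQL through Python's sqlite3 module so it can be tried anywhere. It swaps the view for a plain table called `dataelements_dimensions`, which is an assumption for the sake of a self-contained example, and it still requires the dimension names to be known in advance.

```python
import sqlite3

# Sample rows in the shape produced by vw_dataelements_dimensions.
ROWS = [
    ("OPD 1st Attendance Clinical Case of Malaria Under 1 Year", "Age", "Under 1"),
    ("OPD 1st Attendance Clinical Case of Malaria Under 1 Year", "Patient status", "OPD"),
    ("OPD 1st Attendance Clinical Case of Malaria Over 5 Years", "Age", "Over 5"),
    ("OPD 1st Attendance Clinical Case of Malaria Over 5 Years", "Patient status", "OPD"),
]

# Standard SQL pivot: one output column per known dimension.
PIVOT_SQL = """
SELECT name,
       MAX(CASE WHEN dimension = 'Age' THEN dimension_element END) AS age,
       MAX(CASE WHEN dimension = 'Patient status' THEN dimension_element END) AS status
FROM dataelements_dimensions
GROUP BY name
ORDER BY name
"""


def pivot_rows(rows):
    """Load the rows into an in-memory table and return one row per data element."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataelements_dimensions "
        "(name TEXT, dimension TEXT, dimension_element TEXT)"
    )
    conn.executemany("INSERT INTO dataelements_dimensions VALUES (?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result
```

Because the result has exactly one row per data element, joining it to a value table on the data element does not duplicate values.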

This is not a complete example, but it is very close to what I need here, and I think this type of functionality would be much more useful than the current data element categories functionality. Basically, all that would be required, at least initially, would be another user interface screen to allow the definition of which category(ies) and category options a data element is a member of. The rest could, in the first instance, be executed with custom SQL (obviously, I am partial to this language and hobbled by the fact that I do not know Java), but eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding multidimensional analysis of data elements, but it seems to solve the issues that I am having by trying to assign some sort of dimensional hierarchy to data elements (similar to the exclusive/compulsory functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, while there does seem to be the ability to assign dimensions, there does not seem to be the ability to assign particular elements within those dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which you assign to data elements. However, it is not possible to assign just specific parts of a category combo (only some of the category options) to a data element.

Yes, this was exactly what I wanted. Assigning different categories would seem to break the dimensionality.

Then you must make a specific category (as the only one in or part of a new category combo) with just those options. It can be hell; in Tajikistan there were way over 20 categories I think, at least 10 just on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly the intended functionality.

Basically, I think I need something akin to the exclusive/compulsory groups that are in place for organizational units, but instead, for arbitrary dimensions. I will give it a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


–

Cheers,

Knut Staring


Lars · 29 September 2009 08:53

Thanks for the explanations Jason. The multidimensional model is quite complicated, is poorly documented, and as you say is DHIS-centric in the way that it is built around the DHIS notion of a Data Element.

Could we assemble and put some of the text being written on the list to docbook?

That said, and I think Jason already has made a strong case for this, also in a 100% DHIS2 scenario you will need more flexibility in defining dimensions to your data than what categories can provide. Being able to define data dimensions independent of data collection is powerful and should be supported in a better way than what data element groups provide today. Given that we already have the orgunit group set code in place I would assume that adding group sets to data elements could be a relatively straightforward thing to do (but then again, I am not the programmer…).

I don’t see any implications in adding this to the system, it won’t require changes to the existing model as the association goes from the groupset to the groups. We can prioritize this for the 2.0.3 release.

bobj · 29 September 2009 10:03

Hi

On the back of Jason’s and others’ comments, I’ve reached the conclusion that we cannot really live with the MD model the way it is. Although I think it is (just about) workable, there are some serious optimizations we can and should do. I am going to put my other work back a day or two and propose some changes in a branch.

I think central to the inefficiency is the many-many relation between categories and categoryoptions. This strikes me as illogical as well as being cumbersome in the UI. Do we really want to be able to make categories with options like {‘0<5’,‘6-10’,‘Male’,‘Out of stock’,‘35-40’}? Reducing the relation between categories and category options to 1-n cuts two tables, should make sql queries more efficient and grokkable, and also matches other models such as sdmx better.

The other possible inefficiency is the dimensionset. It can be useful in some contexts but I’m guessing that when querying the data (which we want to be fast) it is not relevant. A dataelement can have dimensions. The fact that some dataelements have the same combinations of dimensions is very useful to know for some purposes, but it should be possible to get from the dataelement to the dimension directly.

On the other side of the road is the hierarchical dimensionality idea I see Ola and Jason have been discussing, where dimensions are composed (perhaps post-facto) of uni-dimensional dataelements rather than decomposed into pre-structured dimensional elements. I suspect that:

we need both; and
from the API, user and reporting perspective they should look the same (ie a dataelement can have dimensions - how they come about should not be a concern at the end point).

I’ll try out some of these ideas and point you to the branch.

Regards
Bob


bobj · 29 September 2009 14:45

OK. Here’s my first attempt to rationalize things. Please excuse the attachments. I try not to send attachments to mailing lists but these are at least fairly small. (And Lars I will write it up in docbook after fishing for feedback).

My primary aim has been to disturb the existing model as little as possible whilst trying to simplify wherever possible.

Attached oldmodel.png shows the participants in the existing model. As you can see there are 11 tables in all. I haven’t shown the relations as it becomes a bit of a web.

Also attached is a proposed amended database model which bears sufficient similarity to the old that migration between the two should be feasible. But it is down to 6 tables. And I have named the tables according to the terms we have been discussing. Of course this is just the database model. I’ve also put together an XML view of what some sample dataset might look like. There is also a UML model required which would be richer than the underlying datamodel, but one step at a time …

Walking through:

DataElements can have Dimensions. And different dataElements can (and hopefully will) share some of the same Dimensions. So there is an m-to-n relationship between the two, necessitating an extra table (DataElementDimensions). An example of a Dimension is SEX. Nothing new here.
Dimensions have DimensionElements. So SEX for example might have DimensionElements “Male”, “Female”, “Unknown”. A big difference from the old model is that there is a 1-n relationship between Dimensions and DimensionElements. A Dimension has many DimensionElements. But a DimensionElement is a member of only one Dimension.
DataValues represent the values at the intersection of these Dimensions. Keeping with the spirit of the old model, this intersection is represented by a single key, DimensionElementCombination. The DimensionElementCombinations would be populated when a new Dimension is added to a DataElement. Like the original model there is some fragility here. Changing dimensions on dataelements could create a situation where datavalues become orphaned or misdirected. The API must have robust methods for defending this integrity, particularly when updating the structural metadata. But this is perhaps doable. Either way it’s not worse than what we have.
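To make the walk-through easier to follow, here is a rough Python sketch of the three relationships. The class names follow the terms above, but the fields, the `add` and `combinations` helpers, and the sample data (including the AGE dimension and the TB_TEST code) are assumptions for illustration, not the attached database model.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class DimensionElement:
    code: str
    dimension: str  # each element belongs to exactly one Dimension (the 1-n relation)


@dataclass
class Dimension:
    code: str
    elements: list = field(default_factory=list)

    def add(self, code):
        """Create a DimensionElement owned by this Dimension."""
        element = DimensionElement(code, self.code)
        self.elements.append(element)
        return element


@dataclass
class DataElement:
    code: str
    dimensions: list = field(default_factory=list)  # the m-to-n DataElementDimensions link

    def combinations(self):
        """All DimensionElementCombinations: one element from each Dimension."""
        return [tuple(combo) for combo in product(*(d.elements for d in self.dimensions))]


sex = Dimension("SEX")
for code in ("Male", "Female", "Unknown"):
    sex.add(code)
age = Dimension("AGE")
for code in ("0-5", "Over 5"):
    age.add(code)
tb_test = DataElement("TB_TEST", [sex, age])
```

Because every DimensionElement records its Dimension, any combination attached to a data value can be unfolded back into named dimensions, which is the property the old many-to-many relation made hard to query.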

I haven’t given a name to DimensionElementCombinations. From the examples I have seen from SL this seems to be unnecessary. The names I have seen being used are generally simply contrived from the dimensions or (worse still) from the categoryoptions. What is important is that dataelements can have sets of dimensions.

And then much of what is different is just a renaming of the original entities. From the attached XML file I think you can see some of the issues faced re names and identifiers. I find myself following a sort of convention of CODE, Name, Description and UUID. CODEs must be unique within the scope of the database. I suppose this is close to what we currently call ShortName. I would like to place constraints on CODEs in terms of length and also the disallowing of spaces and other funny characters. The reason being that we may well have to use these codes in making up uri’s. So CODEs must be unique. For the moment we could keep name unique but should migrate from it. It’s a matter of rewriting all our comparators I guess. UUIDs I am told are unique through some sort of divinity so we apparently do not need to worry about them.
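The exact constraints on CODEs are not specified in this mail. As a sketch only, assuming a rule of uppercase letters, digits and underscores with a maximum length of 50 characters (both invented here for illustration), a validity and uniqueness check could look like this in Python:

```python
import re

# Assumed rule, not from the proposal: starts with an uppercase letter, then
# uppercase letters, digits or underscores, at most 50 characters in total.
CODE_PATTERN = re.compile(r"[A-Z][A-Z0-9_]{0,49}")


def invalid_codes(codes):
    """Return the codes that break the pattern or repeat an earlier code."""
    seen = set()
    rejected = []
    for code in codes:
        if not CODE_PATTERN.fullmatch(code) or code in seen:
            rejected.append(code)
        seen.add(code)
    return rejected
```

A restricted character set like this keeps codes safe to embed in a uri without escaping, which is the motivation given above.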

I’ve also tried to reduce the number of knees on the donkey - from 11 tables to 6. I believe this can be done whilst preserving the existing functionality. This arrangement would make it much more sensible to produce the XML I need to produce. I’m hoping that it would also be more friendly to those who would be trying to pivot the data across dimensions.

Jason do you think this works for you? I might have missed out something really fundamental. Abyot, you’ve been through this process before - am I missing something? From the DataValue you can see DimensionElements. And once you know a DimensionElement you also know the Dimension to which it belongs. I think that’s queryable. Will have to hydrate with some data and see.

Shaking the multidimensional model up like this would obviously have implications. But I suspect most of it is taking stuff away rather than adding new so it might just be doable. Less is more.

Not spending time with docbook yet, till I get some feedback.

Cheers
Bob

dxf2.0sample.xml (2.9 KB)

···

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

Hi

On the back of Jason’s and others’ comments, I’ve reached the conclusion that we cannot really live with the MD model the way it is. Whereas I think it is (just about) workable there are some serious optimizations we can and should do. I am going to put my other work back a day or two and propose some changes in a branch.

I think central to the inefficiency is the many-many relation between categories and categoryoptions. This strikes me as illogical as well as being cumbersome in the UI. Do we really want to be able to make categories with options like {‘0<5’,‘6-10’,‘Male’,‘Out of stock’,‘35-40’}? Reducing the relation between categories and category options to 1-n cuts two tables, should make SQL queries more efficient and grokkable, and also matches other models such as SDMX better.

The other possible inefficiency is the dimensionset. It can be useful in some contexts but I’m guessing that when querying the data (which we want to be fast) it is not relevant. A dataelement can have dimensions. The fact that some dataelements have the same combinations of dimensions is very useful to know for some purposes, but it should be possible to get from the dataelement to the dimension directly.

On the other side of the road is the hierarchical dimensionality idea I see Ola and Jason have been discussing, where dimensions are composed (perhaps post-facto) of uni-dimensional dataelements rather than decomposed into pre-structured dimensional elements. I suspect that:

1. we need both; and
2. from the API, user and reporting perspective they should look the same (ie a dataelement can have dimensions - how they come about should not be a concern at the end point).

I’ll try out some of these ideas and point you to the branch.

Regards
Bob

2009/9/29 Lars Helge Øverland larshelge@gmail.com

Thanks for the explanations Jason. The multidimensional model is quite complicated, is poorly documented, and as you say is DHIS-centric in the way that it is built around the DHIS notion of a Data Element.

Could we assemble and put some of the text being written on the list to docbook?

That said, and I think Jason already has made a strong case for this, even in a 100% DHIS2 scenario you will need more flexibility in defining dimensions for your data than what categories can provide. Being able to define data dimensions independent of data collection is powerful and should be supported in a better way than what data element groups provide today. Given that we already have the orgunit group set code in place I would assume that adding group sets to data elements could be a relatively straightforward thing to do (but then again, I am not the programmer…).

I don’t see any implications in adding this to the system, it won’t require changes to the existing model as the association goes from the groupset to the groups. We can prioritize this for the 2.0.3 release.

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Abyot_Gizaw · 29 September 2009 18:01

Yes, your suggestion is doable and less is better … but I think the requirement from the field is more complex.

If, for a moment, we stop talking about datavalues and talk about dataelements - why are we talking about dimension combinations?

Because you are assuming a dataelement to have only one dimension. Am I correct? If that is the case, I see a little bit of inconsistency here. DataElement talks about one dimension, but its corresponding value talks about a combination of dimensions.

Yes, from the datavalue I can get the dimensionelementcombinations, pick out the dimensionelements, regroup them and put them in their corresponding dimensions, which in the end tells me which dimension they came from. But from this point onwards I am no longer talking about a value of a single dataelement but a value for a combination of dataelements (because I have to pull different dataelements which can give me the identified dimensions) … but is this what we want?

The other point I would like to raise is this: will there not be a limitation on the flexibility of the system when imposing the restriction “A Dimension has many DimensionElements. But a DimensionElement is a member of only one Dimension”? I see not only a flexibility problem but a logical problem as well. If we think beyond the obvious SEX(male,female,unknown), I see a strong need for letting dimensionelements be members of multiple dimensions. Take the other obvious dimension, AGE, and assume <5 yrs, 5-10 yrs, and >10 yrs as its dimensionelements. Maybe such scaling of the AGE dimension is appropriate for the Malaria case, but for the TB case people might be interested in breaking the AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs. So how are we going to handle cases like this? Are we going to define a number of <5yrs elements, or are we going to use the same <5yr dimensionelement?
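To make the trade-off concrete, here is a small sketch of what the "only one Dimension" rule would mean for the two age scalings. The dimension names, element codes and the `element_index` helper are invented for illustration; the thread does not settle on either approach.

```python
def element_index(dimensions):
    """Map each DimensionElement code to its single owning Dimension.

    Raises ValueError if an element is listed under two different
    Dimensions, which is what the proposed 1-n rule forbids.
    """
    owner = {}
    for dimension, elements in dimensions.items():
        for element in elements:
            if owner.get(element, dimension) != dimension:
                raise ValueError(
                    f"{element} belongs to both {owner[element]} and {dimension}"
                )
            owner[element] = dimension
    return owner


# Sharing one "<5 yrs" element between two age scalings breaks the rule.
shared = {
    "AGE_MALARIA": ["LT5", "5_10", "GT10"],
    "AGE_TB": ["LT5", "5_10", "10_15", "GT15"],
}

# Defining the bands separately per Dimension satisfies it, at the cost
# of having more than one "<5 yrs" element in the system.
separate = {
    "AGE_MALARIA": ["MAL_LT5", "MAL_5_10", "MAL_GT10"],
    "AGE_TB": ["TB_LT5", "TB_5_10", "TB_10_15", "TB_GT15"],
}
```

Calling `element_index(separate)` succeeds and every element resolves to one Dimension, while `element_index(shared)` raises, which is the restriction being questioned here.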

Thank you
Abyot.

jason · 29 September 2009 19:16

I think Abyot raises some good points, especially his last one about
differences in what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts
of a multidimensional data element (as it is currently implemented
with categories) and a data element that has no multidimensionality,
at least in the sense of it not being assigned any categories.

What about the following scenario? Could the category/category combos
be transformed somehow into a sort of data element generator? Users
could define a dimensionality set, assign a master data element, and
DHIS would create all of the necessary data elements. So a category
combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1,
Under 5 and Over 5) and a template data element (Clinical malaria)
would produce:

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}
OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}
OPD Over 5 Clinical Malaria ...
OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}
...
IPD Clinical Malaria Total {IPD, All ages, Clinical Malaria}
...
Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}
Clinical Malaria Total {All patient status, All ages, Clinical malaria}

Each one of those data elements would then be assigned a set of
dimensions, and a set of dimensional elements.
The categories functionality would simply be an artifact to produce
multiple data elements without having to enter them separately, which,
if I understood Ola yesterday, was one of its intended purposes.
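The "data element generator" idea could be sketched as below. This is an illustration only: the function name, the use of "All" for the total of a category, and the naming pattern are assumptions, loosely following the example list above.

```python
from itertools import product


def generate_data_elements(template, categories):
    """Yield (name, members) for every combination of category options.

    Each category also gets an implicit "All" option (represented by
    None while building the combinations) so that totals are generated.
    """
    names = list(categories)
    option_lists = [categories[name] + [None] for name in names]
    for combo in product(*option_lists):
        labels = [option for option in combo if option is not None]
        members = {
            name: option if option is not None else "All"
            for name, option in zip(names, combo)
        }
        suffix = " Total" if None in combo else ""
        yield " ".join(labels + [template]) + suffix, members


categories = {
    "Patient status": ["OPD", "IPD", "Deaths"],
    "Age": ["Under 1", "Under 5", "Over 5"],
}
elements = dict(generate_data_elements("Clinical Malaria", categories))
```

With three options per category plus the implicit total, this yields sixteen plain data elements, each carrying its own set of dimension members.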

Now, those of us, such as myself, who have already created
dozens of data elements with different dimensions in their names (but
nowhere in a relational table) could assign the dimensionality in
a separate step (post-facto, as Bob mentioned earlier). I might want to
assign an "uber" dimension of "Communicable" and "Non-communicable" to
a disease type that might not have anything to do with the definition
of the data element itself, but would be simply for analysis purposes
later. Again, I may be rehashing my previous emails here, but from a
pure SQL standpoint, the approach I suggest here makes sense to me, both
in terms of queries to pull this into a crosstab and in terms of how to
generate a fact table that something like an OLAP server could deal
with.
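As a rough illustration of assigning dimensionality in a separate step and then pivoting on it, consider the sketch below. The data element names, values and the `crosstab` helper are made up for the example; only the idea of a post-facto "Communicable" / "Non-communicable" dimension comes from the text.

```python
from collections import defaultdict

# Flat, DHIS 1.4-style data elements with made-up values.
values = {
    "OPD Malaria": 40,
    "OPD Diabetes": 5,
    "IPD Malaria": 8,
    "IPD Diabetes": 2,
}

# Dimensionality assigned afterwards, independent of the definitions.
assignments = {
    "OPD Malaria": {"Patient status": "OPD", "Disease type": "Communicable"},
    "OPD Diabetes": {"Patient status": "OPD", "Disease type": "Non-communicable"},
    "IPD Malaria": {"Patient status": "IPD", "Disease type": "Communicable"},
    "IPD Diabetes": {"Patient status": "IPD", "Disease type": "Non-communicable"},
}


def crosstab(values, assignments, row_dimension, column_dimension):
    """Sum values into a {row: {column: total}} table over two dimensions."""
    table = defaultdict(lambda: defaultdict(int))
    for element, value in values.items():
        members = assignments[element]
        table[members[row_dimension]][members[column_dimension]] += value
    return {row: dict(columns) for row, columns in table.items()}
```

Here `crosstab(values, assignments, "Disease type", "Patient status")` gives one row per disease type and one column per patient status, without the data elements themselves having been defined with categories.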

This approach might resolve the issue of how to deal with
these two different beasts, by unfolding the multidimensional data
element into simpler components. That is, the
category/combos/options would be used as a templating mechanism, while
dimensionality would be assigned through a separate set of
relations. Perhaps this is what is represented in the diagram, but I
will need to study it tomorrow after some sleep.

I do think that dimensional elements should not be shared
between dimensions, and that dimensions and dimensional elements
should not be deletable without lots of bells and whistles
going off once they have been assigned to data elements.

I guess the key question is whether data elements should be able to
have multiple DimensionElementCombinations, which I think is the
current implementation. I am just not sure this will work with a
combination of DHIS2-type-multidimensional elements, and DHIS1.4-type
data elements.

Enough for today.

Thanks for this Bob. It is a good start. Can't you make this diagram
in DocBook so I can edit it?

Regards,
Jason

Abyot_Gizaw · 29 September 2009 20:08

I think Abyot raises some good points, especially his last one about differences in what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts of a multidimensional data element (as it is currently implemented with categories) and a data element that has no multidimensionality, at least in the sense of it not being assigned any categories.

Isn’t this what we have in the current system? If you do not assign any combination of categories to a dataelement, then the dataelement has no dimensionality. (Of course, for the sake of consistency from a programming-logic point of view, a default category combination with one default category having one default option is implicitly assigned; it is like putting your value at zero on the dimension axis.)

What about the following scenario? Could the category/category combos be transformed somehow into a sort of data element generator? Users could define a dimensionality set, assign a master data element, and DHIS would create all of the necessary data elements. So a category combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1, Under 5 and Over 5) and a template data element (Clinical malaria) would produce:

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}
OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}
OPD Over 5 Clinical Malaria …
OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}
…
IPD Clinical Malaria Total {IPD, All ages, Clinical Malaria}
…
Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}
Clinical Malaria Total {All patient status, All ages, Clinical malaria}

Each one of those data elements would then be assigned a set of dimensions, and a set of dimensional elements. The categories functionality would simply be an artifact to produce multiple data elements without having to enter them separately, which, if I understood Ola yesterday, was one of its intended purposes.

Now, those of us, such as myself, who have already created dozens of data elements with different dimensions in their names (but nowhere in a relational table) could assign the dimensionality in a separate step (post-facto, as Bob mentioned earlier). I might want to assign an “uber” dimension of “Communicable” and “Non-communicable” to a disease type that might not have anything to do with the definition of the data element itself, but would be simply for analysis purposes later. Again, I may be rehashing my previous emails here, but from a pure SQL standpoint, the approach I suggest here makes sense to me, both in terms of queries to pull this into a crosstab and in terms of how to generate a fact table that something like an OLAP server could deal with.

This approach might resolve the issue of how to deal with these two different beasts, by unfolding the multidimensional data element into simpler components. That is, the category/combos/options would be used as a templating mechanism, while dimensionality would be assigned through a separate set of relations. Perhaps this is what is represented in the diagram, but I will need to study it tomorrow after some sleep.

I do think that dimensional elements should not be shared between dimensions, and that dimensions and dimensional elements should not be deletable without lots of bells and whistles going off once they have been assigned to data elements.

What is wrong with that as long as values are not associated with them? I think we will be falling back to the current implementation instead, i.e. dimensional elements should not be deleted once values are assigned to their combinations.

I guess the key question is whether data elements should be able to have multiple DimensionElementCombinations, which I think is the current implementation. I am just not sure this will work with a combination of DHIS2-type-multidimensional elements, and DHIS1.4-type data elements.

Can anyone explain to me how the DHIS2 multidimensional dataelement concept fails to handle the DHIS 1.4 dataelements? Sorry, maybe I missed this in your earlier discussion. The way I see it, if the objective is OLAP and pivoting/querying, then what we need is not to change the model but to develop more APIs which can pull data along a dimension, with varying degrees of overlap across dimensions, or more generally aggregate values over a flexible set of dimensionelementcombinations!

Using the example above - {OPD, IPD}, {Male, Female}, {Under 1, 1-5, Above 5} and malaria as base dataelement

What we have currently is an API to provide values for

Malaria(OPD,Male,Under 1)
Malaria(OPD,Male,1-5)
Malaria(OPD,Male,Above 5)
Malaria(OPD,Female,Under 1)
Malaria(OPD,Female,1-5)
Malaria(OPD,Female,Above 5)
…

And if I understood correctly … what is required is to have registered cases of

Malaria in the OPD
Malaria in the IPD
Malaria for Males
Malaria for Females
…

Malaria in the OPD but only those Female
Malaria in the IPD but for Male
…
we can list different combinations…

or finally ask … for the Malaria

Isn’t this a simple question of aggregation? Does the multidimensional datamodel have a limitation in handling the above requirements - or am I talking about something different here?
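The kind of aggregation being asked about can be sketched as follows. This is not the existing DHIS2 API: the `aggregate` function, the sample values and the way combinations are keyed are all assumptions made to illustrate summing over a flexible set of dimensionelementcombinations.

```python
DIMENSIONS = ("Patient status", "Sex", "Age")

# Made-up values for the base data element Malaria, keyed by combination.
values = {
    ("OPD", "Male", "Under 1"): 4,
    ("OPD", "Male", "1-5"): 6,
    ("OPD", "Female", "Under 1"): 5,
    ("OPD", "Female", "1-5"): 7,
    ("IPD", "Male", "Under 1"): 1,
    ("IPD", "Female", "1-5"): 2,
}


def aggregate(values, filters=None):
    """Sum the values whose combination matches every given filter.

    `filters` maps a dimension name to the required element; dimensions
    that are not mentioned are summed over.
    """
    filters = filters or {}
    total = 0
    for combination, value in values.items():
        members = dict(zip(DIMENSIONS, combination))
        if all(members[name] == wanted for name, wanted in filters.items()):
            total += value
    return total
```

For example, `aggregate(values, {"Patient status": "OPD"})` answers "Malaria in the OPD", adding a `"Sex": "Female"` filter narrows it to females in the OPD, and calling it with no filters gives the overall Malaria total.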

···

On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

OK. Here’s my first attempt to rationalize things. Please excuse the attachments. I try not to send attachments to mailing lists but these are at least fairly small. (And Lars, I will write it up in docbook after fishing for feedback.)

My primary aim has been to disturb the existing model as little as possible whilst trying to simplify wherever possible.

Attached oldmodel.png shows the participants in the existing model. As you can see there are 11 tables in all. I haven’t shown the relations as it becomes a bit of a web.

Also attached is a proposed amended database model which bears sufficient similarity to the old that migration between the two should be feasible. But it is down to 6 tables. And I have named the tables according to the terms we have been discussing. Of course this is just the database model. I’ve also put together an XML view of what some sample dataset might look like. There is also a UML model required which would be richer than the underlying datamodel, but one step at a time …

Walking through:

DataElements can have Dimensions. And different DataElements can (and hopefully will) share some of the same Dimensions. So there is an m-to-n relationship between the two, necessitating an extra table (DataElementDimensions). An example of a Dimension is SEX. Nothing new here.

Dimensions have DimensionElements. So SEX for example might have DimensionElements “Male”, “Female”, “Unknown”. A big difference from the old model is that there is a 1-n relationship between Dimensions and DimensionElements. A Dimension has many DimensionElements. But a DimensionElement is a member of only one Dimension.

DataValues represent the values at the intersection of these Dimensions. Keeping with the spirit of the old model, this intersection is represented by a single key, DimensionElementCombination. The DimensionElementCombinations would be populated when a new Dimension is added to a DataElement. Like the original model there is some fragility here. Changing dimensions on dataelements could create a situation where datavalues become orphaned or misdirected. The API must have robust methods for defending this integrity, particularly when updating the structural metadata. But this is perhaps doable. Either way it’s not worse than what we have.
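
The fragility mentioned here can be shown with a rough Python sketch. Everything in it is an assumption made for illustration: the AGE elements, the `combinations_for` helper, and the idea that a combination is simply one element from each Dimension of the DataElement. It is not the proposed schema itself.

```python
from itertools import product

# Each DimensionElement belongs to exactly one Dimension (the 1-n relation).
# The AGE elements are hypothetical.
DIMENSION_ELEMENTS = {
    "SEX": ["Male", "Female", "Unknown"],
    "AGE": ["<5", "5+"],
}


def combinations_for(dimensions):
    """Return every DimensionElementCombination for a list of Dimensions."""
    element_lists = [DIMENSION_ELEMENTS[name] for name in dimensions]
    return list(product(*element_lists))


# A DataElement that starts with SEX only and later gains AGE.
before = combinations_for(["SEX"])
after = combinations_for(["SEX", "AGE"])

# DataValues keyed on the old combinations match none of the new keys,
# which is the "orphaned datavalues" risk when structural metadata changes.
orphaned = [combo for combo in before if combo not in after]
print(len(before), len(after), len(orphaned))
```

With no Dimensions at all, `combinations_for([])` returns a single empty combination, which is one way to picture a plain, non-multidimensional DataElement.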

I haven’t given a name to DimensionElementCombinations. From the examples I have seen from SL this seems to be unnecessary. The names I have seen being used are generally simply contrived from the dimensions or (worse still) from the categoryoptions. What is important is that dataelements can have sets of dimensions.

And then much of what is different is just a renaming of the original entities. From the attached XML file I think you can see some of the issues faced re names and identifiers. I find myself following a sort of convention of CODE, Name, Description and UUID. CODEs must be unique within the scope of the database. I suppose this is close to what we currently call ShortName. I would like to place constraints on CODEs in terms of length and also disallow spaces and other funny characters. The reason being that we may well have to use these codes in making up URIs. So CODEs must be unique. For the moment we could keep name unique but should migrate from it. It’s a matter of rewriting all our comparators I guess. UUIDs I am told are unique through some sort of divinity so we apparently do not need to worry about them.
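
As a sketch of what such CODE constraints might look like in Python: the exact rule below (1 to 20 characters, letters, digits and underscores only) is purely an assumption, since no length limit or character set has been agreed, and `is_valid_code` is a hypothetical helper.

```python
import re

# Assumed rule, for illustration only: 1 to 20 characters drawn from
# letters, digits and underscore, so the code is safe to embed in a URI.
CODE_PATTERN = re.compile(r"[A-Za-z0-9_]{1,20}")


def is_valid_code(code, existing_codes):
    """Check that a CODE is well-formed and not already used in the database."""
    well_formed = CODE_PATTERN.fullmatch(code) is not None
    return well_formed and code not in existing_codes


print(is_valid_code("MAL_OPD", set()))        # well-formed and unused
print(is_valid_code("Malaria OPD", set()))    # rejected: contains a space
print(is_valid_code("MAL_OPD", {"MAL_OPD"}))  # rejected: already taken
```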

I’ve also tried to reduce the number of knees on the donkey - from 11 tables to 6. I believe this can be done whilst preserving the existing functionality. This arrangement would make it much more sensible to produce the XML I need to produce. I’m hoping that it would also be more friendly to those who would be trying to pivot the data across dimensions.

Jason, do you think this works for you? I might have missed out something really fundamental. Abyot, you’ve been through this process before - am I missing something? From the DataValue you can see DimensionElements. And once you know a DimensionElement you also know the Dimension to which it belongs. I think that’s queryable. Will have to hydrate with some data and see.

Shaking the multidimensional model up like this would obviously have implications. But I suspect most of it is taking stuff away rather than adding new so it might just be doable. Less is more.

Not spending time with docbook yet, till I get some feedback.

Cheers
Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

Hi

On the back of Jason’s and others’ comments, I’ve reached the conclusion that we cannot really live with the MD model the way it is. Whereas I think it is (just about) workable there are some serious optimizations we can and should do. I am going to put my other work back a day or two and propose some changes in a branch.

I think central to the inefficiency is the many-many relation between categories and categoryoptions. This strikes me as illogical as well as being cumbersome in the UI. Do we really want to be able to make categories with options like {‘0<5’,‘6-10’,‘Male’,‘Out of stock’,‘35-40’}? Reducing the relation between categories and category options to 1-n cuts two tables, should make SQL queries more efficient and grokkable, and also matches other models such as SDMX better.

The other possible inefficiency is the dimensionset. It can be useful in some contexts but I’m guessing that when querying the data (which we want to be fast) it is not relevant. A dataelement can have dimensions. The fact that some dataelements have the same combinations of dimensions is very useful to know for some purposes, but it should be possible to get from the dataelement to the dimension directly.

On the other side of the road is the hierarchical dimensionality idea I see Ola and Jason have been discussing, where dimensions are composed (perhaps post-facto) of uni-dimensional dataelements rather than decomposed into pre-structured dimensional elements. I suspect that:

we need both; and

from the API, user and reporting perspective they should look the same (i.e. a dataelement can have dimensions - how they come about should not be a concern at the end point).

I’ll try out some of these ideas and point you to the branch.

Regards
Bob

2009/9/29 Lars Helge Øverland larshelge@gmail.com

Thanks for the explanations Jason. The multidimensional model is quite complicated, is poorly documented, and as you say is DHIS-centric in the way that it is built around the DHIS notion of a Data Element.

Could we assemble and put some of the text being written on the list to docbook?

That said, and I think Jason already has made a strong case for this, also in a 100% DHIS2 scenario you will need more flexibility in defining dimensions to your data than what categories can provide. Being able to define data dimensions independent of data collection is powerful and should be supported in a better way than what data element groups provide today. Given that we already have the orgunit group set code in place I would assume that adding group sets to data elements could be a relatively straightforward thing to do (but then again, I am not the programmer…).

I don’t see any implications in adding this to the system; it won’t require changes to the existing model as the association goes from the groupset to the groups. We can prioritize this for the 2.0.3 release.

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


bobj · 29 September 2009 20:33

Hi

Yes your suggestion is doable and less is better … but I think the requirement from the field is more complex.

If, for a moment, we stop talking about datavalues and talk about dataelements - why are we talking about dimension combinations?

Because you are assuming a dataelement to have only one dimension. Am I correct? If that is the case, I see a little bit of inconsistency here. DataElement talks about one dimension, but its corresponding value talks about a combination of dimensions.

No, you are misreading me - or I have made a mistake. A DataElement can have many dimensions. If it were just one there would just be an n-1 relation between Dimension and DataElement. Because a DataElement can have more than one dimension, I have the DataElementDimension table in between. I actually meant to call it DataElementDimensions but table names should generally be singular. So the contents of this table might look like:

dimensionID, dataElementID
1, 45
1, 46
2, 45
3, 45
4, 6

So dataelement 45 would have 3 dimensions etc
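
For anyone who wants to poke at it, the same rows can be loaded into an in-memory SQLite database from Python. This is only a sketch: the column types, the composite primary key and the `dimensions_of` helper are my assumptions, not part of the proposed model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Link table for the m-to-n relation between Dimension and DataElement.
# The composite primary key (an assumption) prevents duplicate assignments.
conn.execute(
    "CREATE TABLE DataElementDimension ("
    " dimensionID INTEGER NOT NULL,"
    " dataElementID INTEGER NOT NULL,"
    " PRIMARY KEY (dimensionID, dataElementID))"
)
rows = [(1, 45), (1, 46), (2, 45), (3, 45), (4, 6)]
conn.executemany("INSERT INTO DataElementDimension VALUES (?, ?)", rows)


def dimensions_of(data_element_id):
    """Return the dimension ids assigned to one dataelement, in order."""
    cursor = conn.execute(
        "SELECT dimensionID FROM DataElementDimension"
        " WHERE dataElementID = ? ORDER BY dimensionID",
        (data_element_id,),
    )
    return [row[0] for row in cursor]


print(dimensions_of(45))  # [1, 2, 3]
```

A dataelement with no rows in this table simply returns an empty list, which fits the idea of a dataelement without any extra dimensions.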

Yes, from the datavalue I can have dimensionelementcombinations, pick dimensionelements, regroup and put them in their corresponding dimensions - in the end telling me which dimension they came from. But from this point onwards I am no longer talking about a value of a single dataelement but a value for a combination of dataelements (because I have to pull different dataelements which can give me the identified dimensions) … but is this what we want?

The other point I would like to raise is - will there not be any limitation on the flexibility of the system when putting the restriction “A Dimension has many DimensionElements. But a DimensionElement is a member of only one Dimension”? Not only a system flexibility problem, I see a logical problem as well. Because if we think for example beyond the obvious SEX(male,female,unknown) - I see a strong need for letting dimensionelements be members of multiple dimensions. For example take the other obvious dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its dimensionelements. Maybe such scaling of the AGE dimension is appropriate for the Malaria case, but for the TB case people might be interested in breaking the AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to handle cases like this? Are we going to define a number of <5yrs or are we going to use the same <5yr dimensionelement?

I think in this case we would have to define a number of “<5” dimensionelements. I agree that the way it is now there is maximum flexibility, but it comes at quite a cost. I haven’t seen much to suggest that this would be a real limitation. Anyway, the way it stands “<5” is just a label without any intrinsic meaning. So we can just as easily combine it with apples or oranges. By binding a set of dimensionelements to a dimension we at least give them some meaning as an aggregation group.

Thanks for your input. I will look again at the first issue and see whether I have made a mistake.

Regards
Bob

···

2009/9/29 Abyot Gizaw abyota@gmail.com

Thank you
Abyot.


bobj · 29 September 2009 20:49

I think Abyot raises some good points, especially his last one about differences in what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts of a multidimensional data element (as it is currently implemented with categories) and a data element that has no multidimensionality, at least in the sense of it not being assigned any categories.

I didn’t specifically refer to this but my suggestion would simply be to allow Null as the DimensionElementCombination. And by default to not have any entries in the DataElementDimensions table which match that DataElement. Similar to “default” I guess.

From an API perspective it might be nice if all dataValues had dimensions (by default the DataElement, the Source and the Period). And some would have extras. So a getDimensions() method on a dataValue would return a list of 3 or more.
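
Roughly what I have in mind, as a Python sketch rather than anything in the current API. The `DataValue` class, its field names and `get_dimensions` are all hypothetical, and `None` stands in for a Null DimensionElementCombination.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataValue:
    data_element: str
    source: str
    period: str
    value: float
    # None plays the role of a Null DimensionElementCombination ("default").
    # Otherwise it maps each extra Dimension to the chosen DimensionElement.
    combination: Optional[Dict[str, str]] = None

    def get_dimensions(self) -> List[str]:
        """Every value has the three implicit dimensions, plus any extras."""
        dimensions = ["DataElement", "Source", "Period"]
        if self.combination:
            dimensions.extend(self.combination.keys())
        return dimensions


plain = DataValue("Clinical malaria", "Clinic A", "2009-09", 12)
split = DataValue("Clinical malaria", "Clinic A", "2009-09", 5,
                  combination={"SEX": "Male", "AGE": "<5"})
print(plain.get_dimensions())  # three dimensions
print(split.get_dimensions())  # five dimensions
```

The point of the design is that a caller never has to ask whether a value came from a DHIS 1.4 style dataelement or a multidimensional one; it just gets a list of 3 or more.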

What about the following scenario. Could the category/category combos be transformed somehow into a sort of data element generator? Users could define a dimensionality set, assign a master data element, and DHIS would create all of the necessary data elements. So a category combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1, Under 5 and Over 5) and a template data element (Clinical malaria) would produce:

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}
OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}
OPD Over 5 Clinical Malaria …
OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}
…
…
…
IPD Clinical Malaria Total {IPD, All ages, Clinical Malaria}
…
…
…
Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}
Clinical Malaria Total {All patient status, All ages, Clinical malaria}

Each one of those data elements would then be assigned a set of dimensions, and a set of dimensional elements.

The categories functionality would simply be an artifact to produce multiple data elements, without having to enter them separately, which, if I understood Ola yesterday, was one of its intended purposes.
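
The generator idea above can be sketched in a few lines of Python. This is an illustration under assumptions: `generate_data_elements` is a hypothetical helper, `None` is used to mean "all options of this category", and the naming scheme is only guessed from the examples in the list.

```python
from itertools import product


def generate_data_elements(template, categories):
    """Expand a template data element over every category option.

    `categories` maps a category name to its list of options. A None
    entry in a generated combination means "all options" (a total).
    Returns a list of (name, {category: option-or-None}) pairs.
    """
    option_lists = [options + [None] for options in categories.values()]
    generated = []
    for combo in product(*option_lists):
        parts = [option for option in combo if option is not None]
        is_total = len(parts) < len(combo)
        words = parts + [template] + (["Total"] if is_total else [])
        generated.append((" ".join(words), dict(zip(categories, combo))))
    return generated


categories = {
    "Patient status": ["OPD", "IPD", "Deaths"],
    "Age": ["Under 1", "Under 5", "Over 5"],
}
elements = generate_data_elements("Clinical Malaria", categories)
for name, dimension_elements in elements:
    print(name, dimension_elements)
```

With three options in each category plus a total for each, this yields 16 data elements, from "OPD Under 1 Clinical Malaria" down to "Clinical Malaria Total". It also produces age totals across patient status (for example "Under 1 Clinical Malaria Total"), which the list above does not mention and which may or may not be wanted.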

Funny, I had been thinking of exactly this generating idea but the other way round. One of the aims, as I understand from Ola, is also to reduce the explosion in the number of DataElements. I was thinking that we could implement a nice tool for the user to map and generate multidimensional elements from unidimensional ones, as you might get when importing from DHIS 1.4.

Now, for those of us, such as myself, who have already created dozens of data elements with different dimensions in their names (but nowhere in a relational table), we could assign the dimensionality in a separate step (post-facto as Bob mentioned earlier). I might want to assign an “uber” dimension of “Communicable” and “Non-communicable” to a disease type that might not have anything to do with the definition of the data element itself, but would be simply for analysis purposes later. Again, I may be rehashing my previous emails here, but from a pure SQL standpoint, the approach I suggest here makes sense to me, in terms of queries of how to pull this into a crosstab as well as how to generate a fact table that something like an OLAP server could deal with.

This approach might seem to resolve the issue of how to deal with these two different beasts, by unfolding the multidimensional data element into simpler components. Meaning that the category/combos/options would be used as a templating mechanism, but that dimensionality could be assigned through a separate set of relations. Perhaps this is what is represented in the diagram, but I will need to study it tomorrow after some sleep.

I think “unbundling” could happen perhaps in the generation of report tables.

I do think that dimensional elements should not be able to be shared by dimensions, and that dimensions and dimensional elements should not be able to be deleted without lots of bells and whistles going off once they have been assigned to data elements.

I guess the key question is whether data elements should be able to have multiple DimensionElementCombinations, which I think is the current implementation. I am just not sure this will work with a combination of DHIS2-type multidimensional elements and DHIS1.4-type data elements.

I think multiple dimensions is a real requirement. I have taken this from the original model and just tried to simplify it a bit. I guess the hierarchical dataelement approach would start to get quite complicated with multiple dimensions …

Regards
Bob

···

2009/9/29 Jason Pickering jason.p.pickering@gmail.com

Enough for today.

Thanks for this Bob. It is a good start. Can’t you make this diagram

in DocBook so I can edit it?

Regards,

Jason

On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw abyodia@gmail.com wrote:

Yes your suggestion is doable and less is better … but I think the

requirement from the field is more complex.

If, for a moment, we stop talking about datavalues and talk about

dataelements - why are we talking about dimension combinations?

Because you are assuming a dataelement to have only one dimension. Am I

correct? If that is the case, I see a little bit of inconsistency here.

DataElement talks about one dimesion, but its corresponding value talks

about combination of dimensions.

Yes from the datavalue I can have dimensionelementcombinations, pick

dimensionelments regroup and put them in their corresponding dimesions – in

the end telling me from which dimension they came from. But from this point

onwards I am no more talking about a value of a single dataelement but a

value for combination of dataelements (because I have to pull different

dataelements which can give me the identified dimensions) … but is this

what we want?

The other point I would like the raise is - will there not be any limitation

on the flexibility of the system when putting the restriction "A Dimension

has many DimensionElements. But a DimensionElement is a member of only one

Dimension" ? Not only system flexibility problem, I see a logical problem as

well. Because if we think for example beyond the obvious

SEX(male,female,unknown) - I see a strong need for letting dimensionelements

to be member of multiple dimensions: For example take the other obvious

dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its

dimesionelements. May be such scaling of the AGE dimension is approrpiate

for Malaria case, but for TB case people might be interested to break the

AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to

handle cases like this? Are we going to define a number of <5yrs or are we

going to use the same <5yr dimensionelement ?

Thank you

Abyot.

On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

OK. Here’s my first attempt to rationalize things. Please excuse the

attachments. I try not to send attachments to mailing lists but these are

at least fairly small. (And Lars I will write it up in docbook after

fishing for feedback).

My primary aim has been to disturb the existing model as little as

possible whilst trying to simplify wherever possible.

Attached oldmodel.png shows the participants in the existing model. As

you can see there are 11 tables in all. I haven’t showed the relations as

it becomes a bit of a web.

Also attached is a proposed amended database model which bears sufficient

similarity to the old that migration between the two should be feasible.

But it is down to 6 tables. And I have named the tables according to the

terms we have been discussing. Of course this is just the database model.

I’ve also put together an XML view of what some sample dataset might look

like. There is also a UML model required which would be richer than the

underlying datamodel, but one step at a time …

Walking through:

DataElements can have Dimensions. And different dataElements can (and

hopefully will) share some of the same Dimensions. So there is a m-to-n

relationship between the two necessitating an extra table

(DataElementDimensions). An example of a Dimension is SEX. Nothing new

here.

Dimensions have DimensionElements. So SEX for example might have

DimensionElements “Male”, “Female”, “Unknown”. A big difference from the

old model is that there is 1-n relationship between DimensionElements and

Dimensions. A Dimension has many DimensionElements. But a DimensionElement

is a a member of only one Dimension.

DataValues represent the values at intersection of these Dimensions.

Keeping with the spirit of the old model this intersection is represented by

a single key, DimensionElementCombination. The DimensionElementCombinations

would be populated when a new Dimension is added to a DataElement. Like the

original model there is some fragility here. Changing dimensions on

dataelements could create a situation where datavalues become orphaned or

misdirected. The API must have robust methods for defending this integrity

particulalrly when updating the structural metadata. But this is perhaps

doable. Either way its not worse than we have.

I haven’t given a name to DimensionElementCombinations. From the examples

I have seen from SL this seems to be unnecessary. The names I have seen

being used are generally simply contrived from the dimensions or (worse

still) from the categoryoptions. What is important is that dataelements can

have sets of dimensions.

And then much of what is different is just a renaming of the original

entities. From the attached XML file I think you can see some of the

issues faced re names and identifiers. I find myself following a sort of

convention of CODE, Name, Description and UUID. CODE’s must be unique

within the scope of the database. I suppose this is close to what we

currently call ShortName. I would like to place constraints on CODES in

terms of length and also the disallowing of spaces and other funny

characters. The reason being that we may well have to use these codes in

making up uri’s. So CODES must be unique. For the moment we could keep

name unique but should migrate from it. Its a matter of rewriting all our

comparators I guess. UUIDs I am told are unique through some sort of

divinity so we apparently do not need to worry about them

I’ve also tried to reduce the number of knees on the donkey - from 11

tables to 6. I believe this can be done whilst preserving the existing

functionality. This arangement would make it much more sensible to produce

the XML I need to produce. I’m hoping that it would also be more friendly

to those who would be trying to pivot the data across dimensions.

Jason do you think this works for you? I might have missed out something

really fundamental. Abyot, you’ve been through this process before - am I

missing something? From the DataValue you can see DimensionElements. And

once you know a DimensionElement you also know the Dimension to which it

belongs. I think thats queryable. Will have to hydrate with some data and

see.

Shaking the multidimensional model up like this would obviously have

implications. But I suspect most of it is taking stuff away rather than

adding new so it might just be doable. Less is more.

Not spending time with docbook yet, till I get some feedback.

Cheers

Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

Hi

On the back of Jason and others comments, I’ve reached the conclusion

that we cannot really live with the MD model the way it is. Whereas I think

it is (just about) workable there are some serious optimizations we can and

should do. I am going to put my other work back a day or two and propose

some changes in a branch.

I think central to the inefficiency is the many-many relation between

categories and categoryoptions. This strikes me as illogical as well as

being cumbersome in the UI. Do we really want to be able to make categories

with options like {‘0<5’,‘6-10’,‘Male’,‘Out of stock’,‘35-40’}? Reducing

the relation between categories and category options to 1-n cuts two tables,

should make SQL queries more efficient and grokkable and also matches other

models such as SDMX better.

The other possible inefficiency is the dimensionset. It can be useful

in some contexts but I’m guessing that when querying the data (which we want

to be fast) it is not relevant. A dataelement can have dimensions. The

fact that some dataelements have the same combinations of dimensions is very

useful to know for some purposes, but it should be possible to get from the

dataelement to the dimension directly.

On the other side of the road is the hierarchical dimensionality idea I

see Ola and Jason have been discussing, where dimensions are composed

(perhaps post-facto) of uni-dimensional dataelements rather than decomposed

into pre-structured dimensional elements. I suspect that:

we need both; and

from the API, user and reporting perspective they should look the

same (i.e. a dataelement can have dimensions - how they come about should not

be a concern at the end point).

I’ll try out some of these ideas and point you to the branch.

Regards

Bob

2009/9/29 Lars Helge Øverland larshelge@gmail.com

Thanks for the explanations Jason. The multidimensional model is quite

complicated, is poorly documented, and as you say is DHIS-centric in the way

that it is built around the DHIS notion of a Data Element.

Could we assemble and put some of the text being written on the list to

docbook?

That said, and I think Jason already has made a strong case for this,

also in a 100% DHIS2 scenario you will need more flexibility in defining

dimensions to your data than what categories can provide. Being able to

define data dimensions independent of data collection is powerful and should

be supported in a better way than what data element groups provide today.

Given that we already have the orgunit group set code in place I would

assume that adding group sets to data elements could be a relatively

straightforward thing to do (but then again, I am not the programmer…).

I don’t see any implications in adding this to the system, it won’t

require changes to the existing model as the association goes from the

groupset to the groups. We can prioritize this for the 2.0.3 release.

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


bobj · 29 September 2009 21:02

I think Abyot raises some good points, especially his last one about

differences in what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts

of a multidimensional data element (as it is currently implemented

with categories) and a data element that has no multidimensionality,

at least in the sense of it not being assigned any categories.

Isn’t this what we have in the current system? If you are not assigning any combination of categories for a dataelement (well of course for the sake of consistency - from programming logic point of view - implicitly a default category combination with one default category having one default option is assigned - it is like putting your value at zero on the dimensions axis) then the dataelement has no dimensionality.

I don’t really like the default category idea. The way I have currently proposed there is no default category. By default a dataelement has no dimensions. It doesn’t need a default dimension. And also by default the dimensionelementcombination in datavalue is NULL.

What about the following scenario? Could the category/category combos

be transformed somehow into a sort of data element generator? Users

could define a dimensionality set, assign a master data element, and

DHIS would create all of the necessary data elements. So a category

combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1,

Under 5 and Over 5) and template data element (Clinical malaria)

would produce:

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}

OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}

OPD Over 5 Clinical Malaria …

OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}

…

…

…

IP Clinical Malaria Total {IP, All ages, Clinical Malaria}

…

…

…

Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}

Clinical Malaria Total {All patient status, All ages, Clinical malaria}

Each one of those data elements would then be assigned a set of

dimensions, and a set of dimensional elements.
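
The generator idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DHIS code: the function name `generate_data_elements`, the naming pattern, and the dictionary layout are all hypothetical, and the "Total" rows from the list are left out for brevity.

```python
from itertools import product


def generate_data_elements(template, categories):
    """Expand a template data element into one flat data element per
    combination of category options (hypothetical helper, not a DHIS API)."""
    names = list(categories)                       # e.g. ["Patient status", "Age"]
    option_lists = [categories[n] for n in names]
    elements = []
    for combo in product(*option_lists):
        elements.append({
            # Generated name, e.g. "OPD Under 1 Clinical Malaria"
            "name": " ".join(combo + (template,)),
            # Dimension -> dimensional element assigned to this data element
            "dimensions": dict(zip(names, combo)),
        })
    return elements


categories = {
    "Patient status": ["OPD", "IPD", "Deaths"],
    "Age": ["Under 1", "1-5", "Over 5"],
}
elements = generate_data_elements("Clinical Malaria", categories)
print(len(elements))
print(elements[0])
```

Each generated element keeps its dimensional elements alongside its name, so the dimensionality lives in data rather than only in the name string.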

The categories functionality would simply be an artifact to produce

multiple data elements, without having to enter them separately, which

if I understood Ola yesterday, was one of its intended purposes.

Now, for those of us such as myself, who have already created

dozens of data elements with different dimensions in their names (but

nowhere in a relational table) we could assign the dimensionality in

a separate step (post-facto as Bob mentioned earlier). I might want to

assign an “uber” dimension of “Communicable” and “Non-communicable” to

a disease type that might not have anything to do with the definition

of the data element itself, but would be simply for analysis purposes

later. Again, I may be rehashing my previous emails here, but from a

pure SQL standpoint, the approach I suggest here makes sense to me, in

terms of queries of how to pull this into a crosstab as well as how to

generate a fact table that something like an OLAP server could deal

with.
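
A rough sketch of what post-facto dimension assignment might look like in SQL, run here through Python's `sqlite3` module. The table and column names (`dataelement_dimension`, `dimension`, `element`) and all values are made up for illustration; they are not the DHIS 2 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataelement (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical table: dimensionality assigned after the data elements exist.
CREATE TABLE dataelement_dimension (
    dataelementid INTEGER REFERENCES dataelement(id),
    dimension TEXT,
    element TEXT
);
CREATE TABLE datavalue (dataelementid INTEGER, value INTEGER);
""")
conn.executemany("INSERT INTO dataelement VALUES (?, ?)", [
    (1, "OPD Under 1 Clinical Malaria"),
    (2, "OPD Under 1 Hypertension"),
])
conn.executemany("INSERT INTO dataelement_dimension VALUES (?, ?, ?)", [
    (1, "Disease type", "Communicable"),
    (2, "Disease type", "Non-communicable"),
    (1, "Patient status", "OPD"),
    (2, "Patient status", "OPD"),
])
# Made-up values, two for the first data element and one for the second.
conn.executemany("INSERT INTO datavalue VALUES (?, ?)", [(1, 12), (1, 8), (2, 5)])

# Aggregate along the analysis-only "Disease type" dimension.
rows = conn.execute("""
    SELECT d.element, SUM(v.value)
    FROM datavalue v
    JOIN dataelement_dimension d ON d.dataelementid = v.dataelementid
    WHERE d.dimension = 'Disease type'
    GROUP BY d.element
    ORDER BY d.element
""").fetchall()
print(rows)
```

The same join, grouped by more than one dimension, would give the rows of a fact table that a crosstab or OLAP tool could consume.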

This approach might seem to resolve the issue of how to deal with

these two different beasts, by unfolding the multidimensional data

element into simpler components. Meaning that the

category/combos/options would be used as a templating mechanism, but

that dimensionality could be assigned through a separate set of

relations. Perhaps this is what is represented in the diagram, but I

will need to study it tomorrow after some sleep.

I do think that dimensional elements should not be able to be

shared by dimensions, and that dimensions and dimensional elements

should not be able to be deleted without lots of bells and whistles

going off once they have been assigned to data elements.

What is wrong with that as long as values are not associated with them? I think we will be falling back to the current implementation instead - like dimensional elements should not be deleted once values are assigned to their combinations.

I agree. I think we all will agree on this much.

I guess the key question is whether data elements should be able to

have multiple DimensionElementCombinations, which I think is the

current implementation. I am just not sure this will work with a

combination of DHIS2-type-multidimensional elements, and DHIS1.4-type

data elements.

Can anyone explain to me how the DHIS2 multidimensional dataelement concept fails to handle the DHIS 1.4 dataelements - sorry, maybe I missed this from your earlier discussion? I think the way I see it - if the objective is OLAP, pivoting/querying, then what we need is not to change the model - instead to develop more APIs which can pull data along a dimension, varying degrees of overlap across dimensions - or more generally aggregation of values over a flexible set of dimensionelementcombinations!

Again I am with you mostly on this. In fact that has been my suggestion all along - to push the functionality into the API. But having said that I think the current model is too double-jointed and complex. I have seen by trying to unpick the dimensions using XSLT that I need too many hash tables, which is inefficient. This no doubt would also translate into too many SQL clauses. By trimming the requirement that dimensionelements are freely assignable the model becomes a good bit simpler. Beyond that it is mostly changing names.

Using the example above - {OPD, IPD}, {Male, Female},{Under 1, 1-5, Above 5} and malaria as base dataelement

What we have currently is an API to provide values for

Malaria(OPD,Male,Under 1)
Malaria(OPD,Male,1-5)

Malaria(OPD,Male,Above 5)
Malaria(OPD,Female,Under 1)
Malaria(OPD,Female,1-5)
Malaria(OPD,Female,Above 5)
…
…

And if I understood correctly … what is required is to have registered cases of

Malaria in the OPD,
Malaria in the IPD
Malaria for Males
Malaria for Females
…
…

Malaria In the OPD but only those Female
Malaria In the IPD but for male
…
…
…
we can list different combinations…

or finally ask … for the Malaria

Isn’t this a simple question of Aggregation? Does the multidimensional datamodel have a limitation to handle the above requirements - or am I talking about different stuff here?

No, I believe it can probably be done - but it doesn’t seem to have been done yet. When I started looking at how I might do it I realized that it could also be simplified.
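
To make the aggregation question concrete, here is a small Python sketch. It assumes values are stored per full combination, as in the Malaria(OPD,Male,Under 1) list above, and sums over whichever dimensions the caller leaves unspecified. The `aggregate` function and all numbers are hypothetical, not an existing API.

```python
# Made-up values keyed by (patient status, sex, age) for one Malaria data element.
values = {
    ("OPD", "Male", "Under 1"): 3,
    ("OPD", "Female", "Under 1"): 4,
    ("OPD", "Female", "1-5"): 6,
    ("IPD", "Male", "Above 5"): 2,
}
DIMENSIONS = ("status", "sex", "age")


def aggregate(values, **fixed):
    """Sum all values whose combination matches the fixed dimension elements.
    Dimensions that are not named are aggregated over."""
    total = 0
    for combo, value in values.items():
        cell = dict(zip(DIMENSIONS, combo))
        if all(cell[dim] == elem for dim, elem in fixed.items()):
            total += value
    return total


opd_total = aggregate(values, status="OPD")                   # Malaria in the OPD
opd_female = aggregate(values, status="OPD", sex="Female")    # OPD, only female
grand_total = aggregate(values)                               # all Malaria
print(opd_total, opd_female, grand_total)
```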

Regards

Bob

···

2009/9/29 Abyot Gizaw abyota@gmail.com

On Tue, Sep 29, 2009 at 9:16 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

Enough for today.

Thanks for this Bob. It is a good start. Can’t you make this diagram

in DocBook so I can edit it?

Regards,

Jason

On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw abyodia@gmail.com wrote:

Yes your suggestion is doable and less is better … but I think the

requirement from the field is more complex.

If, for a moment, we stop talking about datavalues and talk about

dataelements - why are we talking about dimension combinations?

Because you are assuming a dataelement to have only one dimension. Am I

correct? If that is the case, I see a little bit of inconsistency here.

DataElement talks about one dimension, but its corresponding value talks

about combination of dimensions.

Yes from the datavalue I can have dimensionelementcombinations, pick

dimensionelements, regroup and put them in their corresponding dimensions – in

the end telling me which dimension they came from. But from this point

onwards I am no longer talking about a value of a single dataelement but a

value for combination of dataelements (because I have to pull different

dataelements which can give me the identified dimensions) … but is this

what we want?

The other point I would like to raise is - will there not be any limitation

on the flexibility of the system when putting the restriction "A Dimension

has many DimensionElements. But a DimensionElement is a member of only one

Dimension" ? Not only system flexibility problem, I see a logical problem as

well. Because if we think for example beyond the obvious

SEX(male,female,unknown) - I see a strong need for letting dimensionelements

to be member of multiple dimensions: For example take the other obvious

dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its

dimensionelements. Maybe such scaling of the AGE dimension is appropriate

for Malaria case, but for TB case people might be interested to break the

AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to

handle cases like this? Are we going to define a number of <5yrs or are we

going to use the same <5yr dimensionelement ?

Thank you

Abyot.

On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

OK. Here’s my first attempt to rationalize things. Please excuse the

attachments. I try not to send attachments to mailing lists but these are

at least fairly small. (And Lars I will write it up in docbook after

fishing for feedback).

My primary aim has been to disturb the existing model as little as

possible whilst trying to simplify wherever possible.

Attached oldmodel.png shows the participants in the existing model. As

you can see there are 11 tables in all. I haven’t shown the relations as

it becomes a bit of a web.

Also attached is a proposed amended database model which bears sufficient

similarity to the old that migration between the two should be feasible.

But it is down to 6 tables. And I have named the tables according to the

terms we have been discussing. Of course this is just the database model.

I’ve also put together an XML view of what some sample dataset might look

like. There is also a UML model required which would be richer than the

underlying datamodel, but one step at a time …

Walking through:

DataElements can have Dimensions. And different dataElements can (and

hopefully will) share some of the same Dimensions. So there is a m-to-n

relationship between the two necessitating an extra table

(DataElementDimensions). An example of a Dimension is SEX. Nothing new

here.

Dimensions have DimensionElements. So SEX for example might have

DimensionElements “Male”, “Female”, “Unknown”. A big difference from the

old model is that there is 1-n relationship between DimensionElements and

Dimensions. A Dimension has many DimensionElements. But a DimensionElement

is a member of only one Dimension.

DataValues represent the values at intersection of these Dimensions.

Keeping with the spirit of the old model this intersection is represented by

a single key, DimensionElementCombination. The DimensionElementCombinations

would be populated when a new Dimension is added to a DataElement. Like the

original model there is some fragility here. Changing dimensions on

dataelements could create a situation where datavalues become orphaned or

misdirected. The API must have robust methods for defending this integrity

particularly when updating the structural metadata. But this is perhaps

doable. Either way it’s no worse than what we have.
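
Since the attached diagram is not reproduced in the thread, the following is only one possible reading of the walk-through, written as a SQLite schema driven from Python. Table and column names are guesses based on the terms used above, and the `combinationmember` link table is an added assumption, so the table count here need not match the six in the attachment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimension (id INTEGER PRIMARY KEY, code TEXT UNIQUE);
-- 1-n: each element carries exactly one dimension id.
CREATE TABLE dimensionelement (
    id INTEGER PRIMARY KEY,
    code TEXT UNIQUE,
    dimensionid INTEGER NOT NULL REFERENCES dimension(id)
);
CREATE TABLE dataelement (id INTEGER PRIMARY KEY, code TEXT UNIQUE);
-- m-n between data elements and dimensions.
CREATE TABLE dataelementdimensions (
    dataelementid INTEGER REFERENCES dataelement(id),
    dimensionid INTEGER REFERENCES dimension(id),
    PRIMARY KEY (dataelementid, dimensionid)
);
CREATE TABLE dimensionelementcombination (id INTEGER PRIMARY KEY);
-- Assumed link table listing the elements that make up a combination.
CREATE TABLE combinationmember (
    combinationid INTEGER REFERENCES dimensionelementcombination(id),
    dimensionelementid INTEGER REFERENCES dimensionelement(id)
);
-- combinationid stays NULL for a data element without dimensions.
CREATE TABLE datavalue (
    dataelementid INTEGER REFERENCES dataelement(id),
    combinationid INTEGER REFERENCES dimensionelementcombination(id),
    value INTEGER
);
INSERT INTO dimension VALUES (1, 'SEX'), (2, 'AGE');
INSERT INTO dimensionelement VALUES
    (1, 'MALE', 1), (2, 'FEMALE', 1), (3, 'UNDER5', 2), (4, 'OVER5', 2);
INSERT INTO dataelement VALUES (1, 'MALARIA');
INSERT INTO dataelementdimensions VALUES (1, 1), (1, 2);
INSERT INTO dimensionelementcombination VALUES (1);
INSERT INTO combinationmember VALUES (1, 2), (1, 3);
INSERT INTO datavalue VALUES (1, 1, 17);
""")

# From a DataValue to its DimensionElements, and from each element
# to the single Dimension it belongs to.
rows = conn.execute("""
    SELECT d.code, e.code, v.value
    FROM datavalue v
    JOIN combinationmember m ON m.combinationid = v.combinationid
    JOIN dimensionelement e ON e.id = m.dimensionelementid
    JOIN dimension d ON d.id = e.dimensionid
    ORDER BY d.code
""").fetchall()
print(rows)
```

Because `dimensionelement.dimensionid` is a plain foreign key, resolving the dimension of an element needs no extra join table, which is the simplification the 1-n restriction buys.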

I haven’t given a name to DimensionElementCombinations. From the examples

I have seen from SL this seems to be unnecessary. The names I have seen

being used are generally simply contrived from the dimensions or (worse

still) from the categoryoptions. What is important is that dataelements can

have sets of dimensions.

And then much of what is different is just a renaming of the original

entities. From the attached XML file I think you can see some of the

issues faced re names and identifiers. I find myself following a sort of

convention of CODE, Name, Description and UUID. CODES must be unique

within the scope of the database. I suppose this is close to what we

currently call ShortName. I would like to place constraints on CODES in

terms of length and also the disallowing of spaces and other funny

characters. The reason being that we may well have to use these codes in

making up URIs. So CODES must be unique. For the moment we could keep

name unique but should migrate from it. It’s a matter of rewriting all our

comparators I guess. UUIDs I am told are unique through some sort of

divinity so we apparently do not need to worry about them.
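
The constraints on CODES could be checked with something like the Python sketch below. The specific rule (letters, digits, underscore and hyphen, at most 25 characters) is purely an assumption for illustration; the thread does not settle on a length limit or character set.

```python
import re

# Assumed rule: URI-safe characters only, 1 to 25 characters long.
# The real limits were not decided in the discussion.
CODE_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,25}")


def is_valid_code(code):
    """True if the code has no spaces or other funny characters."""
    return CODE_PATTERN.fullmatch(code) is not None


def find_duplicates(codes):
    """Return the set of codes that occur more than once."""
    seen, dupes = set(), set()
    for code in codes:
        if code in seen:
            dupes.add(code)
        seen.add(code)
    return dupes


print(is_valid_code("MAL_OPD_U1"))
print(is_valid_code("Malaria OPD"))
print(find_duplicates(["SEX", "AGE", "SEX"]))
```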

I’ve also tried to reduce the number of knees on the donkey - from 11

tables to 6. I believe this can be done whilst preserving the existing

functionality. This arrangement would make it much more sensible to produce

the XML I need to produce. I’m hoping that it would also be more friendly

to those who would be trying to pivot the data across dimensions.

Jason do you think this works for you? I might have missed out something

really fundamental. Abyot, you’ve been through this process before - am I

missing something? From the DataValue you can see DimensionElements. And

once you know a DimensionElement you also know the Dimension to which it

belongs. I think that’s queryable. Will have to hydrate with some data and

see.

Shaking the multidimensional model up like this would obviously have

implications. But I suspect most of it is taking stuff away rather than

adding new so it might just be doable. Less is more.

Not spending time with docbook yet, till I get some feedback.

Cheers

Bob


bobj · 30 September 2009 11:23

OK. I’ve reached the conclusion that the model can and probably should be simplified, but it is really far too much work for what I have time for now. The categoryoptioncombo is already deeply ingrained in many parts of the system. So don’t hold your breath.

I’m going back to focus on my much simpler problem of exploding categorycombooptions into dimensions and vice versa.

For querying, I can see the API needs methods added to return datavalues by arbitrary collections of category rather than just fixed categoryoptioncombos. These only exist for the purpose of data collection. I suspect that this is what Ola needs to create more flexible reporttables. Then when configuring the reporttable you would freely select the dimensions you were interested in. This is of course do-able - I can see it - but my little brain is struggling with the complexity.

Looking at a two stage process it is a matter of getting the collection of categorycombooptionids which intersect with the given set of categories and then passing that collection to the existing API method which returns collections of datavalues which match particular categorycombooptionids.
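
The two-stage process can be sketched in Python as follows. Everything here is hypothetical: the in-memory dictionaries stand in for the metadata and datavalue tables, and `values_for_combos` stands in for the existing API method, whose real signature is not given in the thread.

```python
# Made-up metadata: each categoryoptioncombo id maps to its option per category.
combos = {
    1: {"Status": "OPD", "Sex": "Male"},
    2: {"Status": "OPD", "Sex": "Female"},
    3: {"Status": "IPD", "Sex": "Male"},
    4: {"Status": "IPD", "Sex": "Female"},
}
# Made-up datavalues as (dataelement, categoryoptioncombo id, value).
datavalues = [
    ("Malaria", 1, 5),
    ("Malaria", 2, 7),
    ("Malaria", 3, 1),
    ("Malaria", 4, 2),
]


def matching_combo_ids(combos, wanted):
    """Stage 1: ids of the combos that agree with every requested option."""
    return {
        cid for cid, opts in combos.items()
        if all(opts.get(cat) == opt for cat, opt in wanted.items())
    }


def values_for_combos(datavalues, combo_ids):
    """Stage 2: stand-in for the existing lookup by combo ids."""
    return [v for v in datavalues if v[1] in combo_ids]


ids = matching_combo_ids(combos, {"Sex": "Female"})
total = sum(v[2] for v in values_for_combos(datavalues, ids))
print(sorted(ids), total)
```

Only stage 1 is new; stage 2 reuses whatever lookup already exists, which is why the table structure underneath could later change without disturbing callers.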

In principle if we can expose the required methods in the API then it might be possible at some time in the future to revamp the underlying table structure without disturbing the API.

Two final thoughts:

If we are bound to the model whereby categoryoptions are free-standing entities (i.e. many-to-many relation with categories) then, for the purpose of import/export, we are obliged to uniquely identify these as well. So I will have to reluctantly also put UUIDs on categoryoptions. After discussing with Abyot last night, I can see that there is some value in having them the way they are, but we will have to live with the complexity. What you gain on the swings you lose on the roundabouts.
Indicators are not multidimensional. Why is this? Was it a conscious decision resulting from earlier discussion or is it just that we haven’t got there yet?

Regards
Bob

···


On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

OK. Here’s my first attempt to rationalize things. Please excuse the

attachments. I try not to send attachments to mailing lists but these are

at least fairly small. (And Lars I will write it up in docbook after

fishing for feedback).

My primary aim has been to disturb the existing model as little as

possible whilst trying to simplify wherever possible.

Attached oldmodel.png shows the participants in the existing model. As

you can see there are 11 tables in all. I haven’t shown the relations as

it becomes a bit of a web.

Also attached is a proposed amended database model which bears sufficient

similarity to the old that migration between the two should be feasible.

But it is down to 6 tables. And I have named the tables according to the

terms we have been discussing. Of course this is just the database model.

I’ve also put together an XML view of what some sample dataset might look

like. There is also a UML model required which would be richer than the

underlying datamodel, but one step at a time …

Walking through:

DataElements can have Dimensions. And different dataElements can (and

hopefully will) share some of the same Dimensions. So there is a m-to-n

relationship between the two necessitating an extra table

(DataElementDimensions). An example of a Dimension is SEX. Nothing new

here.

Dimensions have DimensionElements. So SEX for example might have

DimensionElements “Male”, “Female”, “Unknown”. A big difference from the

old model is that there is 1-n relationship between DimensionElements and

Dimensions. A Dimension has many DimensionElements. But a DimensionElement

is a member of only one Dimension.

DataValues represent the values at intersection of these Dimensions.

Keeping with the spirit of the old model this intersection is represented by

a single key, DimensionElementCombination. The DimensionElementCombinations

would be populated when a new Dimension is added to a DataElement. Like the

original model there is some fragility here. Changing dimensions on

dataelements could create a situation where datavalues become orphaned or

misdirected. The API must have robust methods for defending this integrity

particularly when updating the structural metadata. But this is perhaps

doable. Either way it’s no worse than what we have.
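
To make the fragility concrete, here is a minimal Python sketch of how the combinations might be populated. It assumes a combination is simply one element picked from each dimension of a data element; the dictionary and the function name are hypothetical and not taken from the attached model.

```python
from itertools import product

# Hypothetical stand-in for the Dimension -> DimensionElement (1-n) tables.
dimensions = {
    "SEX": ["Male", "Female", "Unknown"],
    "AGE": ["<5", "5-10"],
}

def combinations_for(dimension_codes):
    """Return every DimensionElementCombination for a list of dimensions.

    A combination picks exactly one element from each dimension, so the
    full set is the Cartesian product of the element lists.
    """
    element_lists = [dimensions[code] for code in dimension_codes]
    return list(product(*element_lists))

sex_only = combinations_for(["SEX"])
sex_and_age = combinations_for(["SEX", "AGE"])

# Adding AGE replaces every old combination, so values stored against
# the old keys would no longer match any current combination.
orphaned = set(sex_only) - set(sex_and_age)
print(len(sex_only), len(sex_and_age), len(orphaned))
```

In this sketch every one of the old SEX-only keys ends up orphaned once AGE is added, which is the situation the API would need to guard against.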

I haven’t given a name to DimensionElementCombinations. From the examples

I have seen from SL this seems to be unnecessary. The names I have seen

being used are generally simply contrived from the dimensions or (worse

still) from the categoryoptions. What is important is that dataelements can

have sets of dimensions.

And then much of what is different is just a renaming of the original

entities. From the attached XML file I think you can see some of the

issues faced re names and identifiers. I find myself following a sort of

convention of CODE, Name, Description and UUID. CODEs must be unique

within the scope of the database. I suppose this is close to what we

currently call ShortName. I would like to place constraints on CODES in

terms of length and also the disallowing of spaces and other funny

characters. The reason being that we may well have to use these codes in

making up URIs. So CODES must be unique. For the moment we could keep

name unique but should migrate from it. It’s a matter of rewriting all our

comparators I guess. UUIDs I am told are unique through some sort of

divinity so we apparently do not need to worry about them.
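
A rough Python sketch of what such a constraint on CODES could look like. The maximum length and the allowed character set are assumptions chosen for illustration; the mail only asks for a length limit and no spaces or funny characters.

```python
import re

MAX_CODE_LENGTH = 25  # assumed limit, not specified in the discussion

# Assumed character set: letters, digits and underscore only, so that a
# code can be dropped into a URI path segment without escaping.
CODE_PATTERN = re.compile(r"[A-Za-z0-9_]+")

def is_valid_code(code, existing_codes):
    """Check length, characters and uniqueness within the database scope."""
    if len(code) > MAX_CODE_LENGTH:
        return False
    if CODE_PATTERN.fullmatch(code) is None:
        return False
    return code not in existing_codes

print(is_valid_code("SEX", set()))
print(is_valid_code("Out of stock", set()))
```

The uniqueness check is shown against an in-memory set; in the database it would more likely be a unique constraint on the column.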

I’ve also tried to reduce the number of knees on the donkey - from 11

tables to 6. I believe this can be done whilst preserving the existing

functionality. This arrangement would make it much more sensible to produce

the XML I need to produce. I’m hoping that it would also be more friendly

to those who would be trying to pivot the data across dimensions.

Jason do you think this works for you? I might have missed out something

really fundamental. Abyot, you’ve been through this process before - am I

missing something? From the DataValue you can see DimensionElements. And

once you know a DimensionElement you also know the Dimension to which it

belongs. I think that’s queryable. Will have to hydrate with some data and

see.
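
One way to try this out is a small in-memory SQLite database driven from Python. The table and column names below are guesses for illustration only and are not copied from the attached diagram; in particular the link table between combinations and their elements is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimension (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL);
CREATE TABLE dimensionelement (
    id INTEGER PRIMARY KEY,
    code TEXT UNIQUE NOT NULL,
    dimensionid INTEGER NOT NULL REFERENCES dimension(id)
);
CREATE TABLE combinationelement (
    combinationid INTEGER NOT NULL,
    dimensionelementid INTEGER NOT NULL REFERENCES dimensionelement(id)
);
CREATE TABLE datavalue (
    dataelement TEXT NOT NULL,
    combinationid INTEGER,
    value INTEGER NOT NULL
);
INSERT INTO dimension VALUES (1, 'SEX'), (2, 'AGE');
INSERT INTO dimensionelement VALUES
    (1, 'MALE', 1), (2, 'FEMALE', 1), (3, 'UNDER5', 2), (4, 'OVER5', 2);
INSERT INTO combinationelement VALUES (10, 1), (10, 3), (11, 2), (11, 3);
INSERT INTO datavalue VALUES ('Malaria', 10, 7), ('Malaria', 11, 5);
""")

# From each datavalue, walk to its dimension elements and then to the
# single dimension that owns each element (the 1-n relation).
rows = conn.execute("""
    SELECT dv.value, d.code, de.code
    FROM datavalue dv
    JOIN combinationelement ce ON ce.combinationid = dv.combinationid
    JOIN dimensionelement de ON de.id = ce.dimensionelementid
    JOIN dimension d ON d.id = de.dimensionid
    ORDER BY dv.value, d.code
""").fetchall()

for row in rows:
    print(row)
```

Because each element belongs to exactly one dimension, the last join needs no extra link table, which is where the 1-n restriction pays off in the query.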

Shaking the multidimensional model up like this would obviously have

implications. But I suspect most of it is taking stuff away rather than

adding new so it might just be doable. Less is more.

Not spending time with docbook yet, till I get some feedback.

Cheers

Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

Hi

On the back of Jason’s and others’ comments, I’ve reached the conclusion

that we cannot really live with the MD model the way it is. Whereas I think

it is (just about) workable there are some serious optimizations we can and

should do. I am going to put my other work back a day or two and propose

some changes in a branch.

I think central to the inefficiency is the many-many relation between

categories and categoryoptions. This strikes me as illogical as well as

being cumbersome in the UI. Do we really want to be able to make categories

with options like {‘0<5’,‘6-10’,‘Male’,‘Out of stock’,‘35-40’}? Reducing

the relation between categories and category options to 1-n cuts two tables,

should make sql queries more efficient and grokkable and also matches other

models such as sdmx better.

The other possible inefficiency is the dimensionset. It can be useful

in some contexts but I’m guessing that when querying the data (which we want

to be fast) it is not relevant. A dataelement can have dimensions. The

fact that some dataelements have the same combinations of dimensions is very

useful to know for some purposes, but it should be possible to get from the

dataelement to the dimension directly.

On the other side of the road is the hierarchical dimensionality idea I

see Ola and Jason have been discussing, where dimensions are composed

(perhaps post-facto) of uni-dimensional dataelements rather than decomposed

into pre-structured dimensional elements. I suspect that:

we need both; and

from the API, user and reporting perspective they should look the

same (ie a dataelement can have dimensions - how they come about should not

be a concern at the end point).

I’ll try out some of these ideas and point you to the branch.

Regards

Bob

2009/9/29 Lars Helge Øverland larshelge@gmail.com

Thanks for the explanations Jason. The multidimensional model is quite

complicated, is poorly documented, and as you say is DHIS-centric in the way

that it is built around the DHIS notion of a Data Element.

Could we assemble and put some of the text being written on the list to

docbook?

That said, and I think Jason already has made a strong case for this,

also in a 100% DHIS2 scenario you will need more flexibility in defining

dimensions to your data than what categories can provide. Being able to

define data dimensions independent of data collection is powerful and should

be supported in a better way than what data element groups provide today.

Given that we already have the orgunit group set code in place I would

assume that adding group sets to data elements could be a relatively

straightforward thing to do (but then again, I am not the programmer…).

I don’t see any implications in adding this to the system, it won’t

require changes to the existing model as the association goes from the

groupset to the groups. We can prioritize this for the 2.0.3 release.

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


olatitle · 30 September 2009 12:27

OK. I’ve reached the conclusion that the model can and probably should be simplified, but it is really far too much work for what I have time for now. The categoryoptioncombo is already deeply ingrained in many parts of the system. So don’t hold your breath.

I’m going back to focus on my much simpler problem of exploding categorycombooptions into dimensions and vice versa.

For querying, I can see the API needs methods added to return datavalues by arbitrary collections of category rather than just fixed categoryoptioncombos. These only exist for the purpose of data collection. I suspect that this is what Ola needs to create more flexible reporttables. Then when configuring the reporttable you would freely select the dimensions you were interested in. This is of course do-able - I can see it - but my little brain is struggling with the complexity.

Looking at a two stage process it is a matter of getting the collection of categorycombooptionids which intersect with the given set of categories and then passing that collection to the existing API method which returns collections of datavalues which match particular categorycombooptionids.
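
The two stages could be sketched in Python roughly as follows. The data, the function names and the idea of selecting by a set of options are all assumptions made for illustration; the second function only stands in for the existing API method and is not its real signature.

```python
# Hypothetical data: each combo id maps to the category options it contains.
combos = {
    1: {"OPD", "Male", "<5"},
    2: {"OPD", "Female", "<5"},
    3: {"IPD", "Male", "<5"},
    4: {"IPD", "Female", ">5"},
}

# Hypothetical datavalues as (dataelement, combo id, value).
datavalues = [
    ("Malaria", 1, 10),
    ("Malaria", 2, 12),
    ("Malaria", 3, 4),
    ("Malaria", 4, 6),
]

def matching_combo_ids(required_options):
    """Stage 1: ids of all combos that contain every requested option."""
    return {cid for cid, options in combos.items() if required_options <= options}

def values_for_combos(dataelement, combo_ids):
    """Stage 2: stand-in for the existing lookup by combo id."""
    return [v for de, cid, v in datavalues if de == dataelement and cid in combo_ids]

opd_ids = matching_combo_ids({"OPD"})
opd_total = sum(values_for_combos("Malaria", opd_ids))
print(sorted(opd_ids), opd_total)
```

The point of the split is that only stage 1 knows about categories; stage 2 keeps working on combo ids exactly as it does today.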

In principle if we can expose the required methods in the API then it might be possible at some time in the future to revamp the underlying table structure without disturbing the API.

Two final thoughts:

if we are bound to the model whereby categoryoptions are free standing entities (ie many to many relation with categories) then, for the purpose of import/export we are obliged to uniquely identify these as well. So I will have to reluctantly also put uuids on categoryoptions. After discussing with Abyot last night, I can see that there is some value in having them the way they are, but we will have to live with the complexity. What you gain on the swings you lose on the roundabouts.

OK. I still don’t get why we need this flexibility though. When using the data values you would only query for data element + categories/dimensions anyway right, and <5 means <5 whether it is part of AGE1, AGE2 or AGE 3. Or?

Indicators are not multidimensional. Why is this? Was it a conscious decision resulting from earlier discussion or is it just that we haven’t got there yet?

Data analysis could benefit from having multidimensional indicators, but then since this is strictly for output and never input I would suggest using the post-method of assigning indicator group sets and groups (or whatever you end up calling it in the UI). What makes indicators interesting and complex in this context is that the numerator/denominator formulas should be able to contain slices of the multidimensional data element, e.g. “Malaria” + “all ages”, “male”, and not only the flat data element (data element + 1 categoryoptioncombo, “Malaria”+ “<5”, “male”) like it is today.

···

2009/9/30 Bob Jolliffe bobjolliffe@gmail.com

Regards
Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

2009/9/29 Abyot Gizaw abyota@gmail.com

On Tue, Sep 29, 2009 at 9:16 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

I think Abyot raises some good points, especially his last one about

differences in what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts

of a multidimensional data element (as it is currently implemented

with categories) and a data element that has no multidimensionality,

at least in the sense of it not being assigned any categories.

Isn’t this what we have in the current system? If you are not assigning any combination of categories for a dataelement (well of course for the sake of consistency - from programming logic point of view - implicitly a default category combination with one default category having one default option is assigned - it is like putting your value at zero on the dimensions axis) then the dataelement has no dimensionality.

I don’t really like the default category idea. The way I have currently proposed there is no default category. By default a dataelement has no dimensions. It doesn’t need a default dimension. And also by default the dimensionelementcombination in datavalue is NULL.

What about the following scenario? Could the category/category combos

be transformed somehow into a sort of data element generator? Users

could define a dimensionality set, assign a master data element, and

DHIS would create all of the necessary data elements. So a category

combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1

,Under 5 and Over 5) and template data element (Clinical malaria)

would produce :

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}

OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}

OPD Over 5 Clinical Malaria …

OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}

…

…

…

IP Clinical Malaria Total {IP, All ages, Clinical Malaria}

…

…

…

Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}

Clinical Malaria Total {All patient status, All ages, Clinical malaria}

Each one of those data elements would then be assigned a set of

dimensions, and a set of dimensional elements.
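
As a rough illustration of the generator idea, here is a Python sketch. The naming convention for the generated elements and the way totals are labelled are assumptions based on the list above, not an existing DHIS feature.

```python
from itertools import product

def generate_data_elements(template, categories):
    """Expand a template data element across categories.

    `categories` maps a category name to (options, total_label).
    Returns (name, dimension_elements) pairs, including the totals.
    """
    choices = [options + [total] for options, total in categories.values()]
    totals = {total for _, total in categories.values()}
    generated = []
    for combo in product(*choices):
        # Leave the "All ..." labels out of the name and flag it as a total.
        parts = [c for c in combo if c not in totals]
        suffix = " Total" if len(parts) < len(combo) else ""
        name = " ".join(parts + [template]) + suffix
        generated.append((name, combo + (template,)))
    return generated

categories = {
    "Patient status": (["OPD", "IPD", "Deaths"], "All patient status"),
    "Age": (["Under 1", "Under 5", "Over 5"], "All ages"),
}
elements = generate_data_elements("Clinical Malaria", categories)
for name, dims in elements[:4]:
    print(name, dims)
```

Each generated element carries its own tuple of dimension elements, so the dimensionality ends up in a relation rather than only in the name.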

The categories functionality would simply be an artifact to produce

multiple data elements, without having to enter them separately, which

if I understood Ola yesterday, was one of its intended purposes.

Now, for those of us, such as myself, who have already created

dozens of data elements with different dimensions in their names (but

nowhere in a relational table) we could assign the dimensionality in

a separate step (post-facto as Bob mentioned earlier). I might want to

assign an “uber” dimension of “Communicable” and “Non-communicable” to

a disease type that might not have anything to do with the definition

of the data element itself, but would be simply for analysis purposes

later. Again, I may be rehashing my previous emails here, but from a

pure SQL standpoint, the approach I suggest here makes sense to me, in

terms of queries of how to pull this into a crosstab as well as how to

generate a fact table that something like an OLAP server could deal

with.

This approach might seem to resolve the issue of how to deal with

these two different beasts, by unfolding the multidimensional data

element into simpler components. Meaning that the

category/combos/options would be used as a templating mechanism, but

that dimensionality could be assigned through a separate set of

relations. Perhaps this is what is represented in the diagram, but I

will need to study it tomorrow after some sleep.

I do think that dimensional elements should not be able to be

shared by dimensions, and that dimensions and dimensional elements

should not be able to be deleted without lots of bells and whistles

going off once they have been assigned to data elements.

What is wrong with that as long as values are not associated with them? I think we will be falling back to the current implementation instead - like dimensional elements should not be deleted once values are assigned to their combinations.

I agree. I think we all will agree on this much.

I guess the key question is whether data elements should be able to

have multiple DimensionElementCombinations, which I think is the

current implementation. I am just not sure this will work with a

combination of DHIS2-type-multidimensional elements, and DHIS1.4-type

data elements.

Can anyone explain to me how the DHIS2 multidimensional dataelement concept fails to handle the DHIS 1.4 dataelements - sorry, maybe I missed this from your earlier discussion? The way I see it, if the objective is OLAP, pivoting/querying, then what we need is not to change the model but to develop more APIs which can pull data along a dimension, with varying degrees of overlap across dimensions - or more generally aggregation of values over a flexible set of dimensionelementcombinations!

Again I am with you mostly on this. In fact that has been my suggestion all along - to push the functionality into the API. But having said that I think the current model is too double-jointed and complex. I have seen by trying to unpick the dimensions using xslt I need too many hash tables which is inefficient. This no doubt would also translate into too many SQL clauses. By trimming the requirement that dimensionelements are freely assignable the model becomes a good bit simpler. Beyond that it is mostly changing names.

Using the example above - {OPD, IPD}, {Male, Female},{Under 1, 1-5, Above 5} and malaria as base dataelement

What we have currently is an API to provide values for

Malaria(OPD,Male,Under 1)
Malaria(OPD,Male,1-5)

Malaria(OPD,Male,Above 5)
Malaria(OPD,Female,Under 1)
Malaria(OPD,Female,1-5)
Malaria(OPD,Female,Above 5)
…
…

And if I understood correctly … what is required is to have registered cases of

Malaria in the OPD,
Malaria in the IPD
Malaria for Males
Malaria for Females
…
…

Malaria In the OPD but only those Female
Malaria In the IPD but for male
…
…
…
we can list different combinations…

or finally ask … for the Malaria

Isn’t this a simple question of Aggregation? Does the multidimensional datamodel have a limitation to handle the above requirements - or am I talking about something different here?

No I believe it can probably be done - but yet it doesn’t seem to have been done. When I started looking at how I might do it I realized that it could also be simplified.
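
For what it is worth, the aggregation being asked for can be sketched in a few lines of Python. The values are invented and the function is hypothetical; it only shows that summing over the dimensions you do not care about gives the OPD, Male/Female and overall Malaria figures from the same stored combinations.

```python
from collections import defaultdict

DIMENSIONS = ("status", "sex", "age")

# Invented values for Malaria, keyed by (status, sex, age).
values = {
    ("OPD", "Male", "Under 1"): 3,
    ("OPD", "Male", "1-5"): 5,
    ("OPD", "Female", "Under 1"): 2,
    ("IPD", "Male", "Above 5"): 1,
    ("IPD", "Female", "1-5"): 4,
}

def aggregate(values, keep):
    """Sum values over every dimension that is not listed in `keep`."""
    indexes = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for combo, value in values.items():
        totals[tuple(combo[i] for i in indexes)] += value
    return dict(totals)

by_status = aggregate(values, ["status"])             # Malaria in OPD / IPD
by_status_sex = aggregate(values, ["status", "sex"])  # e.g. OPD but only Female
overall = aggregate(values, [])                       # simply "Malaria"
print(by_status, overall)
```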

Regards

Bob

Enough for today.

Thanks for this Bob. It is a good start. Can’t you make this diagram

in DocBook so I can edit it?

Regards,

Jason

On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw abyodia@gmail.com wrote:

Yes your suggestion is doable and less is better … but I think the

requirement from the field is more complex.

If, for a moment, we stop talking about datavalues and talk about

dataelements - why are we talking about dimension combinations?

Because you are assuming a dataelement to have only one dimension. Am I

correct? If that is the case, I see a little bit of inconsistency here.

DataElement talks about one dimension, but its corresponding value talks

about combination of dimensions.

Yes from the datavalue I can have dimensionelementcombinations, pick

dimensionelements, regroup and put them in their corresponding dimensions – in

the end telling me from which dimension they came from. But from this point

onwards I am no more talking about a value of a single dataelement but a

value for combination of dataelements (because I have to pull different

dataelements which can give me the identified dimensions) … but is this

what we want?

The other point I would like to raise is - will there not be any limitation

on the flexibility of the system when putting the restriction "A Dimension

has many DimensionElements. But a DimensionElement is a member of only one

Dimension" ? Not only system flexibility problem, I see a logical problem as

well. Because if we think for example beyond the obvious

SEX(male,female,unknown) - I see a strong need for letting dimensionelements

be members of multiple dimensions: For example take the other obvious

dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its

dimensionelements. Maybe such scaling of the AGE dimension is appropriate

for Malaria cases, but for TB cases people might be interested to break the

AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to

handle cases like this? Are we going to define a number of <5yrs or are we

going to use the same <5yr dimensionelement ?

Thank you

Abyot.



Knut_Staring · 30 September 2009 12:49

OK. I’ve reached the conclusion that the model can and probably should be simplified, but it is really far too much work for what I have time for now. The categoryoptioncombo is already deeply ingrained in many parts of the system. So don’t hold your breath.

I’m going back to focus on my much simpler problem of exploding categorycombooptions into dimensions and vice versa.

For querying, I can see the API needs methods added to return datavalues by arbitrary collections of category rather than just fixed categoryoptioncombos. These only exist for the purpose of data collection. I suspect that this is what Ola needs to create more flexible reporttables. Then when configuring the reporttable you would freely select the dimensions you were interested in. This is of course do-able - I can see it - but my little brain is struggling with the complexity.

Looking at a two stage process it is a matter of getting the collection of categorycombooptionids which intersect with the given set of categories and then passing that collection to the existing API method which returns collections of datavalues which match particular categorycombooptionids.

In principle if we can expose the required methods in the API then it might be possible at some time in the future to revamp the underlying table structure without disturbing the API.

Two final thoughts:

if we are bound to the model whereby categoryoptions are free standing entities (ie many to many relation with categories) then, for the purpose of import/export we are obliged to uniquely identify these as well. So I will have to reluctantly also put uuids on categoryoptions. After discussing with Abyot last night, I can see that there is some value in having them the way they are, but we will have to live with the complexity. What you gain on the swings you lose on the roundabouts.

OK. I still don’t get why we need this flexibility though. When using the data values you would only query for data element + categories/dimensions anyway right, and <5 means <5 whether it is part of AGE1, AGE2 or AGE 3. Or?

I think it is quite important to simplify things - as Bob very pointedly highlighted, there is currently no multidimensionality for indicators, which I think renders it relatively irrelevant. So I strongly support a simplification (and renaming) which will allow us to relatively quickly have good support for 95% of the cases rather than cater for very esoteric needs that may be useful to a few people. Agile principles and all that.

Knut

···

On Wed, Sep 30, 2009 at 2:27 PM, Ola Hodne Titlestad olatitle@gmail.com wrote:

2009/9/30 Bob Jolliffe bobjolliffe@gmail.com

Indicators are not multidimensional. Why is this? Was it a conscious decision resulting from earlier discussion or is it just that we haven’t got there yet?

Data analysis could benefit from having multidimensional indicators, but then since this is strictly for output and never input I would suggest using the post-method of assigning indicator group sets and groups (or whatever you end up calling it in the UI). What makes indicators interesting and complex in this context is that the numerator/denominator formulas should be able to contain slices of the multidimensional data element, e.g. “Malaria” + “all ages”, “male”, and not only the flat data element (data element + 1 categoryoptioncombo, “Malaria”+ “<5”, “male”) like it is today.

Regards
Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

2009/9/29 Abyot Gizaw abyota@gmail.com

On Tue, Sep 29, 2009 at 9:16 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

I think Abyot raises some good points, especially his last one about

differences in what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts

of a multidimensional data element (as it is currently implemented

with categories) and a data element that has no multidimensionality,

at least in the sense of it not being assigned any categories.

Isn’t this what we have in the current system? If you are not assigning any combination of categories for a dataelement (well of course for the sake of consistency - from programming logic point of view - implicitly a default category combination with one default category having one default option is assigned - it is like putting your value at zero on the dimensions axis) then the dataelement has no dimensionality.

I don’t really like the default category idea. The way I have currently proposed there is no default category. By default a dataelement has no dimensions. It doesn’t need a default dimension. And also by default the dimensionelementcombination in datavalue is NULL.

What about the following scenario? Could the category/category combos

be transformed somehow into a sort of data element generator? Users

could define a dimensionality set, assign a master data element, and

DHIS would create all of the necessary data elements. So a category

combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1

,Under 5 and Over 5) and template data element (Clinical malaria)

would produce :

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}
OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}
OPD Over 5 Clinical Malaria …
OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}
…
IP Clinical Malaria Total {IP, All ages, Clinical Malaria}
…
Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}
Clinical Malaria Total {All patient status, All ages, Clinical malaria}

Each one of those data elements would then be assigned a set of dimensions, and a set of dimensional elements.

The categories functionality would simply be an artifact to produce multiple data elements, without having to enter them separately, which, if I understood Ola yesterday, was one of its intended purposes.
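As a rough illustration of the "data element generator" idea, here is a minimal Python sketch. The function name `generate_data_elements` and the dictionary layout are hypothetical and not part of DHIS. The category names and options come from the example above, and the "Total" rows are left out for brevity.

```python
from itertools import product

def generate_data_elements(template, categories):
    """Expand a template data element into one flat data element per
    combination of category options (hypothetical helper, not a DHIS API)."""
    names = list(categories)  # e.g. ["Patient Status", "Age"]
    elements = []
    for combo in product(*(categories[n] for n in names)):
        elements.append({
            # flat, DHIS 1.4-style name built from the options
            "name": " ".join(combo) + " " + template,
            # dimension -> dimension element, held as data rather than in the name
            "dimensions": dict(zip(names, combo)),
        })
    return elements

categories = {
    "Patient Status": ["OPD", "IPD", "Deaths"],
    "Age": ["Under 1", "Under 5", "Over 5"],
}
elements = generate_data_elements("Clinical Malaria", categories)
for element in elements:
    print(element["name"], element["dimensions"])
```

Each generated element carries its dimensions explicitly, so a later step could assign further dimensions to it without touching the name.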

Now, for those of us such as myself who have already created dozens of data elements with different dimensions in their names (but nowhere in a relational table), we could assign the dimensionality in a separate step (post-facto, as Bob mentioned earlier). I might want to assign an “uber” dimension of “Communicable” and “Non-communicable” to a disease type that might not have anything to do with the definition of the data element itself, but would be simply for analysis purposes later. Again, I may be rehashing my previous emails here, but from a pure SQL standpoint, the approach I suggest here makes sense to me, both in terms of the queries needed to pull this into a crosstab and in terms of how to generate a fact table that something like an OLAP server could deal with.

This approach might seem to resolve the issue of how to deal with these two different beasts, by unfolding the multidimensional data element into simpler components. Meaning that the category/combos/options would be used as a templating mechanism, but that dimensionality could be assigned through a separate set of relations. Perhaps this is what is represented in the diagram, but I will need to study it tomorrow after some sleep.

I do think that dimensional elements should not be able to be shared by dimensions, and that dimensions and dimensional elements should not be able to be deleted without lots of bells and whistles going off once they have been assigned to data elements.

What is wrong with that as long as values are not associated with them? I think we will be falling back to the current implementation instead, i.e. dimensional elements should not be deleted once values are assigned to their combinations.

I agree. I think we all will agree on this much.

I guess the key question is whether data elements should be able to have multiple DimensionElementCombinations, which I think is the current implementation. I am just not sure this will work with a combination of DHIS2-type multidimensional elements and DHIS1.4-type data elements.

Can anyone explain to me how the DHIS2 multidimensional dataelement concept fails to handle the DHIS 1.4 dataelements? Sorry, maybe I missed this from your earlier discussion. The way I see it, if the objective is OLAP and pivoting/querying, then what we need is not to change the model but to develop more APIs which can pull data along a dimension, with varying degrees of overlap across dimensions, or more generally aggregate values over a flexible set of dimensionelementcombinations!

Again I am with you mostly on this. In fact that has been my suggestion all along: to push the functionality into the API. But having said that, I think the current model is too double-jointed and complex. In trying to unpick the dimensions using XSLT I have seen that I need too many hash tables, which is inefficient. This no doubt would also translate into too many SQL clauses. By trimming the requirement that dimensionelements are freely assignable, the model becomes a good bit simpler. Beyond that it is mostly changing names.

Using the example above - {OPD, IPD}, {Male, Female},{Under 1, 1-5, Above 5} and malaria as base dataelement

What we have currently is an API to provide values for

Malaria(OPD,Male,Under 1)
Malaria(OPD,Male,1-5)
Malaria(OPD,Male,Above 5)
Malaria(OPD,Female,Under 1)
Malaria(OPD,Female,1-5)
Malaria(OPD,Female,Above 5)
…

And if I understood correctly … what is required is to have registered cases of

Malaria in the OPD
Malaria in the IPD
Malaria for Males
Malaria for Females
…

Malaria in the OPD but only for females
Malaria in the IPD but only for males
…
we can list different combinations…

or finally ask … for the Malaria

Isn’t this a simple question of aggregation? Does the multidimensional data model have a limitation in handling the above requirements, or am I talking about something different here?

No I believe it can probably be done - but yet it doesn’t seem to have been done. When I started looking at how I might do it I realized that it could also be simplified.

Regards

Bob

Enough for today.

Thanks for this Bob. It is a good start. Can’t you make this diagram in DocBook so I can edit it?

Regards,

Jason

On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw abyodia@gmail.com wrote:

Yes, your suggestion is doable and less is better … but I think the requirement from the field is more complex.

If, for a moment, we stop talking about datavalues and talk about dataelements: why are we talking about dimension combinations? Because you are assuming a dataelement to have only one dimension. Am I correct? If that is the case, I see a little bit of inconsistency here. The DataElement talks about one dimension, but its corresponding value talks about a combination of dimensions.

Yes, from the datavalue I can get dimensionelementcombinations, pick out the dimensionelements, regroup them and put them in their corresponding dimensions, in the end telling me which dimension they came from. But from this point onwards I am no longer talking about a value of a single dataelement but a value for a combination of dataelements (because I have to pull different dataelements which can give me the identified dimensions) … but is this what we want?

The other point I would like to raise is: will there not be a limitation on the flexibility of the system when imposing the restriction "A Dimension has many DimensionElements. But a DimensionElement is a member of only one Dimension"? Beyond the flexibility problem, I see a logical problem as well. If we think beyond the obvious SEX(male,female,unknown), I see a strong need for letting dimensionelements be members of multiple dimensions. For example, take the other obvious dimension, AGE, and assume <5 yrs, 5-10 yrs, and <5 yrs as its dimensionelements. Maybe such scaling of the AGE dimension is appropriate for the Malaria case, but for the TB case people might be interested in breaking the AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs. So how are we going to handle cases like this? Are we going to define a number of <5yrs or are we going to use the same <5yr dimensionelement?

Thank you

Abyot.

On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

OK. Here’s my first attempt to rationalize things. Please excuse the attachments. I try not to send attachments to mailing lists but these are at least fairly small. (And Lars, I will write it up in DocBook after fishing for feedback.)

My primary aim has been to disturb the existing model as little as possible whilst trying to simplify wherever possible.

The attached oldmodel.png shows the participants in the existing model. As you can see there are 11 tables in all. I haven’t shown the relations as it becomes a bit of a web.

Also attached is a proposed amended database model which bears sufficient similarity to the old that migration between the two should be feasible. But it is down to 6 tables. And I have named the tables according to the terms we have been discussing. Of course this is just the database model. I’ve also put together an XML view of what a sample dataset might look like. There is also a UML model required which would be richer than the underlying data model, but one step at a time …

Walking through:

DataElements can have Dimensions. And different DataElements can (and hopefully will) share some of the same Dimensions. So there is an m-to-n relationship between the two, necessitating an extra table (DataElementDimensions). An example of a Dimension is SEX. Nothing new here.

Dimensions have DimensionElements. So SEX for example might have DimensionElements “Male”, “Female”, “Unknown”. A big difference from the old model is that there is a 1-n relationship between Dimensions and DimensionElements. A Dimension has many DimensionElements. But a DimensionElement is a member of only one Dimension.

DataValues represent the values at the intersection of these Dimensions. Keeping with the spirit of the old model, this intersection is represented by a single key, DimensionElementCombination. The DimensionElementCombinations would be populated when a new Dimension is added to a DataElement. Like the original model there is some fragility here. Changing dimensions on dataelements could create a situation where datavalues become orphaned or misdirected. The API must have robust methods for defending this integrity, particularly when updating the structural metadata. But this is perhaps doable. Either way it’s not worse than what we have.
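To make the walk-through concrete, here is a minimal sketch of such a model using Python's built-in `sqlite3`. The table and column names are guesses based on the terms used above and may not match the attached diagram. In particular, the `combinationmember` link table is an assumption added for illustration, since the mail does not say how a DimensionElementCombination refers to its DimensionElements. All data is made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dataelement (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dimension   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- m-to-n: data elements can share dimensions
CREATE TABLE dataelementdimensions (
    dataelementid INTEGER NOT NULL REFERENCES dataelement(id),
    dimensionid   INTEGER NOT NULL REFERENCES dimension(id),
    PRIMARY KEY (dataelementid, dimensionid)
);
-- 1-to-n: a dimension element belongs to exactly one dimension
CREATE TABLE dimensionelement (
    id          INTEGER PRIMARY KEY,
    dimensionid INTEGER NOT NULL REFERENCES dimension(id),
    name        TEXT NOT NULL
);
CREATE TABLE dimensionelementcombination (id INTEGER PRIMARY KEY);
-- assumed link table: which elements make up a combination
CREATE TABLE combinationmember (
    combinationid      INTEGER NOT NULL REFERENCES dimensionelementcombination(id),
    dimensionelementid INTEGER NOT NULL REFERENCES dimensionelement(id),
    PRIMARY KEY (combinationid, dimensionelementid)
);
CREATE TABLE datavalue (
    dataelementid INTEGER NOT NULL REFERENCES dataelement(id),
    combinationid INTEGER REFERENCES dimensionelementcombination(id), -- NULL: no dimensions
    value         INTEGER NOT NULL
);
"""

def build_demo():
    """Create an in-memory database with one data element and two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dataelement VALUES (1, 'Malaria')")
    conn.executemany("INSERT INTO dimension VALUES (?, ?)", [(1, "SEX"), (2, "AGE")])
    conn.executemany("INSERT INTO dataelementdimensions VALUES (?, ?)", [(1, 1), (1, 2)])
    conn.executemany(
        "INSERT INTO dimensionelement VALUES (?, ?, ?)",
        [(1, 1, "Male"), (2, 1, "Female"), (3, 2, "<5"), (4, 2, ">5")],
    )
    # combination id -> ids of the dimension elements it intersects
    combos = {1: (1, 3), 2: (1, 4), 3: (2, 3), 4: (2, 4)}
    for combo_id, members in combos.items():
        conn.execute("INSERT INTO dimensionelementcombination VALUES (?)", (combo_id,))
        conn.executemany(
            "INSERT INTO combinationmember VALUES (?, ?)",
            [(combo_id, member) for member in members],
        )
    conn.executemany(
        "INSERT INTO datavalue VALUES (1, ?, ?)",
        [(1, 10), (2, 20), (3, 30), (4, 40)],
    )
    return conn

def totals_by(conn, dimension_name):
    """Sum data values along one dimension, going datavalue -> element -> dimension."""
    rows = conn.execute(
        """
        SELECT el.name, SUM(dv.value)
        FROM datavalue dv
        JOIN combinationmember cm ON cm.combinationid = dv.combinationid
        JOIN dimensionelement el  ON el.id = cm.dimensionelementid
        JOIN dimension d          ON d.id = el.dimensionid
        WHERE d.name = ?
        GROUP BY el.name
        ORDER BY el.name
        """,
        (dimension_name,),
    )
    return rows.fetchall()

conn = build_demo()
print(totals_by(conn, "SEX"))
print(totals_by(conn, "AGE"))
```

Because each dimension element belongs to exactly one dimension, the pivot needs only a single join from element to dimension.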

I haven’t given a name to DimensionElementCombinations. From the examples I have seen from SL this seems to be unnecessary. The names I have seen being used are generally simply contrived from the dimensions or (worse still) from the categoryoptions. What is important is that dataelements can have sets of dimensions.

And then much of what is different is just a renaming of the original entities. From the attached XML file I think you can see some of the issues faced re names and identifiers. I find myself following a sort of convention of CODE, Name, Description and UUID. CODEs must be unique within the scope of the database. I suppose this is close to what we currently call ShortName. I would like to place constraints on CODEs in terms of length and also the disallowing of spaces and other funny characters. The reason being that we may well have to use these codes in making up URIs. So CODEs must be unique. For the moment we could keep name unique but should migrate from it. It’s a matter of rewriting all our comparators I guess. UUIDs I am told are unique through some sort of divinity so we apparently do not need to worry about them.
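A small sketch of what such a CODE constraint could look like, in Python. The maximum length and the allowed character set are assumptions picked for illustration, as the mail does not specify either, and `is_valid_code` is a hypothetical helper.

```python
import re

MAX_CODE_LENGTH = 25  # assumed limit, not specified in the mail
# assumed URI-safe alphabet: letters, digits and underscore, no spaces
CODE_PATTERN = re.compile(r"[A-Za-z0-9_]+")

def is_valid_code(code, existing_codes=()):
    """Check length, character set and uniqueness of a candidate CODE."""
    return (
        len(code) <= MAX_CODE_LENGTH
        and CODE_PATTERN.fullmatch(code) is not None
        and code not in existing_codes
    )

print(is_valid_code("MAL_OPD_U1"))                    # accepted
print(is_valid_code("Malaria OPD"))                   # rejected: contains a space
print(is_valid_code("SEX", existing_codes={"SEX"}))   # rejected: not unique
```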

I’ve also tried to reduce the number of knees on the donkey, from 11 tables to 6. I believe this can be done whilst preserving the existing functionality. This arrangement would make it much more sensible to produce the XML I need to produce. I’m hoping that it would also be more friendly to those who would be trying to pivot the data across dimensions.

Jason, do you think this works for you? I might have missed out something really fundamental. Abyot, you’ve been through this process before - am I missing something? From the DataValue you can see DimensionElements. And once you know a DimensionElement you also know the Dimension to which it belongs. I think that’s queryable. Will have to hydrate with some data and see.

Shaking the multidimensional model up like this would obviously have implications. But I suspect most of it is taking stuff away rather than adding new so it might just be doable. Less is more.

Not spending time with DocBook yet, till I get some feedback.

Cheers

Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

Hi

On the back of Jason’s and others’ comments, I’ve reached the conclusion that we cannot really live with the MD model the way it is. Whereas I think it is (just about) workable, there are some serious optimizations we can and should do. I am going to put my other work back a day or two and propose some changes in a branch.

I think central to the inefficiency is the many-to-many relation between categories and categoryoptions. This strikes me as illogical as well as being cumbersome in the UI. Do we really want to be able to make categories with options like {‘0<5’,‘6-10’,‘Male’,‘Out of stock’,‘35-40’}? Reducing the relation between categories and category options to 1-n cuts two tables, should make SQL queries more efficient and grokkable, and also matches other models such as SDMX better.

The other possible inefficiency is the dimensionset. It can be useful in some contexts but I’m guessing that when querying the data (which we want to be fast) it is not relevant. A dataelement can have dimensions. The fact that some dataelements have the same combinations of dimensions is very useful to know for some purposes, but it should be possible to get from the dataelement to the dimension directly.

On the other side of the road is the hierarchical dimensionality idea I see Ola and Jason have been discussing, where dimensions are composed (perhaps post-facto) of uni-dimensional dataelements rather than decomposed into pre-structured dimensional elements. I suspect that:

we need both; and

from the API, user and reporting perspective they should look the same (i.e. a dataelement can have dimensions; how they come about should not be a concern at the end point).

I’ll try out some of these ideas and point you to the branch.

Regards

Bob

2009/9/29 Lars Helge Øverland larshelge@gmail.com

Thanks for the explanations Jason. The multidimensional model is quite complicated, is poorly documented, and as you say is DHIS-centric in the way that it is built around the DHIS notion of a Data Element.

Could we assemble some of the text being written on the list and put it into DocBook?

That said, and I think Jason has already made a strong case for this, even in a 100% DHIS2 scenario you will need more flexibility in defining dimensions for your data than what categories can provide. Being able to define data dimensions independently of data collection is powerful and should be supported in a better way than what data element groups provide today.

Given that we already have the orgunit group set code in place, I would assume that adding group sets to data elements could be a relatively straightforward thing to do (but then again, I am not the programmer…).

I don’t see any implications in adding this to the system; it won’t require changes to the existing model as the association goes from the groupset to the groups. We can prioritize this for the 2.0.3 release.

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


–
Cheers,
Knut Staring

bobj · 30 September 2009 13:14

OK. I’ve reached the conclusion that the model can and probably should be simplified, but it is really far too much work for what I have time for now. The categoryoptioncombo is already deeply ingrained in many parts of the system. So don’t hold your breath.

I’m going back to focus on my much simpler problem of exploding categoryoptioncombos into dimensions and vice versa.

For querying, I can see the API needs methods added to return datavalues by arbitrary collections of categories rather than just fixed categoryoptioncombos. The latter only exist for the purpose of data collection. I suspect that this is what Ola needs to create more flexible reporttables. Then when configuring the reporttable you would freely select the dimensions you were interested in. This is of course doable - I can see it - but my little brain is struggling with the complexity.

Looking at it as a two-stage process, it is a matter of getting the collection of categoryoptioncombo ids which intersect with the given set of categories and then passing that collection to the existing API method which returns collections of datavalues which match particular categoryoptioncombo ids.
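The two-stage lookup could be sketched roughly as follows, in plain Python over toy in-memory data. Nothing here is the real DHIS API: `matching_combo_ids`, `values_for_combos` and `slice_total` are hypothetical names, and `values_for_combos` merely stands in for the existing method that returns datavalues for given categoryoptioncombo ids.

```python
# categoryoptioncombo id -> the category options it is made of (toy data)
COMBOS = {
    1: {"Age": "<5", "Sex": "Male"},
    2: {"Age": "<5", "Sex": "Female"},
    3: {"Age": ">5", "Sex": "Male"},
    4: {"Age": ">5", "Sex": "Female"},
}

# (data element, categoryoptioncombo id) -> value (toy data)
DATAVALUES = {
    ("Malaria", 1): 10,
    ("Malaria", 2): 20,
    ("Malaria", 3): 30,
    ("Malaria", 4): 40,
}

def matching_combo_ids(selection):
    """Stage 1: ids of all combos that agree with every selected category option."""
    return {
        combo_id
        for combo_id, options in COMBOS.items()
        if all(options.get(cat) == opt for cat, opt in selection.items())
    }

def values_for_combos(dataelement, combo_ids):
    """Stage 2: stand-in for the existing lookup by categoryoptioncombo id."""
    return [
        value
        for (element, combo_id), value in DATAVALUES.items()
        if element == dataelement and combo_id in combo_ids
    ]

def slice_total(dataelement, selection):
    """Aggregate a data element over an arbitrary slice of its categories."""
    return sum(values_for_combos(dataelement, matching_combo_ids(selection)))

print(slice_total("Malaria", {"Sex": "Male"}))                # Malaria, male, all ages
print(slice_total("Malaria", {"Age": "<5", "Sex": "Female"}))  # one fixed combo
print(slice_total("Malaria", {}))                             # Malaria, everything
```

Only stage 1 knows how combos are built from options, so the underlying tables could change later without affecting callers of `slice_total`.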

In principle if we can expose the required methods in the API then it might be possible at some time in the future to revamp the underlying table structure without disturbing the API.

Two final thoughts:

if we are bound to the model whereby categoryoptions are free-standing entities (i.e. a many-to-many relation with categories) then, for the purpose of import/export, we are obliged to uniquely identify these as well. So I will reluctantly have to put uuids on categoryoptions too. After discussing with Abyot last night, I can see that there is some value in having them the way they are, but we will have to live with the complexity. What you gain on the swings you lose on the roundabouts.

OK. I still don’t get why we need this flexibility though. When using the data values you would only query for data element + categories/dimensions anyway right, and <5 means <5 whether it is part of AGE1, AGE2 or AGE 3. Or?

I guess the problem is that “<5” is just a label. Using OpenMRS-speak you could say there is no “concept” attached to the label. So in another category there could be a label “lessThan5”. By allowing options to be shared between categories, Abyot is hoping you will just use “<5” in all cases. Of course there is nothing forcing you to do this. Just as there is nothing stopping you having a category of “<5”, “Oranges” and “Apples”. So combined with the flexibility to do something possibly useful you also have the flexibility to do quite silly things.

There is a strong sense in which Age is quite a special and common case (like Period). For example, if you had one category with {<5, 5-10, >10} and another with {0-10, >10} then you should really be able to aggregate all the 0-10’s.

Perhaps the category Age (or any categories implementing the Age concept) requires some special status where there are formal requirements on the naming of categoryoptions within it. Don’t know - you are more familiar with the use cases.
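One way to picture the "special status" for Age is to attach an explicit range to each option instead of relying on its label. The sketch below is only an assumption about how that might work: ages are whole years with inclusive bounds, and `parts_tiling` is a hypothetical helper that finds which options of one age category add up exactly to an option of another.

```python
import math

# Two age categories; each option label maps to an inclusive range in whole years.
AGE_A = {"<5": (0, 4), "5-10": (5, 10), ">10": (11, math.inf)}
AGE_B = {"0-10": (0, 10), ">10": (11, math.inf)}

def parts_tiling(target, options):
    """Return the labels in `options` whose ranges exactly tile `target`,
    or None if they leave a gap or do not reach the upper bound."""
    low, high = target
    inside = sorted(
        (bounds, label)
        for label, bounds in options.items()
        if bounds[0] >= low and bounds[1] <= high
    )
    expected_start = low
    for (start, end), _ in inside:
        if start != expected_start:
            return None  # gap or overlap between consecutive ranges
        expected_start = end + 1
    if expected_start != high + 1:
        return None  # the ranges stop short of the target's upper bound
    return [label for _, label in inside]

values_a = {"<5": 12, "5-10": 8, ">10": 30}  # made-up values collected with AGE_A
parts = parts_tiling(AGE_B["0-10"], AGE_A)
print(parts, sum(values_a[p] for p in parts))
```

With ranges attached, “<5” and “lessThan5” would aggregate together regardless of how the labels are spelled.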

Indicators are not multidimensional. Why is this? Was it a conscious decision resulting from earlier discussion or is it just that we haven’t got there yet?

Data analysis could benefit from having multidimensional indicators, but then since this is strictly for output and never input I would suggest using the post-method of assigning indicator group sets and groups (or whatever you end up calling it in the UI). What makes indicators interesting and complex in this context is that the numerator/denominator formulas should be able to contain slices of the multidimensional data element, e.g. “Malaria” + “all ages”, “male”, and not only the flat data element (data element + 1 categoryoptioncombo, “Malaria”+ “<5”, “male”) like it is today.

This distinction between input and output is strange. Having input-only dimensions is like a sort of statistical masturbation. Lots of effort with no end result.

Yet looking at SDMX, it is clear that the protocol is much more suited for indicators than it is for dataelements. In fact using it to shunt dataelements around between systems is a bit of a perversion. But my sense is that WHO would like DHIS in national offices to produce SDMX formatted indicator reports for them. Is that your sense too? And should we care? If so there is some expectation that indicators should have dimensions. And including the slices you refer to above. In fact if we were ever to try and import the metadata from the famous WHO indicator repository that is exactly what we will see. Not sure how we might handle it without a md model. I suppose we will create flat indicators with the dimensions encoded in the name and then set about grouping the buggers :-(.

I haven’t really looked much at the indicator end of the beast. Been focussed more on getting datavalues from openmrs.

Regards
Bob

···

2009/9/30 Ola Hodne Titlestad olatitle@gmail.com

2009/9/30 Bob Jolliffe bobjolliffe@gmail.com

Regards
Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

2009/9/29 Abyot Gizaw abyota@gmail.com

On Tue, Sep 29, 2009 at 9:16 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

I think Abyot raises some good points, especially his last one about

differenences of what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts

of a multidimensional data element (as it is currently implemented

with categories) and a data element that has no multidimensionality,

at least in the sense of it not being assigned any categories.

Isn’t this what we have in the current system? If you are not assigning any combination of categories for a dataelement (well of course for the sake of consistency - from programming logic point of view - implicitly a default category combination with one default category having one default option is assigned - it is like putting your value at zero on the dimensions axis) then the dataelement has no dimensionality.

I don’t really like the default category idea. The way I have currently proposed there is no default category. By default a dataelement has no dimensions. It doesn’t need a default dimension. And also by default the dimensionelementcombination in datavalue is NULL.

What about the following scenario. Could the cateogry/category combos

be transformed somehow into a sort of data element generator? Users

could define a dimensionality set, assign a master data element, and

DHIS would create all of the necessary data elements. So a category

combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1

,Under 5 and Over 5) and template data element (Clinical malaria)

would produce :

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}

OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}

OPD Over 5 Clinical Malaria …

OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}

…

…

…

IP Clinical Malaria Total {IP, All ages, Clinical Malaria}

…

…

…

Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}

Clinical Malaria Total {All patient status, All ages, Clinical malaria}

Each one of those data elements would then be assigned a set of

dimensions, and a set of dimensional elements.

The cateogries functionality would simply be an artifact to produce

multiple data elements, without having to enter them seperately, which

if I understood Ola yesterday, was one of its intended purposes.

Now, for those of use such as myself, that do that have already create

dozens of data elements with different dimensions in their names (but

no where in a relational table) we could assign the dimensionality in

a seperate step (post-facto as Bob mentioned earlier). I might want to

assign a “uber” dimension of “Communicalble” and “Non-communicable” to

a disease type that might not have anything to do with the definition

of the data element itself, but would be simply for analysis purposes

later. Again, I may be rehashing my previous emails here, but from a

pure SQl standpoint, the approach I suggest here makes sense to me, in

terms of queries of how to pull this into a crosstab as well as how to

generate a fact table that something like an OLAP server could deal

with

This approach might seem to resolve the issue of how to deal with

these two different beasts, but unfolding the multidimensional data

element into simpler components. Meaning that the

cateorgy/combos/options would be used as a templating mechanisms, but

that dimensionality could be assigned through a separate set of

relations. Perhaps this is what is represented in the diagram, but I

will need to study it tomorrow after some sleep.

I do think that that dimenional elements should not be able to be

share by dimensions, and that dimensions and dimensional elements

should not be able to be deleted without lots of bells and whistles

going off once they have been assigned to data elements.

What is wrong with that as long as values are not associated with them? I think we will be falling back to the current implemention instead - like dimensional elements should not be deleted once values are assigned to their combinations.

I agree. I think we all will agree on this much.

I guess the key question is whether data elements should be able to

have multiple DimensionElementCombinations, which I think is the

current implementation. I am just not sure this will work with a

combination of DHIS2-type-multidimensional elements, and DHIS1.4-type

data elements.

Can anyone explain me how the DHIS2 multidimensional dataelement concept fails to handle the DHIS 1.4 dataelements - sorry may be I missed this from your earlier discussion? I think the way I see it - if the objective is on OLAP, pivoting/querying, then what we need is not to change the model - instead to develop more APIs which can pull data along a dimension, varying degree of overlappings across dimensions - or more generally aggregation of values over a flexible set of dimensionelementcombinations !

Again I am with you mostly on this. In fact that has been my suggestion all along - to push the functionality into the API. But having said that I think the current model is too double-jointed and complex. I have seen by trying to unpick the dimensions using xslt I need too many hash tables which is inefficient. This no doubt would also translate into too many SQL clauses. By trimming the requirement that dimensionelements are freely assignable the model becomes a good bit simpler. Beyond that it is mostly changing names.

Using the example above - {OPD, IPD}, {Male, Female},{Under 1, 1-5, Above 5} and malaria as base dataelement

What we have currently is an API to provide values for

Malaria(OPD,Male,Under 1)
Malaria(OPD,Male,1-5)

Malaria(OPD,Male,Above 5)
Malaria(OPD,Female,Under 1)
Malaria(OPD,Female,1-5)
Malaria(OPD,Female,Above 5)
…
…

And if I understood correctly … what is required is to have registred cases of

Malaria in the OPD,
Malaria in the IPD
Malaria for Males
Malaria for Females
…
…

Malaria In the OPD but only those Female
Malaria In the IPD but for male
…
…
…
we can list different combinations…

or finally ask … for the Malaria

Isn’t this a simple question of Aggregation? Does the multidimensional datamodel have a limitation to handle the above requirements - or am I talking a different stuff here?

No I believe it can probably be done - but yet it doesn’t seem to have been done. When I started looking at how I might do it I realized that it could also be simplified.

Regards

Bob

Enough for today.

Thanks for this Bob. It is a good start. Can’t you make this diagram

in DocBook so I can edit it?

Regards,

Jason

On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw abyodia@gmail.com wrote:

Yes your suggestion is doable and less is better … but I think the

requirement from the field is more complex.

If, for a moment, we stop talking about datavalues and talk about

dataelements - why are we talking about dimension combinations?

Because you are assuming a dataelement to have only one dimension. Am I

correct? If that is the case, I see a little bit of inconsistency here.

DataElement talks about one dimesion, but its corresponding value talks

about combination of dimensions.

Yes from the datavalue I can have dimensionelementcombinations, pick

dimensionelments regroup and put them in their corresponding dimesions – in

the end telling me from which dimension they came from. But from this point

onwards I am no more talking about a value of a single dataelement but a

value for combination of dataelements (because I have to pull different

dataelements which can give me the identified dimensions) … but is this

what we want?

The other point I would like the raise is - will there not be any limitation

on the flexibility of the system when putting the restriction "A Dimension

has many DimensionElements. But a DimensionElement is a member of only one

Dimension" ? Not only system flexibility problem, I see a logical problem as

well. Because if we think for example beyond the obvious

SEX(male,female,unknown) - I see a strong need for letting dimensionelements

to be member of multiple dimensions: For example take the other obvious

dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its

dimesionelements. May be such scaling of the AGE dimension is approrpiate

for Malaria case, but for TB case people might be interested to break the

AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to

handle cases like this? Are we going to define a number of <5yrs or are we

going to use the same <5yr dimensionelement ?

Thank you

Abyot.

On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

OK. Here’s my first attempt to rationalize things. Please excuse the

attachments. I try not to send attachments to mailing lists but these are

at least fairly small. (And Lars I will write it up in docbook after

fishing for feedback).

My primary aim has been to disturb the existing model as little as

possible whilst trying to simplify wherever possible.

Attached oldmodel.png shows the participants in the existing model. As

you can see there are 11 tables in all. I haven’t showed the relations as

it becomes a bit of a web.

Also attached is a proposed amended database model which bears sufficient

similarity to the old that migration between the two should be feasible.

But it is down to 6 tables. And I have named the tables according to the

terms we have been discussing. Of course this is just the database model.

I’ve also put together an XML view of what some sample dataset might look

like. There is also a UML model required which would be richer than the

underlying datamodel, but one step at a time …

Walking through:

DataElements can have Dimensions. And different dataElements can (and

hopefully will) share some of the same Dimensions. So there is a m-to-n

relationship between the two necessitating an extra table

(DataElementDimensions). An example of a Dimension is SEX. Nothing new

here.

Dimensions have DimensionElements. So SEX for example might have

DimensionElements “Male”, “Female”, “Unknown”. A big difference from the

old model is that there is 1-n relationship between DimensionElements and

Dimensions. A Dimension has many DimensionElements. But a DimensionElement

is a a member of only one Dimension.

DataValues represent the values at intersection of these Dimensions.

Keeping with the spirit of the old model this intersection is represented by

a single key, DimensionElementCombination. The DimensionElementCombinations

would be populated when a new Dimension is added to a DataElement. Like the

original model there is some fragility here. Changing dimensions on

dataelements could create a situation where datavalues become orphaned or

misdirected. The API must have robust methods for defending this integrity

particulalrly when updating the structural metadata. But this is perhaps

doable. Either way its not worse than we have.

I haven’t given a name to DimensionElementCombinations. From the examples

I have seen from SL this seems to be unnecessary. The names I have seen

being used are generally simply contrived from the dimensions or (worse

still) from the categoryoptions. What is important is that dataelements can

have sets of dimensions.

And then much of what is different is just a renaming of the original

entities. From the attached XML file I think you can see some of the

issues faced re names and identifiers. I find myself following a sort of

convention of CODE, Name, Description and UUID. CODE’s must be unique

within the scope of the database. I suppose this is close to what we

currently call ShortName. I would like to place constraints on CODES in

terms of length and also the disallowing of spaces and other funny

characters. The reason being that we may well have to use these codes in

making up URIs. So CODEs must be unique. For the moment we could keep

name unique but should migrate from it. It’s a matter of rewriting all our

comparators I guess. UUIDs I am told are unique through some sort of

divinity so we apparently do not need to worry about them.

I’ve also tried to reduce the number of knees on the donkey - from 11

tables to 6. I believe this can be done whilst preserving the existing

functionality. This arrangement would make it much more sensible to produce

the XML I need to produce. I’m hoping that it would also be more friendly

to those who would be trying to pivot the data across dimensions.

Jason do you think this works for you? I might have missed out something

really fundamental. Abyot, you’ve been through this process before - am I

missing something? From the DataValue you can see DimensionElements. And

once you know a DimensionElement you also know the Dimension to which it

belongs. I think that’s queryable. Will have to hydrate with some data and

see.

Shaking the multidimensional model up like this would obviously have

implications. But I suspect most of it is taking stuff away rather than

adding new so it might just be doable. Less is more.

Not spending time with docbook yet, till I get some feedback.

Cheers

Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

Hi

On the back of Jason’s and others’ comments, I’ve reached the conclusion

that we cannot really live with the MD model the way it is. Whereas I think

it is (just about) workable there are some serious optimizations we can and

should do. I am going to put my other work back a day or two and propose

some changes in a branch.

I think central to the inefficiency is the many-many relation between

categories and categoryoptions. This strikes me as illogical as well as

being cumbersome in the UI. Do we really want to be able to make categories

with options like {‘0<5’,‘6-10’,‘Male’,‘Out of stock’,‘35-40’}? Reducing

the relation between categories and category options to 1-n cuts two tables,

should make SQL queries more efficient and grokkable and also matches other

models such as SDMX better.

The other possible inefficiency is the dimensionset. It can be useful

in some contexts but I’m guessing that when querying the data (which we want

to be fast) it is not relevant. A dataelement can have dimensions. The

fact that some dataelements have the same combinations of dimensions is very

useful to know for some purposes, but it should be possible to get from the

dataelement to the dimension directly.

On the other side of the road is the hierarchical dimensionality idea I

see Ola and Jason have been discussing, where dimensions are composed

(perhaps post-facto) of uni-dimensional dataelements rather than decomposed

into pre-structured dimensional elements. I suspect that:

1. we need both; and

2. from the API, user and reporting perspective they should look the

same (i.e. a dataelement can have dimensions - how they come about should not

be a concern at the end point).

I’ll try out some of these ideas and point you to the branch.

Regards

Bob

2009/9/29 Lars Helge Øverland larshelge@gmail.com

Thanks for the explanations Jason. The multidimensional model is quite

complicated, is poorly documented, and as you say is DHIS-centric in the way

that it is built around the DHIS notion of a Data Element.

Could we assemble and put some of the text being written on the list to

docbook?

That said, and I think Jason already has made a strong case for this,

also in a 100% DHIS2 scenario you will need more flexibility in defining

dimensions to your data than what categories can provide. Being able to

define data dimensions independent of data collection is powerful and should

be supported in a better way than what data element groups provide today.

Given that we already have the orgunit group set code in place I would

assume that adding group sets to data elements could be a relatively

straightforward thing to do (but then again, I am not the programmer…).

I don’t see any implications in adding this to the system, it won’t

require changes to the existing model as the association goes from the

groupset to the groups. We can prioritize this for the 2.0.3 release.

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

jason · 30 September 2009 13:31

Indicators are most certainly multi-dimensional, but without a formal
way of extending the multidimensional concept to indicators, I cannot
see how it can work. It is still not clear to me how the
multidimensional data elements are used to calculate indicators in the
same way as a PODE (plain old data element). I guess this is handled
somehow by the API? For instance, if I define my indicator (Malaria
cases) with category combos (Under 1, Under 5, Over 5) and Patient
status (OPD, IPD, Deaths), how do I calculate the Under 1 malaria
incidence rate, which would be a slice of Under 1 malaria cases (with
the patient status dimension folded) divided by a multi-dimensional
population figure? Does this imply that the population figures and the
incidence/coverages that result from combinations of indicators must
share the exact same dimensionality so that DHIS can divine the
correct dimensional intersections?
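
One way to read the question, as a sketch with made-up numbers rather than a description of what DHIS actually does: the numerator is sliced on Age and folded over Patient status, and the denominator only needs to share the Age dimension. The `cases` and `population` dictionaries and the `incidence_per_1000` helper are hypothetical.

```python
# Made-up malaria case counts keyed by (age, patient status); illustrative only.
cases = {
    ("Under 1", "OPD"): 30, ("Under 1", "IPD"): 8, ("Under 1", "Deaths"): 2,
    ("Under 5", "OPD"): 50, ("Under 5", "IPD"): 12, ("Under 5", "Deaths"): 3,
    ("Over 5", "OPD"): 90, ("Over 5", "IPD"): 20, ("Over 5", "Deaths"): 5,
}

# Made-up population figures that carry only the age dimension.
population = {"Under 1": 4000, "Under 5": 16000, "Over 5": 80000}


def incidence_per_1000(age):
    """Fold patient status out of the numerator, then divide by the population slice."""
    numerator = sum(
        value for (case_age, _status), value in cases.items() if case_age == age
    )
    return 1000 * numerator / population[age]


under_1_rate = incidence_per_1000("Under 1")
```

In this reading the two sides do not need identical dimensionality; they only need to agree on the dimension that is kept in the slice.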

I have not played around with this, but I suppose it is possible
somehow from within the indicator definition panels.

I can see why indicators should be multi-dimensional, both in terms of
definition and in terms of analysis, but it feels like it would
require a major rework.

In terms of keeping it simple, again, all I require is the ability to
assign dimensions and dimensional elements to data elements.

···

On Wed, Sep 30, 2009 at 3:14 PM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

2009/9/30 Ola Hodne Titlestad <olatitle@gmail.com>

2009/9/30 Bob Jolliffe <bobjolliffe@gmail.com>

OK. I've reached the conclusion that the model can and probably should
be simplified, but it is really far too much work for what I have time for
now. The categoryoptioncombo is already deeply ingrained in many parts of
the system. So don't hold your breath.

I'm going back to focus on my much simpler problem of exploding
categorycombooptions into dimensions and vice versa.

For querying, I can see the API needs methods added to return datavalues
by arbitrary collections of category rather than just fixed
categoryoptioncombos. These only exist for the purpose of data collection.
I suspect that this is what Ola needs to create more flexible reporttables.
Then when configuring the reporttable you would freely select the dimensions
you were interested in. This is of course do-able - I can see it - but my
little brain is struggling with the complexity.

Looking at a two stage process it is a matter of getting the collection
of categorycombooptionids which intersect with the given set of categories
and then passing that collection to the existing API method which returns
collections of datavalues which match particular categorycombooptionids.
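
The two stages described above can be sketched as follows. This is only an illustration under stated assumptions: `option_combos`, `data_values`, `combos_matching`, `values_for_combos` and `total` are hypothetical names and made-up data, and `values_for_combos` merely stands in for whatever the existing flat API method is.

```python
# Hypothetical lookup: categoryoptioncombo id -> the option chosen in each category.
option_combos = {
    1: {"Age": "<5", "Sex": "Male"},
    2: {"Age": "<5", "Sex": "Female"},
    3: {"Age": ">5", "Sex": "Male"},
    4: {"Age": ">5", "Sex": "Female"},
}

# Made-up datavalue rows: (data element, categoryoptioncombo id, value).
data_values = [
    ("Malaria", 1, 10),
    ("Malaria", 2, 12),
    ("Malaria", 3, 7),
    ("Malaria", 4, 5),
]


def combos_matching(selection):
    """Stage 1: ids of the combos that agree with every selected category option."""
    return {
        combo_id
        for combo_id, options in option_combos.items()
        if all(options.get(cat) == opt for cat, opt in selection.items())
    }


def values_for_combos(data_element, combo_ids):
    """Stage 2: stand-in for the existing flat lookup by categoryoptioncombo id."""
    return [v for de, cid, v in data_values if de == data_element and cid in combo_ids]


def total(data_element, selection):
    """Sum the values of every combo that intersects the selected options."""
    return sum(values_for_combos(data_element, combos_matching(selection)))
```

Calling `total("Malaria", {"Sex": "Male"})` folds the Age category away, and an empty selection folds everything, giving the plain data element total.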

In principle if we can expose the required methods in the API then it
might be possible at some time in the future to revamp the underlying table
structure without disturbing the API.

Two final thoughts:
1. If we are bound to the model whereby categoryoptions are free
standing entities (i.e. many-to-many relation with categories) then, for the
purpose of import/export we are obliged to uniquely identify these as well.
So I will have to reluctantly also put uuids on categoryoptions. After
discussing with Abyot last night, I can see that there is some value in
having them the way they are, but we will have to live with the complexity.
What you gain on the swings you lose on the roundabouts.

OK. I still don't get why we need this flexibility though. When using the
data values you would only query for data element + categories/dimensions
anyway, right? And <5 means <5 whether it is part of AGE1, AGE2 or AGE3. Or?

I guess the problem is that "<5" is just a label. Using OpenMRS-speak you
could say there is no "concept" attached to the label. So in another
category there could be a label "lessThan5". By allowing options to be
shared between categories, Abyot is hoping you will just use "<5" in all
cases. Of course there is nothing forcing you to do this. Just as there is
nothing stopping you having a category of "<5", "Oranges" and "Apples". So
combined with the flexibility to do something possibly useful you also have
the flexibility to do quite silly things.

There is a strong sense in which Age is quite a special and common case
(like Period). For example, if you had one category with {<5, 5-10, >10}
and another with {0-10, >10} then you should really be able to aggregate all
the 0-10's.
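
A rough sketch of what such "special status" for Age might mean, assuming each option carried an explicit numeric range instead of being just a label. The half-open `(low, high)` ranges, the dictionary names and the `roll_up` helper are all assumptions made for illustration; the thread does not define how the boundaries should be interpreted.

```python
import math

# Assumed half-open [low, high) year ranges behind each label; illustrative only.
AGE_FINE = {"<5": (0, 5), "5-10": (5, 10), ">10": (10, math.inf)}
AGE_COARSE = {"0-10": (0, 10), ">10": (10, math.inf)}


def roll_up(values, source, target):
    """Sum source option values into the target option whose range contains them."""
    result = {label: 0 for label in target}
    for label, value in values.items():
        low, high = source[label]
        for target_label, (t_low, t_high) in target.items():
            if t_low <= low and high <= t_high:
                result[target_label] += value
                break
        else:
            raise ValueError(f"{label!r} does not fit inside any target option")
    return result


rolled = roll_up({"<5": 3, "5-10": 4, ">10": 5}, AGE_FINE, AGE_COARSE)
```

With ranges attached, values captured as `<5` and `5-10` can be summed into `0-10`, which a bare label comparison cannot do.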

Perhaps the category Age (or any categories implementing the Age concept)
requires some special status where there are formal requirements on the
naming of categoryoptions within it. Don't know - you are more familiar
with the use cases.

2. Indicators are not multidimensional. Why is this? Was it a
conscious decision resulting from earlier discussion or is it just that we
haven't got there yet?

Data analysis could benefit from having multidimensional indicators, but
then since this is strictly for output and never input I would suggest using
the post-method of assigning indicator group sets and groups (or whatever
you end up calling it in the UI). What makes indicators interesting and
complex in this context is that the numerator/denominator formulas should be
able to contain slices of the multidimensional data element, e.g. "Malaria"
+ "all ages", "male", and not only the flat data element (data element + 1
categoryoptioncombo, "Malaria"+ "<5", "male") like it is today.

This distinction between input and output is strange. Having input-only
dimensions is like a sort of statistical masturbation. Lots of effort
with no end result.

Yet looking at SDMX, it is clear that the protocol is much more suited for
indicators than it is for dataelements. In fact using it to shunt
dataelements around between systems is a bit of a perversion. But my sense
is that WHO would like DHIS in national offices to produce SDMX formatted
indicator reports for them. Is that your sense too? And should we care?
If so there is some expectation that indicators should have dimensions. And
including the slices you refer to above. In fact if we were ever to try and
import the metadata from the famous WHO indicator repository that is exactly
what we will see. Not sure how we might handle it without a md model. I
suppose we will create flat indicators with the dimensions encoded in the
name and then set about grouping the buggers :-(.

I haven't really looked much at the indicator end of the beast. Been
focussed more on getting datavalues from OpenMRS.

Regards
Bob

Regards
Bob

2009/9/29 Bob Jolliffe <bobjolliffe@gmail.com>

2009/9/29 Abyot Gizaw <abyota@gmail.com>

On Tue, Sep 29, 2009 at 9:16 PM, Jason Pickering <jason.p.pickering@gmail.com> wrote:

I think Abyot raises some good points, especially his last one about
differences in what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts
of a multidimensional data element (as it is currently implemented
with categories) and a data element that has no multidimensionality,
at least in the sense of it not being assigned any categories.

Isn't this what we have in the current system? If you are not assigning
any combination of categories for a dataelement (well of course for the sake
of consistency - from programming logic point of view - implicitly a default
category combination with one default category having one default option is
assigned - it is like putting your value at zero on the dimensions axis)
then the dataelement has no dimensionality.

I don't really like the default category idea. The way I have currently
proposed there is no default category. By default a dataelement has no
dimensions. It doesn't need a default dimension. And also by default the
dimensionelementcombination in datavalue is NULL.

What about the following scenario? Could the category/category combos
be transformed somehow into a sort of data element generator? Users
could define a dimensionality set, assign a master data element, and
DHIS would create all of the necessary data elements. So a category
combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1,
Under 5 and Over 5) and template data element (Clinical malaria)
would produce:

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}
OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}
OPD Over 5 Clinical Malaria ...
OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}
...
..
..
IP Clinical Malaria Total {IP, All ages, Clinical Malaria}
...
...
...
Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}
Clinical Malaria Total {All patient status, All ages, Clinical malaria}
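
As a rough sketch of such a generator (the function name, the "All ..." labels used for totals and the naming pattern are assumptions for illustration, not existing DHIS behaviour):

```python
from itertools import product


def generate_data_elements(template, categories):
    """Return {name: dimensional elements} for every combination, totals included."""
    # Each axis gets a hypothetical "All ..." element standing for the folded total.
    axes = [
        [(option, False) for option in options] + [(f"All {name}", True)]
        for name, options in categories.items()
    ]
    generated = {}
    for combo in product(*axes):
        specific = [label for label, is_total in combo if not is_total]
        suffix = ["Total"] if len(specific) < len(combo) else []
        name = " ".join(specific + [template] + suffix)
        generated[name] = tuple(label for label, _ in combo) + (template,)
    return generated


categories = {
    "patient status": ["OPD", "IPD", "Deaths"],
    "ages": ["Under 1", "Under 5", "Over 5"],
}
elements = generate_data_elements("Clinical Malaria", categories)
```

Each generated name maps to the dimensional elements it would be assigned, for example `elements["OPD Clinical Malaria Total"]` is `("OPD", "All ages", "Clinical Malaria")`.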

Each one of those data elements would then be assigned a set of
dimensions, and a set of dimensional elements.
The categories functionality would simply be an artifact to produce
multiple data elements, without having to enter them separately, which,
if I understood Ola yesterday, was one of its intended purposes.

Now, for those of us, such as myself, who have already created
dozens of data elements with different dimensions in their names (but
nowhere in a relational table), we could assign the dimensionality in
a separate step (post-facto, as Bob mentioned earlier). I might want to
assign an "uber" dimension of "Communicable" and "Non-communicable" to
a disease type that might not have anything to do with the definition
of the data element itself, but would be simply for analysis purposes
later. Again, I may be rehashing my previous emails here, but from a
pure SQL standpoint, the approach I suggest here makes sense to me, in
terms of queries of how to pull this into a crosstab as well as how to
generate a fact table that something like an OLAP server could deal
with.

This approach might seem to resolve the issue of how to deal with
these two different beasts, by unfolding the multidimensional data
element into simpler components. Meaning that the
category/combos/options would be used as a templating mechanism, but
that dimensionality could be assigned through a separate set of
relations. Perhaps this is what is represented in the diagram, but I
will need to study it tomorrow after some sleep.

I do think that dimensional elements should not be able to be
shared by dimensions, and that dimensions and dimensional elements
should not be able to be deleted without lots of bells and whistles
going off once they have been assigned to data elements.

What is wrong with that as long as values are not associated with them?
I think we will be falling back to the current implementation instead - like
dimensional elements should not be deleted once values are assigned to their
combinations.

I agree. I think we all will agree on this much.

I guess the key question is whether data elements should be able to
have multiple DimensionElementCombinations, which I think is the
current implementation. I am just not sure this will work with a
combination of DHIS2-type-multidimensional elements, and DHIS1.4-type

data elements.

Can anyone explain to me how the DHIS2 multidimensional dataelement
concept fails to handle the DHIS 1.4 dataelements - sorry, maybe I missed
this from your earlier discussion? I think the way I see it - if the
objective is OLAP, pivoting/querying, then what we need is not to change
the model - instead to develop more APIs which can pull data along a
dimension, with varying degrees of overlap across dimensions - or more
generally aggregation of values over a flexible set of
dimensionelementcombinations!

Again I am with you mostly on this. In fact that has been my suggestion
all along - to push the functionality into the API. But having said that I
think the current model is too double-jointed and complex. I have seen by
trying to unpick the dimensions using XSLT I need too many hash tables which
is inefficient. This no doubt would also translate into too many SQL
clauses. By trimming the requirement that dimensionelements are freely
assignable the model becomes a good bit simpler. Beyond that it is mostly
changing names.

Using the example above - {OPD, IPD}, {Male, Female},{Under 1, 1-5,
Above 5} and malaria as base dataelement

What we have currently is an API to provide values for

Malaria(OPD,Male,Under 1)
Malaria(OPD,Male,1-5)
Malaria(OPD,Male,Above 5)
Malaria(OPD,Female,Under 1)
Malaria(OPD,Female,1-5)
Malaria(OPD,Female,Above 5)
....
...

And if I understood correctly .. what is required is to have registered
cases of

Malaria in the OPD,
Malaria in the IPD
Malaria for Males
Malaria for Females
....
..

Malaria In the OPD but only those Female
Malaria In the IPD but for male
..
..
..
we can list different combinations....

or finally ask ...... for the Malaria

Isn't this a simple question of aggregation? Does the multidimensional
datamodel have a limitation in handling the above requirements - or am I
talking about different stuff here?
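
The aggregation being asked about can be sketched as a group-by over whichever dimensions are kept, with the rest folded away. The `rows` data and the `aggregate` helper are made up for illustration and do not correspond to an existing API method.

```python
from collections import defaultdict

# Made-up Malaria values for one orgunit and period, keyed by full combination.
rows = [
    ({"Status": "OPD", "Sex": "Male", "Age": "Under 1"}, 4),
    ({"Status": "OPD", "Sex": "Female", "Age": "Under 1"}, 6),
    ({"Status": "IPD", "Sex": "Male", "Age": "1-5"}, 3),
    ({"Status": "IPD", "Sex": "Female", "Age": "Above 5"}, 2),
]


def aggregate(rows, keep):
    """Sum values grouped by the dimensions in keep, folding all the others."""
    totals = defaultdict(int)
    for dims, value in rows:
        totals[tuple(dims[name] for name in keep)] += value
    return dict(totals)


by_status = aggregate(rows, ["Status"])              # Malaria in the OPD, in the IPD
by_status_sex = aggregate(rows, ["Status", "Sex"])   # e.g. OPD but only Female
overall = aggregate(rows, [])                        # finally, just Malaria
```

The stored model does not need to change for this; only the choice of which dimensions to keep varies per query.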

No I believe it can probably be done - but yet it doesn't seem to have
been done. When I started looking at how I might do it I realized that it
could also be simplified.

Regards
Bob

Enough for today.

Thanks for this Bob. It is a good start. Can't you make this diagram
in DocBook so I can edit it?

Regards,
Jason

On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw <abyodia@gmail.com> wrote:
> Yes your suggestion is doable and less is better .... but I think the
> requirement from the field is more complex.
>
> If, for a moment, we stop talking about datavalues and talk about
> dataelements - why are we talking about dimension combinations?
>
> Because you are assuming a dataelement to have only one dimension. Am I
> correct? If that is the case, I see a little bit of inconsistency here.
> DataElement talks about one dimension, but its corresponding value talks
> about a combination of dimensions.
>
> Yes from the datavalue I can have dimensionelementcombinations, pick
> dimensionelements, regroup and put them in their corresponding dimensions --
> in the end telling me which dimension they came from. But from this point
> onwards I am no longer talking about a value of a single dataelement but a
> value for a combination of dataelements (because I have to pull different
> dataelements which can give me the identified dimensions) .... but is this
> what we want?
>
> The other point I would like to raise is - will there not be any limitation
> on the flexibility of the system when putting the restriction "A Dimension
> has many DimensionElements. But a DimensionElement is a member of only one
> Dimension"? Not only a system flexibility problem, I see a logical problem
> as well. Because if we think for example beyond the obvious
> SEX(male,female,unknown) - I see a strong need for letting dimensionelements
> be members of multiple dimensions: For example take the other obvious
> dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its
> dimensionelements. Maybe such scaling of the AGE dimension is appropriate
> for the Malaria case, but for the TB case people might be interested to
> break the AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are
> we going to handle cases like this? Are we going to define a number of <5yrs
> or are we going to use the same <5yr dimensionelement?
>
>
> Thank you
> Abyot.
>
>
>
> On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
>>
>> OK. Here's my first attempt to rationalize things. Please excuse
>> the
>> attachments. I try not to send attachments to mailing lists but
>> these are
>> at least fairly small. (And Lars I will write it up in docbook
>> after
>> fishing for feedback).
>>
>> My primary aim has been to disturb the existing model as little as
>> possible whilst trying to simplify wherever possible.
>>
>> Attached oldmodel.png shows the participants in the existing
>> model. As
>> you can see there are 11 tables in all. I haven't showed the
>> relations as
>> it becomes a bit of a web.
>>
>> Also attached is a proposed amended database model which bears
>> sufficient
>> similarity to the old that migration between the two should be
>> feasible.
>> But it is down to 6 tables. And I have named the tables according
>> to the
>> terms we have been discussing. Of course this is just the database
>> model.
>> I've also put together an XML view of what some sample dataset
>> might look
>> like. There is also a UML model required which would be richer
>> than the
>> underlying datamodel, but one step at a time ....
>>
>> Walking through:
>>
>> 1. DataElements can have Dimensions. And different dataElements
>> can (and
>> hopefully will) share some of the same Dimensions. So there is a
>> m-to-n
>> relationship between the two necessitating an extra table
>> (DataElementDimensions). An example of a Dimension is SEX.
>> Nothing new
>> here.
>>
>> 2. Dimensions have DimensionElements. So SEX for example might
>> have
>> DimensionElements "Male", "Female", "Unknown". A big difference
>> from the
>> old model is that there is 1-n relationship between
>> DimensionElements and
>> Dimensions. A Dimension has many DimensionElements. But a
>> DimensionElement
>> is a member of only one Dimension.
>>
>> 3. DataValues represent the values at intersection of these
>> Dimensions.
>> Keeping with the spirit of the old model this intersection is
>> represented by
>> a single key, DimensionElementCombination. The
>> DimensionElementCombinations
>> would be populated when a new Dimension is added to a DataElement.
>> Like the
>> original model there is some fragility here. Changing dimensions
>> on
>> dataelements could create a situation where datavalues become
>> orphaned or
>> misdirected. The API must have robust methods for defending this
>> integrity
>> particularly when updating the structural metadata. But this is
>> perhaps
>> doable. Either way its not worse than we have.
>>
>> I haven't given a name to DimensionElementCombinations. From the
>> examples
>> I have seen from SL this seems to be unnecessary. The names I have
>> seen
>> being used are generally simply contrived from the dimensions or
>> (worse
>> still) from the categoryoptions. What is important is that
>> dataelements can
>> have sets of dimensions.
>>
>> And then much of what is different is just a renaming of the
>> original
>> entities. From the attached XML file I think you can see some of
>> the
>> issues faced re names and identifiers. I find myself following a
>> sort of
>> convention of CODE, Name, Description and UUID. CODE's must be
>> unique
>> within the scope of the database. I suppose this is close to what
>> we
>> currently call ShortName. I would like to place constraints on
>> CODES in
>> terms of length and also the disallowing of spaces and other funny
>> characters. The reason being that we may well have to use these
>> codes in
>> making up uri's. So CODES must be unique. For the moment we could
>> keep
>> name unique but should migrate from it. Its a matter of rewriting
>> all our
>> comparators I guess. UUIDs I am told are unique through some sort
>> of
>> divinity so we apparently do not need to worry about them
>>
>> I've also tried to reduce the number of knees on the donkey - from
>> 11
>> tables to 6. I believe this can be done whilst preserving the
>> existing
>> functionality. This arrangement would make it much more sensible to
>> produce
>> the XML I need to produce. I'm hoping that it would also be more
>> friendly
>> to those who would be trying to pivot the data across dimensions.
>>
>> Jason do you think this works for you? I might have missed out
>> something
>> really fundamental. Abyot, you've been through this process before
>> - am I
>> missing something? From the DataValue you can see
>> DimensionElements. And
>> once you know a DimensionElement you also know the Dimension to
>> which it
>> belongs. I think thats queryable. Will have to hydrate with some
>> data and
>> see.
>>
>> Shaking the multidimensional model up like this would obviously
>> have
>> implications. But I suspect most of it is taking stuff away rather
>> than
>> adding new so it might just be doable. Less is more.
>>
>> Not spending time with docbook yet, till I get some feedback.
>>
>> Cheers
>> Bob
>>
>> 2009/9/29 Bob Jolliffe <bobjolliffe@gmail.com>
>>>
>>> Hi
>>>
>>> On the back of Jason and others comments, I've reached the
>>> conclusion
>>> that we cannot really live with the MD model the way it is.
>>> Whereas I think
>>> it is (just about) workable there are some serious optimizations
>>> we can and
>>> should do. I am going to put my other work back a day or two and
>>> propose
>>> some changes in a branch.
>>>
>>> I think central to the inefficiency is the many-many relation
>>> between
>>> categories and categoryoptions. This strikes me as illogical as
>>> well as
>>> being cumbersome in the UI. Do we really want to be able to make
>>> categories
>>> with options like {'0<5','6-10','Male','Out of stock','35-40'}.
>>> Reducing
>>> the relation between categories and category options to 1-n cuts
>>> two tables,
>>> should make sql queries more efficient and grokkable and also
>>> matches other
>>> models such as sdmx better.
>>>
>>> The other possible inefficiency is the dimensionset. It can be
>>> useful
>>> in some contexts but I'm guessing that when querying the data
>>> (which we want
>>> to be fast) it is not relevant. A dataelement can have
>>> dimensions. The
>>> fact that some dataelements have the same combinations of
>>> dimensions is very
>>> useful to know for some purposes, but it should be possible to get
>>> from the
>>> dataelement to the dimension directly.
>>>
>>> On the other side of the road is the hierarchical dimensionality
>>> idea I
>>> see Ola and Jason have been discussing, where dimensions are
>>> composed
>>> (perhaps post-facto) of uni-dimensional dataelements rather than
>>> decomposed
>>> into pre-structured dimensional elements. I suspect that:
>>> 1. we need both; and
>>> 2. from the API, user and reporting perspective they should look
>>> the
>>> same (ie a dataelement can have dimensions - how they come about
>>> should not
>>> be a concern at the end point).
>>>
>>> I'll try out some of these ideas and point you to the branch.
>>>
>>> Regards
>>> Bob
>>>
>>> 2009/9/29 Lars Helge Øverland <larshelge@gmail.com>
>>>>
>>>>>
>>>>> Thanks for the explanations Jason. The multidimensional model is
>>>>> quite
>>>>> complicated, is poorly documented, and as you say is
>>>>> DHIS-centric in the way
>>>>> that it is built around the DHIS notion of a Data Element.
>>>>>
>>>>
>>>> Could we assemble and put some of the text being written on the
>>>> list to
>>>> docbook?
>>>>
>>>>>
>>>>> That said, and I think Jason already has made a strong case for
>>>>> this,
>>>>> also in a 100% DHIS2 scenario you will need more flexibility in
>>>>> defining
>>>>> dimensions to your data than what categories can provide. Being
>>>>> able to
>>>>> define data dimensions independent of data collection is
>>>>> powerful and should
>>>>> be supported in a better way than what data element groups
>>>>> provide today.
>>>>> Given that we already have the orgunit group set code in place I
>>>>> would
>>>>> assume that adding group sets to data elements could be a
>>>>> relatively
>>>>> straight forward thing to do (but then again, I am not the
>>>>> programmer...).
>>>>
>>>> I don't see any implications in adding this to the system, it
>>>> won't
>>>> require changes to the existing model as the association goes
>>>> from the
>>>> groupset to the groups. We can prioritize this for the 2.0.3
>>>> release.

bobj · 30 September 2009 14:01

Indicators are most certainly multi-dimensional, but without a formal

way of extending the multidimensional concept to indicators, I cannot

see how it can work.

While there is still some disquiet about the current multidimensional concept I would be reluctant to rush in …

It is still not clear to me how the

multidimensional data elements are used to calculate indicators in the

same way as a PODE (plain old data element). I guess this is handled

somehow by the API?

Jason don’t overestimate the API. I suppose the point is this functionality should be there as part of the MD implementation. But it seemed work stopped after the input side was done. I can probably understand why, but it’s a pity. Because it is only when you start looking at this that you realize that the queries involved will be considerably more complex than the existing ones. Currently the API for DataValueService is providing only flat queries off the datavalues table. (Even I can do these). These new ones are a completely different kettle of fish. That’s why you haven’t seen any of the goodness floating up to the reportTable design for instance.

For instance, if I define my indicator (Malaria

cases) with category combos (Under 1, Under 5, Over 5) and Patient

status (OPD, IPD, Deaths), how do I calculate the Under 1 malaria

incidence rate, which would be a slice of Under 1 malaria cases (with

the patient status dimension folded) divided by a multi-dimensional

population figure? Does this imply that the population figures and the

incidence/coverages that result from combinations of indicators must

share the exact same dimensionality so that DHIS can divine the

correct dimensional intersections?

I have not played around with this, but I suppose it is possible

somehow from within the indicator definition panels.

I can see why indicators should be multi-dimensional, both in terms of

definition and in terms of analysis, but it feels like it would

require a major rework.

In terms of keeping it simple, again, all I require is the ability to

assign dimensions and dimensional elements to data elements.

Which of course you can do. It’s just that it seems to be tricky to easily get them back out again.

Regards
Bob

2009/9/30 Jason Pickering jason.p.pickering@gmail.com

On Wed, Sep 30, 2009 at 3:14 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

2009/9/30 Ola Hodne Titlestad olatitle@gmail.com

2009/9/30 Bob Jolliffe bobjolliffe@gmail.com

OK. I’ve reached the conclusion that the model can and probably should

be simplified, but it is really far too much work for what I have time for

now. The categoryoptioncombo is already deeply ingrained in many parts of

the system. So don’t hold your breath.

I’m going back to focus on my much simpler problem of exploding

categorycombooptions into dimensions and vice versa.

For querying, I can see the API needs methods added to return datavalues

by arbitrary collections of category rather than just fixed

categoryoptioncombos. These only exist for the purpose of data collection.

I suspect that this is what Ola needs to create more flexible reporttables.

Then when configuring the reporttable you would freely select the dimensions

you were interested in. This is of course do-able - I can see it - but my

little brain is struggling with the complexity.

Looking at a two stage process it is a matter of getting the collection

of categorycombooptionids which intersect with the given set of categories

and then passing that collection to the existing API method which returns

collections of datavalues which match particular categorycombooptionids.
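A minimal sketch of the two-stage lookup described above, using in-memory dictionaries as hypothetical stand-ins for the categoryoptioncombo and datavalue tables. The names `COMBOS`, `DATAVALUES`, `combos_matching` and `sum_values` are assumptions made for illustration and are not part of the DHIS 2 API; the numbers are made up.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical stand-in for the categoryoptioncombo table:
# combo id -> the set of category options it is made of.
COMBOS: Dict[int, FrozenSet[str]] = {
    1: frozenset({"Under 1", "OPD"}),
    2: frozenset({"Under 1", "IPD"}),
    3: frozenset({"Under 5", "OPD"}),
    4: frozenset({"Under 5", "IPD"}),
}

# Hypothetical stand-in for the datavalue table:
# (data element, combo id) -> value. Values are illustrative only.
DATAVALUES: Dict[Tuple[str, int], int] = {
    ("Malaria cases", 1): 10,
    ("Malaria cases", 2): 3,
    ("Malaria cases", 3): 25,
    ("Malaria cases", 4): 7,
}


def combos_matching(options: Iterable[str]) -> Set[int]:
    """Stage 1: ids of every combo that contains all the requested options."""
    wanted = set(options)
    return {cid for cid, opts in COMBOS.items() if wanted <= opts}


def sum_values(data_element: str, combo_ids: Iterable[int]) -> int:
    """Stage 2: the existing flat lookup, applied to a collection of combo ids."""
    return sum(DATAVALUES.get((data_element, cid), 0) for cid in combo_ids)


# "Under 1" malaria cases with the patient status dimension folded away.
under1 = sum_values("Malaria cases", combos_matching(["Under 1"]))
```

Passing an empty list of options matches every combo, which gives the plain total for the data element.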

In principle if we can expose the required methods in the API then it

might be possible at some time in the future to revamp the underlying table

structure without disturbing the API.

Two final thoughts:

if we are bound to the model whereby categoryoptions are free

standing entities (ie many to many relation with categories) then, for the

purpose of import/export we are obliged to uniquely identify these as well.

So I will have to reluctantly also put uuids on categoryoptions. After

discussing with Abyot last night, I can see that there is some value in

having them the way they are, but we will have to live with the complexity.

What you gain on the swings you lose on the roundabouts.

OK. I still don’t get why we need this flexibility though. When using the

data values you would only query for data element + categories/dimensions

anyway right, and <5 means <5 whether it is part of AGE1, AGE2 or AGE 3. Or?

I guess the problem is that “<5” is just a label. Using OpenMRS-speak you

could say there is no “concept” attached to the label. So in another

category there could be a label “lessThan5”. By allowing options to be

shared between categories, Abyot is hoping you will just use “<5” in all

cases. Of course there is nothing forcing you to do this. Just as there is

nothing stopping you having a category of “<5”, “Oranges” and “Apples”. So

combined with the flexibility to do something possibly useful you also have

the flexibility to do quite silly things.

There is a strong sense in which Age is quite a special and common case

(like Period). For example, if you had one category with {<5, 5-10, >10}

and another with {0-10, >10} then you should really be able to aggregate all

the 0-10’s.

Perhaps the category Age (or any categories implementing the Age concept)

requires some special status where there are formal requirements on the

naming of categoryoptions within it. Don’t know - you are more familiar

with the use cases.
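One way to picture the "formal requirements" idea is to attach an age interval to each option label, so that a finer Age category can be rolled up into a coarser one. This is only a sketch: the interval definitions, the half-open boundary convention and the names `FINE`, `COARSE`, `parent` and `roll_up` are assumptions, and the values are made up.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # [low, high) in years; high may be infinity
INF = float("inf")

# Hypothetical formal definitions behind the labels used in the thread.
FINE: Dict[str, Interval] = {"<5": (0, 5), "5-10": (5, 10), ">10": (10, INF)}
COARSE: Dict[str, Interval] = {"0-10": (0, 10), ">10": (10, INF)}


def parent(interval: Interval, coarse: Dict[str, Interval]) -> Optional[str]:
    """Return the coarse label whose interval fully contains the given one."""
    low, high = interval
    for label, (c_low, c_high) in coarse.items():
        if c_low <= low and high <= c_high:
            return label
    return None


def roll_up(values: Dict[str, int],
            fine: Dict[str, Interval],
            coarse: Dict[str, Interval]) -> Dict[str, int]:
    """Aggregate values keyed by fine labels into totals keyed by coarse labels."""
    totals: Dict[str, int] = {}
    for label, value in values.items():
        target = parent(fine[label], coarse)
        if target is None:
            raise ValueError(f"{label!r} does not fit inside any coarse option")
        totals[target] = totals.get(target, 0) + value
    return totals


totals = roll_up({"<5": 4, "5-10": 6, ">10": 1}, FINE, COARSE)
```

With labels alone ("<5" versus "lessThan5") this roll-up is not possible, which is the point being made about Age needing a concept behind the label.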

Indicators are not multidimensional. Why is this? Was it a

conscious decision resulting from earlier discussion or is it just that we

haven’t got there yet?

Data analysis could benefit from having multidimensional indicators, but

then since this is strictly for output and never input I would suggest using

the post-method of assigning indicator group sets and groups (or whatever

you end up calling it in the UI). What makes indicators interesting and

complex in this context is that the numerator/denominator formulas should be

able to contain slices of the multidimensional data element, e.g. “Malaria”

“all ages”, “male”, and not only the flat data element (data element + 1

categoryoptioncombo, “Malaria”+ “<5”, “male”) like it is today.

This distinction between input and output is strange. Having input-only

dimensions is like a sort of statistical masturbation. Lot of effort

with no end result.

Yet looking at SDMX, it is clear that the protocol is much more suited for

indicators than it is for dataelements. In fact using it to shunt

dataelements around between systems is a bit of a perversion. But my sense

is that WHO would like DHIS in national offices to produce SDMX formatted

indicator reports for them. Is that your sense too? And should we care?

If so there is some expectation that indicators should have dimensions. And

including the slices you refer to above. In fact if we were ever to try and

import the metadata from the famous WHO indicator repository that is exactly

what we will see. Not sure how we might handle it without a md model. I

suppose we will create flat indicators with the dimensions encoded in the

name and then set about grouping the buggers :-(.

I haven’t really looked much at the indicator end of the beast. Been

focussed more on getting datavalues from openmrs.

Regards

Bob

Regards

Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

2009/9/29 Abyot Gizaw abyota@gmail.com

On Tue, Sep 29, 2009 at 9:16 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

I think Abyot raises some good points, especially his last one about

differences in what the age dimension really is.

I think the biggest challenge is going to be how to unite the concepts

of a multidimensional data element (as it is currently implemented

with categories) and a data element that has no multidimensionality,

at least in the sense of it not being assigned any categories.

Isn’t this what we have in the current system? If you are not assigning

any combination of categories for a dataelement (well of course for the sake

of consistency - from programming logic point of view - implicitly a default

category combination with one default category having one default option is

assigned - it is like putting your value at zero on the dimensions axis)

then the dataelement has no dimensionality.

I don’t really like the default category idea. The way I have currently

proposed there is no default category. By default a dataelement has no

dimensions. It doesn’t need a default dimension. And also by default the

dimensionelementcombination in datavalue is NULL.

What about the following scenario? Could the category/category combos

be transformed somehow into a sort of data element generator? Users

could define a dimensionality set, assign a master data element, and

DHIS would create all of the necessary data elements. So a category

combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1

,Under 5 and Over 5) and template data element (Clinical malaria)

would produce :

OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}

OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}

OPD Over 5 Clinical Malaria …

OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}

…

…

…

IP Clinical Malaria Total {IP, All ages, Clinical Malaria}

…

…

…

Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}

Clinical Malaria Total {All patient status, All ages, Clinical

malaria}

Each one of those data elements would then be assigned a set of

dimensions, and a set of dimensional elements.
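The "data element generator" idea can be sketched roughly as follows. The naming rule (specific options, then the template name, then "Total" when any dimension is folded) is inferred from the examples listed above and is an assumption, as are the function and variable names.

```python
from itertools import product
from typing import Dict, List, Tuple


def generate_data_elements(
    template: str,
    categories: Dict[str, Tuple[List[str], str]],
) -> List[Tuple[str, Tuple[str, ...]]]:
    """Expand a template data element into flat data elements.

    categories maps a category name to (its options, the label of its total).
    Returns (generated name, dimensional elements) pairs.
    """
    axes = [options + [total] for options, total in categories.values()]
    totals = {total for _, total in categories.values()}
    elements = []
    for combo in product(*axes):
        specific = [opt for opt in combo if opt not in totals]
        name = " ".join(specific + [template])
        if len(specific) < len(combo):  # at least one dimension is a total
            name += " Total"
        elements.append((name, combo + (template,)))
    return elements


CATEGORIES = {
    "Patient status": (["OPD", "IPD", "Deaths"], "All patient status"),
    "Age": (["Under 1", "Under 5", "Over 5"], "All ages"),
}
elements = generate_data_elements("Clinical Malaria", CATEGORIES)
```

Three options plus a total on each of two categories gives sixteen generated data elements, each carrying its dimensional elements alongside the name rather than only inside it.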

The categories functionality would simply be an artifact to produce

multiple data elements, without having to enter them separately, which

if I understood Ola yesterday, was one of its intended purposes.

Now, for those of us such as myself, who have already created

dozens of data elements with different dimensions in their names (but

nowhere in a relational table) we could assign the dimensionality in

a separate step (post-facto as Bob mentioned earlier). I might want to

assign an “uber” dimension of “Communicable” and “Non-communicable” to

a disease type that might not have anything to do with the definition

of the data element itself, but would be simply for analysis purposes

later. Again, I may be rehashing my previous emails here, but from a

pure SQL standpoint, the approach I suggest here makes sense to me, in

terms of queries of how to pull this into a crosstab as well as how to

generate a fact table that something like an OLAP server could deal

with

This approach might seem to resolve the issue of how to deal with

these two different beasts, but unfolding the multidimensional data

element into simpler components. Meaning that the

category/combos/options would be used as a templating mechanism, but

that dimensionality could be assigned through a separate set of

relations. Perhaps this is what is represented in the diagram, but I

will need to study it tomorrow after some sleep.

I do think that dimensional elements should not be able to be

shared by dimensions, and that dimensions and dimensional elements

should not be able to be deleted without lots of bells and whistles

going off once they have been assigned to data elements.

What is wrong with that as long as values are not associated with them?

I think we will be falling back to the current implementation instead - like

dimensional elements should not be deleted once values are assigned to their

combinations.

I agree. I think we all will agree on this much.

I guess the key question is whether data elements should be able to

have multiple DimensionElementCombinations, which I think is the

current implementation. I am just not sure this will work with a

combination of DHIS2-type-multidimensional elements, and DHIS1.4-type

data elements.

Can anyone explain to me how the DHIS2 multidimensional dataelement

concept fails to handle the DHIS 1.4 dataelements - sorry may be I missed

this from your earlier discussion? I think the way I see it - if the

objective is on OLAP, pivoting/querying, then what we need is not to change

the model - instead to develop more APIs which can pull data along a

dimension, varying degree of overlappings across dimensions - or more

generally aggregation of values over a flexible set of

dimensionelementcombinations !

Again I am with you mostly on this. In fact that has been my suggestion

all along - to push the functionality into the API. But having said that I

think the current model is too double-jointed and complex. I have seen by

trying to unpick the dimensions using xslt I need too many hash tables which

is inefficient. This no doubt would also translate into too many SQL

clauses. By trimming the requirement that dimensionelements are freely

assignable the model becomes a good bit simpler. Beyond that it is mostly

changing names.

Using the example above - {OPD, IPD}, {Male, Female},{Under 1, 1-5,

Above 5} and malaria as base dataelement

What we have currently is an API to provide values for

Malaria(OPD,Male,Under 1)

Malaria(OPD,Male,1-5)

Malaria(OPD,Male,Above 5)

Malaria(OPD,Female,Under 1)

Malaria(OPD,Female,1-5)

Malaria(OPD,Female,Above 5)

…

…

And if I understood correctly … what is required is to have registered

cases of

Malaria in the OPD,

Malaria in the IPD

Malaria for Males

Malaria for Females

…

…

Malaria In the OPD but only those Female

Malaria In the IPD but for male

…

…

…

we can list different combinations…

or finally ask … for the Malaria

Isn’t this a simple question of Aggregation? Does the multidimensional

datamodel have a limitation to handle the above requirements - or am I

talking a different stuff here?
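Seen as plain aggregation, the request amounts to a group-by over whichever dimensions you want to keep. A small sketch, assuming the values are keyed by (patient status, sex, age) tuples; the names `DIMENSIONS`, `VALUES` and `aggregate` are hypothetical and the counts are made up.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

DIMENSIONS = ("Patient status", "Sex", "Age")

# Illustrative malaria counts keyed by one option per dimension.
VALUES: Dict[Tuple[str, str, str], int] = {
    ("OPD", "Male", "Under 1"): 4,
    ("OPD", "Male", "1-5"): 6,
    ("OPD", "Female", "Under 1"): 5,
    ("IPD", "Male", "Under 1"): 1,
    ("IPD", "Female", "1-5"): 2,
}


def aggregate(values: Dict[Tuple[str, ...], int],
              keep: Sequence[str]) -> Dict[Tuple[str, ...], int]:
    """Sum values over every dimension that is not listed in keep."""
    indexes = [DIMENSIONS.index(name) for name in keep]
    totals: Dict[Tuple[str, ...], int] = defaultdict(int)
    for key, value in values.items():
        totals[tuple(key[i] for i in indexes)] += value
    return dict(totals)


by_status = aggregate(VALUES, ["Patient status"])             # Malaria in the OPD, in the IPD
by_status_sex = aggregate(VALUES, ["Patient status", "Sex"])  # e.g. OPD but only Female
overall = aggregate(VALUES, [])                               # "finally ask for the Malaria"
```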

No I believe it can probably be done - but yet it doesn’t seem to have

been done. When I started looking at how I might do it I realized that it

could also be simplified.

Regards

Bob

Enough for today.

Thanks for this Bob. It is a good start. Can’t you make this diagram

in DocBook so I can edit it?

Regards,

Jason

On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw abyodia@gmail.com wrote:

Yes your suggestion is doable and less is better … but I think the requirement from the field is more complex.

If, for a moment, we stop talking about datavalues and talk about dataelements - why are we talking about dimension combinations? Because you are assuming a dataelement to have only one dimension. Am I correct? If that is the case, I see a little bit of inconsistency here. DataElement talks about one dimension, but its corresponding value talks about a combination of dimensions.

Yes from the datavalue I can have dimensionelementcombinations, pick dimensionelements, regroup and put them in their corresponding dimensions - in the end telling me from which dimension they came. But from this point onwards I am no longer talking about a value of a single dataelement but a value for a combination of dataelements (because I have to pull different dataelements which can give me the identified dimensions) … but is this what we want?

The other point I would like to raise is - will there not be any limitation on the flexibility of the system when putting the restriction "A Dimension has many DimensionElements. But a DimensionElement is a member of only one Dimension"? Not only a system flexibility problem, I see a logical problem as well. Because if we think for example beyond the obvious SEX(male,female,unknown) - I see a strong need for letting dimensionelements be members of multiple dimensions. For example take the other obvious dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its dimensionelements. Maybe such scaling of the AGE dimension is appropriate for the Malaria case, but for the TB case people might be interested to break the AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to handle cases like this? Are we going to define a number of <5yrs or are we going to use the same <5yr dimensionelement?

Thank you

Abyot.

On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

OK. Here’s my first attempt to rationalize things. Please excuse the attachments. I try not to send attachments to mailing lists but these are at least fairly small. (And Lars I will write it up in docbook after fishing for feedback).

My primary aim has been to disturb the existing model as little as possible whilst trying to simplify wherever possible.

Attached oldmodel.png shows the participants in the existing model. As you can see there are 11 tables in all. I haven’t shown the relations as it becomes a bit of a web.

Also attached is a proposed amended database model which bears sufficient similarity to the old that migration between the two should be feasible. But it is down to 6 tables. And I have named the tables according to the terms we have been discussing. Of course this is just the database model. I’ve also put together an XML view of what some sample dataset might look like. There is also a UML model required which would be richer than the underlying datamodel, but one step at a time …

Walking through:

DataElements can have Dimensions. And different dataElements can (and hopefully will) share some of the same Dimensions. So there is an m-to-n relationship between the two necessitating an extra table (DataElementDimensions). An example of a Dimension is SEX. Nothing new here.

Dimensions have DimensionElements. So SEX for example might have DimensionElements “Male”, “Female”, “Unknown”. A big difference from the old model is that there is a 1-n relationship between Dimensions and DimensionElements. A Dimension has many DimensionElements. But a DimensionElement is a member of only one Dimension.

DataValues represent the values at the intersection of these Dimensions. Keeping with the spirit of the old model this intersection is represented by a single key, DimensionElementCombination. The DimensionElementCombinations would be populated when a new Dimension is added to a DataElement. Like the original model there is some fragility here. Changing dimensions on dataelements could create a situation where datavalues become orphaned or misdirected. The API must have robust methods for defending this integrity, particularly when updating the structural metadata. But this is perhaps doable. Either way it’s not worse than we have.
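To make the walk-through concrete, here is a rough in-memory sketch of the three entities. It is not the proposed database schema; the class and method names are assumptions chosen to mirror the terms above, and the AGE options are borrowed from elsewhere in the thread.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class DimensionElement:
    code: str
    dimension: str  # code of the single Dimension that owns this element (1-n)


@dataclass
class Dimension:
    code: str
    elements: List[DimensionElement] = field(default_factory=list)

    def add(self, code: str) -> DimensionElement:
        """Create an element that belongs to this Dimension and no other."""
        element = DimensionElement(code, self.code)
        self.elements.append(element)
        return element


@dataclass
class DataElement:
    code: str
    dimensions: List[Dimension] = field(default_factory=list)  # m-to-n with Dimension

    def combinations(self) -> List[Tuple[DimensionElement, ...]]:
        """Every intersection of the data element's dimensions.

        A data element with no dimensions yields one empty combination.
        """
        return list(product(*(d.elements for d in self.dimensions)))


sex = Dimension("SEX")
for code in ("Male", "Female", "Unknown"):
    sex.add(code)
age = Dimension("AGE")
for code in ("Under 1", "1-5", "Over 5"):
    age.add(code)

malaria = DataElement("MALARIA", [sex, age])
combos = malaria.combinations()
# From any element in a combination you can get straight back to its Dimension.
owners = [element.dimension for element in combos[0]]
```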

I haven’t given a name to DimensionElementCombinations. From the examples I have seen from SL this seems to be unnecessary. The names I have seen being used are generally simply contrived from the dimensions or (worse still) from the categoryoptions. What is important is that dataelements can have sets of dimensions.

And then much of what is different is just a renaming of the original entities. From the attached XML file I think you can see some of the issues faced re names and identifiers. I find myself following a sort of convention of CODE, Name, Description and UUID. CODEs must be unique within the scope of the database. I suppose this is close to what we currently call ShortName. I would like to place constraints on CODEs in terms of length and also the disallowing of spaces and other funny characters. The reason being that we may well have to use these codes in making up URIs. So CODEs must be unique. For the moment we could keep name unique but should migrate from it. It’s a matter of rewriting all our comparators I guess. UUIDs I am told are unique through some sort of divinity so we apparently do not need to worry about them.
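The constraints on CODEs could be checked along these lines. The maximum length and the allowed character set below are placeholders, since neither is fixed in this thread, and `invalid_codes` is a hypothetical helper rather than anything that exists in DHIS 2.

```python
import re
from typing import Iterable, List

MAX_CODE_LENGTH = 25  # placeholder; no actual limit has been agreed
CODE_PATTERN = re.compile(r"[A-Za-z0-9_]+")  # placeholder for "no spaces or funny characters"


def invalid_codes(codes: Iterable[str]) -> List[str]:
    """Return a description of every code that breaks a constraint."""
    problems: List[str] = []
    seen = set()
    for code in codes:
        if len(code) > MAX_CODE_LENGTH:
            problems.append(f"{code!r}: longer than {MAX_CODE_LENGTH} characters")
        elif not CODE_PATTERN.fullmatch(code):
            problems.append(f"{code!r}: contains characters unsafe for a URI segment")
        elif code in seen:
            problems.append(f"{code!r}: duplicate")
        seen.add(code)
    return problems


problems = invalid_codes(["MAL_OPD_U1", "MAL OPD", "MAL_OPD_U1"])
```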

I’ve also tried to reduce the number of knees on the donkey - from 11 tables to 6. I believe this can be done whilst preserving the existing functionality. This arrangement would make it much more sensible to produce the XML I need to produce. I’m hoping that it would also be more friendly to those who would be trying to pivot the data across dimensions.

Jason do you think this works for you? I might have missed out something really fundamental. Abyot, you’ve been through this process before, am I missing something? From the DataValue you can see DimensionElements. And once you know a DimensionElement you also know the Dimension to which it belongs. I think that’s queryable. Will have to hydrate with some data and see.

Shaking the multidimensional model up like this would obviously have implications. But I suspect most of it is taking stuff away rather than adding new so it might just be doable. Less is more.

Not spending time with docbook yet, till I get some feedback.

Cheers

Bob

2009/9/29 Bob Jolliffe bobjolliffe@gmail.com

Hi

On the back of Jason and others’ comments, I’ve reached the conclusion that we cannot really live with the MD model the way it is. Whereas I think it is (just about) workable there are some serious optimizations we can and should do. I am going to put my other work back a day or two and propose some changes in a branch.

I think central to the inefficiency is the many-many relation between categories and categoryoptions. This strikes me as illogical as well as being cumbersome in the UI. Do we really want to be able to make categories with options like {‘0<5’,‘6-10’,‘Male’,‘Out of stock’,‘35-40’}? Reducing the relation between categories and category options to 1-n cuts two tables, should make sql queries more efficient and grokkable and also matches other models such as sdmx better.

The other possible inefficiency is the dimensionset. It can be useful in some contexts but I’m guessing that when querying the data (which we want to be fast) it is not relevant. A dataelement can have dimensions. The fact that some dataelements have the same combinations of dimensions is very useful to know for some purposes, but it should be possible to get from the dataelement to the dimension directly.

On the other side of the road is the hierarchical dimensionality idea I see Ola and Jason have been discussing, where dimensions are composed (perhaps post-facto) of uni-dimensional dataelements rather than decomposed into pre-structured dimensional elements. I suspect that:

we need both; and

from the API, user and reporting perspective they should look the same (ie a dataelement can have dimensions - how they come about should not be a concern at the end point).

I’ll try out some of these ideas and point you to the branch.

Regards

Bob

2009/9/29 Lars Helge Øverland larshelge@gmail.com

Thanks for the explanations Jason. The multidimensional model is quite complicated, is poorly documented, and as you say is DHIS-centric in the way that it is built around the DHIS notion of a Data Element.

Could we assemble and put some of the text being written on the list to docbook?

That said, and I think Jason already has made a strong case for this, also in a 100% DHIS2 scenario you will need more flexibility in defining dimensions to your data than what categories can provide. Being able to define data dimensions independent of data collection is powerful and should be supported in a better way than what data element groups provide today.

Given that we already have the orgunit group set code in place I would assume that adding group sets to data elements could be a relatively straight forward thing to do (but then again, I am not the programmer…).

I don’t see any implications in adding this to the system, it won’t require changes to the existing model as the association goes from the groupset to the groups. We can prioritize this for the 2.0.3 release.

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

jason · 1 October 2009 07:28

It is still not clear to me how the
multidimensional data elements are used to calculate indicators in the
same way as PODE (plain old data element). I guess this is handled
somehow by the API?

I have not played around with this, but I suppose it is possible
somehow from within the indicator definition panels.

Well, I have now played around with it and see more or less how it
works. I have promised Lars I would try and put together some more
documentation on the multidimensional data elements functionality,
which I will try and distill together out of all these mails. But
first a few more questions.

Once I define a data element (Clinical Malaria) with Categories (Age
(3 category options) / Gender (2 category options)) I get six data
elements in my entry screen corresponding to the combination of all of
these. When I go to define an indicator (say Malaria incidence for
under 1), I can select these "sub-elements" . So for the numerator I
get something like

Clinical Malaria (Under1, Female,) + Clinical Malaria (Under1, Male,)

and for the denominator, I would choose a semi-permanent data element
like population:

Age Under 1

That is pretty sweet and I can calculate a Clinical Malaria under 1
incidence rate, so defining indicators with multidimensional data
elements seems to work fine (have not tried to calculate anything but
I guess this works as well).
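For what it is worth, the arithmetic behind such an indicator is just a sum of the selected "sub-elements" over the denominator. A sketch, where the figures and the per-1000 factor are assumptions made up for illustration and `indicator_value` is a hypothetical helper, not a DHIS 2 function:

```python
from typing import Iterable, Optional


def indicator_value(numerator_values: Iterable[float],
                    denominator: float,
                    factor: float = 1000) -> Optional[float]:
    """factor * (sum of numerator sub-elements) / denominator, or None if undefined."""
    if denominator == 0:
        return None
    return factor * sum(numerator_values) / denominator


# Clinical Malaria (Under1, Female) + Clinical Malaria (Under1, Male), illustrative counts
cases_under1 = [12, 8]
population_under1 = 4000  # "Age Under 1" population, illustrative

rate = indicator_value(cases_under1, population_under1)
```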

Anyway, my first question is about that last little comma. It would
seem somehow (I have not looked at the code) that there are
three-dimensions that are sort of hard-wired. I have only defined two.
Is that last comma significant, or just a bit of screen lint?

Now, my next question, which is a bit more erudite. Supposing I would
go down the route of defining all of my indicators in a
multidimensional fashion, is there any limit to the level of
dimensionality that I can assign them and where should I start? Let's
think about the malaria data element.

For malaria I might decide to get very complicated and choose many
categories and options (categories are after the number, possible
category options are in parentheses).

1) Type (Disease, Service delivery, equipment)
2) Disease type (Communicable, Non-communicable)
3) Transmission method (Vector Borne, Water borne, Air borne, Sexually
transmitted)
4) Disease (Malaria, Leprosy, Leishmaniasis, etc)
5) Diagnosis status (Clinical, confirmed)
6) Patient status (OPD, IP, Deaths)
7) Age (Under 1, 1-5, Over 5)
8) Gender (Male, Female)

This list is not complete, and would need some more category elements
to be totally complete, but this is enough to get started. So, I can see
that if I define my categories and options like this, I will get a
data element for "OPD Clinical Cases of Malaria Under 1" at some
point.

So, I guess my question is, where do I start to define my data element
dimensionality? With Disease? There are dimensions "above" the disease
however like I indicate here, like the transmission method. What if I
want to be able to know the total number of cases of all vector borne
diseases? Not a totally unusual request. Would I need to start the
definition of my data elements from there? Would this need to be a
"Dataelement group" instead? What about if I need to know the total
number of cases of communicable diseases? Would this not imply I would
need to add this data element to two separate data element groups,
which at least with DHIS 1.4 is a no-no as it results in duplicates in
the PivotTables?
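The duplicate problem can be illustrated with a small check: if groups that answer different questions are lumped into one list, an element lands in more than one group and gets counted twice, whereas one group set per question keeps each element in at most one group. The group memberships below are only examples and `elements_in_multiple_groups` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def elements_in_multiple_groups(group_set: Dict[str, List[str]]) -> List[str]:
    """Data elements that appear in more than one group of the same group set."""
    counts = Counter(e for members in group_set.values() for e in members)
    return sorted(e for e, n in counts.items() if n > 1)


# One flat list of groups: Malaria is counted under both groups.
flat_groups = {
    "Vector borne": ["Malaria", "Leishmaniasis"],
    "Communicable": ["Malaria", "Leprosy", "Leishmaniasis"],
}

# One group set per question: within each set every element appears once.
transmission_method = {"Vector borne": ["Malaria", "Leishmaniasis"], "Other": ["Leprosy"]}
disease_type = {"Communicable": ["Malaria", "Leprosy", "Leishmaniasis"], "Non-communicable": []}

duplicates = elements_in_multiple_groups(flat_groups)
```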

It seems like we have stumbled on a particle accelerator. The deeper
you dig, the more dimensions there are.

Any practical suggestions? I know this is yet another erudite
example, but it highlights that if we are going to have
multidimensional data elements, we need to be able to provide guidance
on how they should be set up.

Best regards,
Jason

Abyot_Gizaw · 1 October 2009 08:06

It is still not clear to me how the

multidimensional data elements are used to calculate indicators in the

same was as PODE (plain ol,d data element). I guess this is handled

somehow by the API?

I have not played around with this, but I suppose it is possible

somehow from wihtin the indicator definition panels.

Well, I have now played around with it and see more or less how it

works. I have promised Lars I would try and put together some more

documentation on the multidimensional data elements funcitonality,

which I will try and distill together out of all these mails. But

first a few more questions.

Once I define an data element (Clinical Malaria) with Cateogries (Age

(3 cateogory options) /Gender (2 cateogory options)) I get six data

elements in my entry screen corresponding to the combination of all of

these. When I go to define an indicator (say Malaria incidence for

under 1), I can select these “sub-elements” . So for the numerator I

get something like

Clinical Malaria (Under1, Female,) + Clinical Malaria (Under1, Male,)

and for the denominator, I would choose a semi-permanent data element

like population…

Age Under 1

That is pretty sweet and I can calculate a Clinical Malaria under 1

incidence rate, so defining inidcators with multidimensional data

elements seems to work fine (have not tried to calculate anything but

I guess this works as well).

Anyway, my first question is about that last little comma. It would

seem somehow (I have not looIked at the code) that there are

three-dimensions that are sort of hard-wired. I have only defined two.

Is that last comma significant, or just a bit of screen lint?

Yes, that is just a bug! The for-loop adds a comma after each “dimensionelement”, assuming there will be another one coming. We will tell the loop not to add a comma if the “dimensionelement” is the last one (or simply truncate at the end).
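The usual way to avoid this kind of trailing separator is to join the names instead of appending a comma inside the loop. A sketch in Python rather than the actual DHIS 2 code, with `combo_label` as a hypothetical function name:

```python
from typing import Sequence


def combo_label(data_element: str, option_names: Sequence[str]) -> str:
    """Build a display name; the separator only goes between options."""
    return f"{data_element} ({', '.join(option_names)})"


label = combo_label("Clinical Malaria", ["Under1", "Female"])
```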

Now, my next question which is a bit more erudite. Supposing I would

go down the route of defining all of my indicators in a

multidimensional fashion, is there any limit to the level of

dimensionality that I can assign them and where should I start? Lets

think about the malaria data element

For malaria I might decide to get very complicated and choose many

categories and options (Cateogories are after the number, possile

cateogry options are in parentheses.

Type (Disease, Service delivery, equipment)

Disease type (Communicable, Non-communicable)

Transmission method (Vector Borne, Water borne, Air borne, Sexually

transmitted)

Disease (Malaria, Leprosy, Leishmaniasis, etc)

Diagnosis status (Clinical, confirmed)

Patient status (OPD, IP, Deaths)

Age (Under 1, 1-5, Over 5)

Gender (Male, Female)

This list is not complete, and would need some more category elements

to be totally complete, but this enough to get started. So, I can see

that if I define my categories and options like this, I will get a

data element for “OPD Clinical Cases of Malaria Under 1” at some

point.

So, I guess my question is, where do i start to define my data element

dimensionality? With Disease? There are dimensions “above” the disease

however like I indicate here, like the transmission method. What if I

want to be able to know the total number of cases of all vector borne

diseases? Not a totally unusual request. Would I need to start the

definition of my data elements from there? Would this need to be a

“Dataelement group” instead? What about if I need to know the total

number of cases of communicable diseases? Would this not imply I would

need to add this data element to two seperate data element groups,

which at least with DHIS 1.4 is a no-no as it results in duplicates in

the PivotTables?

It seems like we have stumbled on a partical accelerator. The deeper

you dig, the more dimensions there are.

Emm… I don’t know. But I think there is a sort of bias here. Like starting from flat DHIS 1.4 dataelements and trying to generate DHIS 2 dataelements by breaking them into pieces. If I am not mistaken “OPD Clinical Cases of Malaria Under 1” is a common dataelement in 1.4 so you can start to break this into pieces and get

“Under 1”
“Malaria”
“Clinical Cases”
“OPD”
…
…

but in the end you get confused about which one is the dataelement and which one is the dimension. Well, the MD model can handle such a breakup I guess, but that is not the point.

The point is, I guess, that users should first define what they need from that functionality - what kind of data are they going to collect? What does their dataentry screen look like?

The multidimensionality model came into existence because of tabular dataentry screens. As Ola suggested last time, there might be a limitation with this (multidimensionality - input screen) … specifically when trying to do some kind of analysis (like the pivoting thing mentioned). But how different is the analysis going to be from our input formats? The way I see it, if there is a need for further breakup during analysis then we have made a mistake in defining our pieces during data collection. In most cases our analysis is going to be a combination and rearrangement of different pieces collected by using our input screens.

Anyways for me a dimension is just an attribute to a dataelement. So before talking about a dimension first we need to have a dataelement and (logically) we can’t mix the two!

Thank you
Abyot.

On Thu, Oct 1, 2009 at 9:28 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

Any practical suggestions. I know this is yet another erutdite

example, but it highlights that if we are going to have

multidimensional data elements, we need to be able to provide guidance

on how they should be set up.

Best regards,

Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

olatitle · 1 October 2009 08:46

It is still not clear to me how the

multidimensional data elements are used to calculate indicators in the

same way as a PODE (plain old data element). I guess this is handled

somehow by the API?

I have not played around with this, but I suppose it is possible

somehow from within the indicator definition panels.

Well, I have now played around with it and see more or less how it

works. I have promised Lars I would try and put together some more

documentation on the multidimensional data elements functionality,

which I will try and distill together out of all these mails. But

first a few more questions.

Once I define a data element (Clinical Malaria) with categories (Age

(3 category options) / Gender (2 category options)) I get six data

elements in my entry screen corresponding to the combination of all of

these. When I go to define an indicator (say Malaria incidence for

under 1), I can select these “sub-elements” . So for the numerator I

get something like

Clinical Malaria (Under1, Female,) + Clinical Malaria (Under1, Male,)

and for the denominator, I would choose a semi-permanent data element

like population…

Age Under 1

That is pretty sweet and I can calculate a Clinical Malaria under 1

incidence rate, so defining indicators with multidimensional data

elements seems to work fine (have not tried to calculate anything but

I guess this works as well).
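
As a rough illustration of the example above (plain Python, not DHIS 2 code), the six entry cells are the cross product of the two categories, and the indicator numerator is the sum of the two Under 1 cells. The reported counts, the population figure and the per-1,000 scaling are made-up assumptions for the sketch.

```python
from itertools import product

# Categories and options from the example above.
age_options = ["Under1", "1-5", "Over5"]
gender_options = ["Female", "Male"]

# Every combination of options becomes one entry cell: 3 x 2 = 6.
combos = list(product(age_options, gender_options))

# Hypothetical reported values for "Clinical Malaria", one per combination.
clinical_malaria = {combo: 0 for combo in combos}
clinical_malaria[("Under1", "Female")] = 12
clinical_malaria[("Under1", "Male")] = 8
clinical_malaria[("1-5", "Female")] = 30

# Hypothetical semi-permanent population figure for the same age group.
population_under1 = 4000

# Numerator: Clinical Malaria (Under1, Female) + Clinical Malaria (Under1, Male)
numerator = (
    clinical_malaria[("Under1", "Female")]
    + clinical_malaria[("Under1", "Male")]
)

# Incidence per 1,000 population (the scaling factor is an assumption).
incidence_per_1000 = numerator * 1000 / population_under1
```

The data element total is simply the sum over all six cells, which is why the total is always easier to get than any slice of it.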

Anyway, my first question is about that last little comma. It would

seem somehow (I have not looked at the code) that there are

three dimensions that are sort of hard-wired. I have only defined two.

Is that last comma significant, or just a bit of screen lint?

Yes, that is just a bug! The for-loop adds a comma after each “dimensionelement”, assuming there will be another one coming. We will tell the loop not to add a comma if the “dimensionelement” is the last one (or simply truncate at the end).
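
The actual DHIS 2 loop is not shown in this thread, so the following is only a Python sketch of the general fix: let a join put the separator between items instead of appending it after each one. The function name `format_combo` is hypothetical.

```python
def format_combo(data_element, option_names):
    """Build a display label such as 'Clinical Malaria (Under1, Female)'."""
    # str.join inserts the separator only between items,
    # so there is never a trailing comma to strip afterwards.
    return f"{data_element} ({', '.join(option_names)})"


label = format_combo("Clinical Malaria", ["Under1", "Female"])
```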

Now, my next question, which is a bit more erudite. Supposing I would

go down the route of defining all of my indicators in a

multidimensional fashion, is there any limit to the level of

dimensionality that I can assign them and where should I start? Let's

think about the malaria data element.

For malaria I might decide to get very complicated and choose many

categories and options (categories are listed first, possible

category options are in parentheses).

Type (Disease, Service delivery, equipment)

Disease type (Communicable, Non-communicable)

Transmission method (Vector Borne, Water borne, Air borne, Sexually transmitted)

Disease (Malaria, Leprosy, Leishmaniasis, etc)

Diagnosis status (Clinical, confirmed)

Patient status (OPD, IP, Deaths)

Age (Under 1, 1-5, Over 5)

Gender (Male, Female)

This list is not complete, and would need some more category elements

to be totally complete, but this is enough to get started. So, I can see

that if I define my categories and options like this, I will get a

data element for “OPD Clinical Cases of Malaria Under 1” at some

point.
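
To get a feel for how quickly this grows, a small Python sketch can count the combinations implied by the list above. Only the three diseases actually named are included (the "etc" is left out), so the result is a lower bound, and nothing here reflects how DHIS 2 stores categories.

```python
from math import prod

# Categories and options copied from the list above.
# "Disease" holds only the three named diseases, so the count is a lower bound.
categories = {
    "Type": ["Disease", "Service delivery", "Equipment"],
    "Disease type": ["Communicable", "Non-communicable"],
    "Transmission method": ["Vector borne", "Water borne", "Air borne",
                            "Sexually transmitted"],
    "Disease": ["Malaria", "Leprosy", "Leishmaniasis"],
    "Diagnosis status": ["Clinical", "Confirmed"],
    "Patient status": ["OPD", "IP", "Deaths"],
    "Age": ["Under 1", "1-5", "Over 5"],
    "Gender": ["Male", "Female"],
}

# A full cross product needs one cell per combination of options.
total_combinations = prod(len(options) for options in categories.values())
```

With these options the count is 2592, and many of those combinations (for example Equipment combined with Malaria) would presumably never hold data.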

Thanks for coming up with this list, Jason. It helps the discussion to have some more real examples.

So, I guess my question is, where do I start to define my data element

dimensionality? With Disease? There are dimensions “above” the disease

however like I indicate here, like the transmission method. What if I

want to be able to know the total number of cases of all vector borne

diseases? Not a totally unusual request. Would I need to start the

definition of my data elements from there? Would this need to be a

“Dataelement group” instead? What about if I need to know the total

number of cases of communicable diseases? Would this not imply I would

need to add this data element to two separate data element groups,

which at least with DHIS 1.4 is a no-no as it results in duplicates in

the PivotTables?

I think we can only go as far as providing guidelines or examples of best practice here; it will eventually be up to the user to define these. I have seen many different approaches and I am not sure there is ONE correct one.

To your question of what should be the data element, I would start by thinking about what is the most important piece of information you need. The data element will be the total of all dimensions, right, or at least of all the categoryoptions you define, so it will always be easier to get that total than any slice of it. E.g. that total could well be “Malaria cases in OPD”.

To get back to my previous statement of separating input and output, there are some of these dimensions that I would use data element groups for as they would mean e.g. grouping together a large number of diseases or types of data (equipment).

Using data element groups to add dimensionality assumes that we will have a data element group set feature in place soon.
All of these I would definitely use groups for, as they are too broad to capture in one data element:

Type (Disease, Service delivery, equipment)
Disease type (Communicable, Non-communicable)
Transmission method (Vector Borne, Water borne, Air borne, Sexually transmitted)

Diagnosis status, Age, and Gender I would define as categories. They do not have a large number of options (like all possible diseases) and, more importantly, they would all be present in the same data entry form, or captured by the same facility.

Then we are only left with Disease and Patient status. Patient status is tricky because its CategoryOptions (OPD, IP, and Death) span multiple data entry forms, and therefore different orgunits would use them: OPD would only apply to orgunits with outpatient clinics and IP (inpatient) only to orgunits with beds. Rarely would one orgunit do both, and therefore they would possibly not be on the same form at all. For this reason I would not create a category Patient Status. Disease would be my obvious choice for data element, because it is almost always at the center of data analysis. It is the dimension you most often look at, so a total for each disease makes more sense than any other total. You can also easily group diseases by 1) Type (Disease, Service delivery, equipment), 2) Disease type (Communicable, Non-communicable), 3) Transmission method (Vector Borne, Water borne, Air borne, Sexually transmitted), so a data element and data element group sets like the above would make sense.

Since I would not use Patient status as a category, I could use a data element group set to define this dimension. The problem with that is that there is no way I can find out which of the data values for the data element “Malaria” actually belong to OPD and which ones come from IP or Death. You cannot use data element groups to break up a data element into smaller pieces. Since I cannot use groups and would not use categories, I would simply include patient status in the data element name itself, ending up with data elements like “Malaria case in OPD”, “Malaria case in IP”, “Malaria Death”.
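
The split described above can be sketched in Python, under the assumption that a data element group set behaves as one dimension in which each data element belongs to at most one group. The values, the Cholera element and both helper functions are hypothetical; this is not the DHIS 2 API.

```python
from collections import Counter

# Hypothetical totals per data element (already summed over the
# diagnosis/age/gender categories). Cholera is added only so that
# a second transmission group has a member.
values = {
    "Malaria case in OPD": 120,
    "Malaria case in IP": 15,
    "Malaria Death": 2,
    "Cholera case in OPD": 7,
}

# Assumed shape: group set -> group -> member data elements.
group_sets = {
    "Transmission method": {
        "Vector borne": ["Malaria case in OPD", "Malaria case in IP",
                         "Malaria Death"],
        "Water borne": ["Cholera case in OPD"],
    },
    "Disease type": {
        "Communicable": list(values),
    },
}


def totals_by_group(group_set):
    """Sum data element values per group within one group set."""
    return {group: sum(values[de] for de in members)
            for group, members in group_set.items()}


def has_overlap(group_set):
    """True if any data element appears in more than one group of the set."""
    counts = Counter(de for members in group_set.values() for de in members)
    return any(n > 1 for n in counts.values())


transmission_totals = totals_by_group(group_sets["Transmission method"])
```

In this sketch a data element can sit in both "Vector borne" and "Communicable" without being counted twice, because the two groups belong to different group sets; an overlap only matters inside a single set.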

It seems like we have stumbled on a particle accelerator. The deeper

you dig, the more dimensions there are.

Emm… I don’t know. But I think there is a sort of bias here, like starting from flat DHIS 1.4 dataelements and trying to generate DHIS 2 dataelements by breaking them into pieces. If I am not mistaken, “OPD Clinical Cases of Malaria Under 1” is a common dataelement in 1.4, so you can start to break this into pieces and get

“Under 1”
“Malaria”
“Clinical Cases”
“OPD”
…
…

but in the end you get confused about which one is the dataelement and which one is the dimension. Well, the MD model can handle such a breakup, I guess, but that is not the point.

The point is that users should, I guess, first define what they need from that functionality: what kind of data are they going to collect? What does their dataentry screen look like?

The multidimensionality model came into existence because of tabular dataentry screens. As Ola suggested last time, there might be a limitation with this (multidimensionality tied to the input screen), specifically when trying to do some kind of analysis (like the pivoting thing mentioned). But how different is the analysis going to be from our input formats? The way I see it, if there is a need for further breakup during analysis, then we have made a mistake in defining our pieces during data collection. In most cases our analysis is going to be a combination and rearrangement of different pieces collected by using our input screens.

I don’t agree with this, and I think the example I just made above strengthens that. There are dimensions that are needed in data entry to be able to break up a data element (age, gender, etc.), and there are other dimensions that are broader groupings of data like type of diseases that you do not need to know about in order to register data about diseases.

In general I think we should keep the design of the data element as the atomic unit in DHIS, and datasets, groups and indicators as compositions of that unit. That has always been one of the key success factors of DHIS because it provides flexibility to change the compositions over time.

While the category model allows for further breakups of a data element, we should still think of the data element as a small atomic unit and not use this model to create giant data elements like e.g. “Cases in OPD”, “Communicable diseases”, “Equipment”. These are too broad to be data elements and should be data element groups instead.

Ola

···

2009/10/1 Abyot Gizaw abyota@gmail.com

On Thu, Oct 1, 2009 at 9:28 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

Anyway, for me a dimension is just an attribute of a dataelement. So before talking about a dimension, we first need to have a dataelement, and (logically) we can’t mix the two!

Thank you
Abyot.

Any practical suggestions? I know this is yet another erudite

example, but it highlights that if we are going to have

multidimensional data elements, we need to be able to provide guidance

on how they should be set up.

Best regards,

Jason
