On categories and dimensions and zooks

jason · 16 September 2009 17:11

Here I go again, this time on categories/category options/category combinations.

The concept I want to address is that of dimensional hierarchy within a set of data elements. Let me try my ASCII art skills here…

                                                                                      Total foos distributed
                                                      __________________________|____________________________________________

                                                      >                                                                                                                           >               
                      Total foos distributed to wogs                                                                                             Total foos distributed to zooks

                                >                        >                                                                                                      >                                  >

Total foos distributed to blue wogs Total foos distributed to green wogs Total foos distributed to blue wogs Total foos distributed to green zooks

In this example, the data element “Total foos distributed” is the sum of its component parts. Perhaps it is also collected at various locations which have certain types themselves, which will serve as a dimension later on in this example. Perhaps we also collect information on the number of “bars” distributed as well, which has a corresponding disaggregation hierarchy as presented above.

Let’s assign the first level a dimension of “Parent data element”, the second level “Type” and the third level “Color”.
I am being purposeful abstract here, but in practice these dimensions might correspond to disaggregation dimensions “Age”, “Gender”, ARV Treatment regimen", etc in a real system.

My limited understanding of the Data element category is as follows. In my example here, I would have three categories (Item, Creature, and Color). The first category (Item) might contain the following category options (Foos, Bars). The second category (Creature) would contain the following options (Wogs, zooks), while the third level category (Color) could contain the options (Blue, Green).

Where am I going with this you ask? Well, I would like to be able to pivot/crosstab my data on these possible dimensions, to see how many foos have been distributed to green creatures in rural locations. I might want to know how many Foos and Bars have been distributed to only wogs at private facilities (another dimension). A PivotTable, along with the appropriate data source, would allow me to do this.

Now, when I look at the current implementation of the Data element categories, it is not clear to me how I might be able to accomplish this. I can create these dimensions (categories in DHIS-speak), and assign elements (category options in DHIS-speak) to them. I can then assign a number of dimensions (categories) to a category option. I can then assign a data element to a category option, which translate essentially to assigning a dimensionality to the data element.

However, there does seem to be the ability to assign dimensions, there does not seem to be the ability to assign particular elements within those dimensions to a particular DHIS data element.

Am I missing something here?

Let me ramble on. There is the ability, through organizational unit group sets to assign dimensionality to a certain organizational unit. (Ownership type, urban/rural, facility type). It would seem to me that the organizational unit is simply another dimension, albeit a fundamental one within DHIS and most Health information systems. Nonetheless, it is a dimension, that amends itself to production of data sets that can be crosstabbed on various dimensions (different organizational unit groups).

However, it would seem that the data element itself, as currently defined in both DHIS 1.4 and 2, lack this intrinsic notion of dimensionality.

It would seem that the final data set that should be present in a given crosstab table would be the Cartesian product of each of its possible the dimensions trimmed for values that result in NULLs (e.g Location, Ownership Type, Urban/Rural/Facility Type, Item, Creature, Color in my given example). This does not seem to be the case to me at the moment, but perhaps I am missing something?

Thoughts?

Regards,
Jason

Johan_Ivar_Saebo · 16 September 2009 20:02

I think you made one category too many..
please read on below

Here I go again, this time on categories/category options/category
combinations.

The concept I want to address is that of dimensional hierarchy within a
set
of data elements. Let me try my ASCII art skills here..

Total foos distributed

__________________________|____________________________________________
                                                          >

>
                          Total foos distributed to
wogs
Total foos distributed to zooks
                                    > >

                        > >

Total foos distributed to blue wogs Total foos distributed to green
wogs Total foos distributed to blue wogs
  Total foos distributed to green zooks

In this example, the data element "Total foos distributed" is the sum of
its
component parts. Perhaps it is also collected at various locations which
have certain types themselves, which will serve as a dimension later on in
this example. Perhaps we also collect information on the number of "bars"
distributed as well, which has a corresponding disaggregation hierarchy
as
presented above.

Let's assign the first level a dimension of "Parent data element", the
second level "Type" and the third level "Color".
I am being purposeful abstract here, but in practice these dimensions
might
correspond to disaggregation dimensions "Age", "Gender", ARV Treatment
regimen", etc in a real system.

My limited understanding of the Data element category is as follows. In my
example here, I would have three categories (Item, Creature, and Color).
The first category (Item) might contain the following category options
(Foos, Bars). The second category (Creature) would contain the following
options (Wogs, zooks), while the third level category (Color) could
contain
the options (Blue, Green).

No need for category "item"
Your dataelements would be 1) foos 2) bars
Both with category combo "creature-colour", consisting of the two
categories "creature" and "colour". The two category options wogs and
zooks would go into creature, and blue and green into colour. No need for
foos nor bars to be category options.

Where am I going with this you ask? Well, I would like to be able to
pivot/crosstab my data on these possible dimensions, to see how many foos
have been distributed to green creatures in rural locations. I might want
to
know how many Foos and Bars have been distributed to only wogs at private
facilities (another dimension). A PivotTable, along with the appropriate
data source, would allow me to do this.

Some countries have rural/urban or perhaps private/public as part of the
data form. However, most don�t have that, and they solve it with having
these dimensions as (compulsory, exclusive) org unit groups. In
tajikistan, urban/rural was part of the form, and then we made that
another category, with the two category options urban and rural. Then you
will get 2 data element (foos, bars), with 6 dimensions each(3*(category
with 2 options each)). This should clarify most of the things below... but
keep reading!

Now, when I look at the current implementation of the Data element
categories, it is not clear to me how I might be able to accomplish this.
I
can create these dimensions (categories in DHIS-speak), and assign
elements
(category options in DHIS-speak) to them. I can then assign a number of
dimensions (categories) to a category option. I can then assign a data
element to a category option, which translate essentially to assigning a
dimensionality to the data element.

However, there does seem to be the ability to assign dimensions, there
does
not seem to be the ability to assign particular elements within those
dimensions to a particular DHIS data element.

Am I missing something here?

Let me ramble on. There is the ability, through organizational unit group
sets to assign dimensionality to a certain organizational unit. (Ownership
type, urban/rural, facility type). It would seem to me that the
organizational unit is simply another dimension, albeit a fundamental one
within DHIS and most Health information systems. Nonetheless, it is a
dimension, that amends itself to production of data sets that can be
crosstabbed on various dimensions (different organizational unit groups).

However, it would seem that the data element itself, as currently defined
in
both DHIS 1.4 and 2, lack this intrinsic notion of dimensionality.

It would seem that the final data set that should be present in a given
crosstab table would be the Cartesian product of each of its possible the
dimensions trimmed for values that result in NULLs (e.g Location,
Ownership
Type, Urban/Rural/Facility Type, Item, Creature, Color in my given
example).
This does not seem to be the case to me at the moment, but perhaps I am
missing something?

Thoughts?

Regarding pivot tables then, you can make your pivotviews either based on
the aggregate data element values (add up all the category option values),
or add the category combo, category, and category options as pivot fields.
Then you can easily disaggregate as you please. If you then also have the
orgunit groups as a pivot field, you should be able to do see whatever
level of aggregation you want.

Hope this helps
Johan

···

Regards,
Jason
_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Johan_Ivar_Saebo · 16 September 2009 20:13

However, there does seem to be the ability to assign dimensions, there
does
not seem to be the ability to assign particular elements within those
dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options) to
a data element. Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be hell;
in Tajikistan there were way over 20 categories I think, at least 10 just
on various age groups.

Johan

jason · 16 September 2009 20:28

However, there does seem to be the ability to assign dimensions, there

does

not seem to be the ability to assign particular elements within those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which you

assign to data elements. However, it is not possible to assign just

specific parts of a category combo (only some of the category options) to

a data element.

Yes, this was exactly what I wanted. Assigning different categories would seem to break the dimensionality.

Then you must make a specific category (as the only one in

or part of a new category combo) with just those options. It can be hell;

in Tajikistan there were way over 20 categories I think, at least 10 just

on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly the intended functionality.

Basically, I think I need something akin to the exclusive/compulsory groups that are in place for organizational units, but instead, for arbitrary dimensions. I will give a try and see what happens.

Thanks,
Jason

···

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

jason · 24 September 2009 15:39

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to
each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient
type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could
create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

So, I thought about the following. I have created a separate table in
my DHIS database.

I used the DHIS 2 interface to create two categories (Age and Patient
Status) and populated these with various category options.

Now, I find the terminology very confusing. So, if you see me
interchanging terms, it is because I find the DHIS2 way of referring
to things a bit confusing.

Dimension ≡ Category
Dimensional element ≡ Category option ≡ Category combo ( I think)

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(
  test_dataelementid integer NOT NULL,
  test_dataelementcategoryid integer NOT NULL,
  test_dataelementcategorycomboid integer NOT NULL,
  CONSTRAINT pk_testdataelementcategory PRIMARY KEY
(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a
data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign
dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252
348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values correspond to?

"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Age";"Under 1"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age";"Age 1-5"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over 5"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years";"Patient status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
status";"OPD"

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,
dataelementcategoryoption.name AS dimension_element
   FROM dataelement
   JOIN test_dataelementcategorycombo ON
test_dataelementcategorycombo.test_dataelementid =
dataelement.dataelementid
   JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
   JOIN dataelementcategoryoption ON
test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers
that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to
transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM
vw_dataelements_dimensions ORDER BY 1,2,3',
'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1 ASC'
)
as
(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age 1-5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over 5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under 1";"OPD"

OK, admittedly, I cheated a bit and used the crosstab function of
Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use
with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns
like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you
have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need
here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user
interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to
this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding
multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory
functionality of orgunits). Any thoughts on this?

Best regards,
Jason

···

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering <jason.p.pickering@gmail.com> wrote:

On Wed, Sep 16, 2009 at 10:13 PM, <johansa@ifi.uio.no> wrote:

>> However, there does seem to be the ability to assign dimensions, there
>> does
>> not seem to be the ability to assign particular elements within those
>> dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options) to
a data element.

Yes, this was exactly what I wanted. Assigning different categories would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be hell;
in Tajikistan there were way over 20 categories I think, at least 10 just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory groups
that are in place for organizational units, but instead, for arbitrary
dimensions. I will give a try and see what happens.

Thanks,
Jason

Johan_Ivar_Saebo · 24 September 2009 15:55

Jason,
I will leave to others to comment the code, but I have a few comments...

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to
each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient
type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could
create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the
categories through into a "Category" pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help
further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over
5) = Category option

So by assigning a DE the category combo of "gender_age", you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

···

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(
  test_dataelementid integer NOT NULL,
  test_dataelementcategoryid integer NOT NULL,
  test_dataelementcategorycomboid integer NOT NULL,
  CONSTRAINT pk_testdataelementcategory PRIMARY KEY
(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a
data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign
dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252
348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values correspond
to?

"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Age";"Under 1"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years";"Age";"Age 1-5"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over 5"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years";"Patient status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
status";"OPD"

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,
dataelementcategoryoption.name AS dimension_element
   FROM dataelement
   JOIN test_dataelementcategorycombo ON
test_dataelementcategorycombo.test_dataelementid =
dataelement.dataelementid
   JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
   JOIN dataelementcategoryoption ON
test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers
that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to
transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
  (
'SELECT name, dimension, dimension_element FROM
vw_dataelements_dimensions ORDER BY 1,2,3',
'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1 ASC'
  )
as
(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age
1-5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over 5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under 1";"OPD"

OK, admittedly, I cheated a bit and used the crosstab function of
Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use
with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns
like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you
have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need
here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user
interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to
this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding
multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory
functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering > <jason.p.pickering@gmail.com> wrote:

On Wed, Sep 16, 2009 at 10:13 PM, <johansa@ifi.uio.no> wrote:

>> However, there does seem to be the ability to assign dimensions,
there
>> does
>> not seem to be the ability to assign particular elements within
those
>> dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which
you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to
a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be
hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly
the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary
dimensions. I will give a try and see what happens.

Thanks,
Jason

jason · 24 September 2009 15:59

Hi there. It would be very good to get those views, as they may
satisfy what I need.

Can you help out?

Best regards,
Jason

···

On Thu, Sep 24, 2009 at 5:55 PM, <johansa@ifi.uio.no> wrote:

Jason,
I will leave to others to comment the code, but I have a few comments...

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to
each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient
type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could
create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the
categories through into a "Category" pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help
further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over
5) = Category option

So by assigning a DE the category combo of "gender_age", you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(
test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY
(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a
data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign
dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252
348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values correspond
to?

"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Age";"Under 1"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years";"Age";"Age 1-5"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over 5"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years";"Patient status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
status";"OPD"

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,
dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON
test_dataelementcategorycombo.test_dataelementid =
dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON
test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers
that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to
transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM
vw_dataelements_dimensions ORDER BY 1,2,3',
'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1 ASC'
)
as
(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age
1-5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over 5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under 1";"OPD"

OK, admittedly, I cheated a bit and used the crosstab function of
Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use
with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns
like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you
have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need
here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user
interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to
this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding
multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory
functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering >> <jason.p.pickering@gmail.com> wrote:

On Wed, Sep 16, 2009 at 10:13 PM, <johansa@ifi.uio.no> wrote:

>> However, there does seem to be the ability to assign dimensions,
there
>> does
>> not seem to be the ability to assign particular elements within
those
>> dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which
you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to
a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be
hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly
the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary
dimensions. I will give a try and see what happens.

Thanks,
Jason

bobj · 25 September 2009 08:44

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been grappling with a similar issue which involves unravelling categories, category options and combos into something more familiar. I have reached similar conclusions regarding nomenclature:

category = dimension
categorycombo - I have been calling a dimension set (it bears a strong, and useful, resemblance to xslt:attribute-set)

category option - I like your suggestion of DimensionalElement. I am going to start calling it that too.

In my case I need to export (and import data) into a standard format called sdmx. So whereas in the DHIS2 native DXF we export datavalues with effectively three dimensions (source, period, categorycombooption) the last dimension is a sort of uber-dimension. Like a peppercorn or a cardamon seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

…

My approach to unpicking the dimensions from the dxf file is to transform it with an xslt transformation which is still incomplete but seems to work well.

One other nomenclatures issue which has surfaced as a result is what we call a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds roughly to all the dataelements which might occur on a datacollection form. If we view all dataelements as having just the three “dimensions” then all is well, but if we explode the actual dimensions then we have an issue. In the sdmx model a dataset consists only of dataelements with the same dimensionset. After discussing this with Ola we have reached the conclusion that we need another level of grouping, primarily for the UI - eg FormSet or ReportSet which allows us to group related datasets. But that is an aside from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try to capture some of this and place it down in the datamodel API. It occurs to me that for a multidimensional dataelement we might benefit from some utility methods to retrieve slices and dices which might assist in constructing the pivot tables around dimensions. Does this sound like the right thing to do.

Regards
Bob

···

2009/9/24 johansa@ifi.uio.no

Jason,

I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some

more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume

this is fairly common throughout other HMIS systems. Here, malaria

attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and

over 5). But say you want to be able to pivot to look at outpatient,

inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to

be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use

to get what you want. Simply tick the categories (see below) you want to

see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category

Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some clarification:

Crosstab Dimension (age AND gender) = Category combo

Dimension (age, gender) = Category

Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9

dimensional elements, 3 category options (in category age) by 3 category

options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo

(

test_dataelementid integer NOT NULL,

test_dataelementcategoryid integer NOT NULL,

test_dataelementcategorycomboid integer NOT NULL,

CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,

test_dataelementcategorycomboid)

)

WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a

data element category may be redundant, but anyway, lets leave it in

for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks

like this.

309;25250;25251

309;25257;25255

348;25250;25252

348;25257;25255

455;25250;25253

455;25257;25255

but of course this is meaningless to you. What do these values correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under 1”

"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”

“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over 5”

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”

"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Patient status”;“OPD”

“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS

SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element

FROM dataelement

JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid

JOIN dataelementcategory ON dataelementcategory.categoryid =

test_dataelementcategorycombo.test_dataelementcategoryid

JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =

dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I

created above. This table just assigns particular data elements to

different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the

aggregatedatavalue table? If we try and join this table directly, we

will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
  (
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,

‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1 ASC’
  )
as

(

name character varying(230),

age character varying(160),

status character varying(160)

);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;"Age

1-5";“OPD”

“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Over 5”;“OPD”

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under 1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few

more lines of code in standard SQL or some procedural language like

Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on

the appropriate dataelementid and pulling everything into a pivot

table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of

course the value of the data element. I hope that part is pretty

clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a

PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful

than the current data element categories functionality. Basically, all

that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and

category options a data element is a member of. The rest could ,in the

first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but

eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the

issues that I am having by trying to assign some sort of dimensional

hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,

Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering > > > jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,

there

does

not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you

assign to data elements. However, it is not possible to assign just

specific parts of a category combo (only some of the category options)

to

a data element.

Yes, this was exactly what I wanted. Assigning different categories

would

seem to break the dimensionality.

Then you must make a specific category (as the only one in

or part of a new category combo) with just those options. It can be

hell;

in Tajikistan there were way over 20 categories I think, at least 10

just

on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the

intended functionality.

Basically, I think I need something akin to the exclusive/compulsory

groups

that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,

Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

jason · 25 September 2009 09:44

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective. As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

···

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON
test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering >> > jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Abyot_Gizaw · 25 September 2009 09:55

The one-to-one relationship mentioned between dataelement and categorycombo is not correct !

The realtionship is one-to-many. A categorycombo can be assigned for many dataelements. But a dataelement can have only one categorycombo.

Thank you
Abyot.

···

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective. As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering > > > >> > jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Knut_Staring · 25 September 2009 10:07

http://208.76.222.114/confluence/display/RandD/General+multi-dimensional+model

···

On Fri, Sep 25, 2009 at 11:55 AM, Abyot Gizaw abyota@gmail.com wrote:

The one-to-one relationship mentioned between dataelement and categorycombo is not correct !

The realtionship is one-to-many. A categorycombo can be assigned for many dataelements. But a dataelement can have only one categorycombo.

Thank you
Abyot.

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective. As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

–
Cheers,
Knut Staring

Knut_Staring · 25 September 2009 10:09

I am in favor of making the terminology adhere more to the general terms in use in the Data warehouse/OLAP/SDMX fields (though it must also be understandable to the users, of course - and documented!)

Knut

···

On Fri, Sep 25, 2009 at 12:07 PM, Knut Staring knutst@gmail.com wrote:

http://208.76.222.114/confluence/display/RandD/General+multi-dimensional+model

On Fri, Sep 25, 2009 at 11:55 AM, Abyot Gizaw abyota@gmail.com wrote:

The one-to-one relationship mentioned between dataelement and categorycombo is not correct !

The realtionship is one-to-many. A categorycombo can be assigned for many dataelements. But a dataelement can have only one categorycombo.

Thank you
Abyot.

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective. As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

–
Cheers,
Knut Staring

–
Cheers,
Knut Staring

bobj · 25 September 2009 10:57

Hi Jason

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

I think you mean datelements can be assigned a single categorycombo? I have found the multi-dimensional model confusing but I am not convinced that it is broken. All of the dimensionality is indeed captured but we seem to be missing the convenience methods in the API and the UI to make use of it.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As I see it, from the reporting/analysis perspective the categorycombo is of little use. It is simply a convenience to indicate the dimensions which exist for the dataelement. Which is why I have been referring to it a dimensionset. The relation between a dimension and its dimensional element is captured through the categoryComboCategoryAssociations.

I think you are right and the whole categorycombo could be done away with by simpling having a one to many relationship between dataelements and dimensions (categories). Though the categoryCombo does seem useful when creating datelements in the user interface. There may be many dataelements which share the same dimensionsets. For example Age/Disease/Patient Status in your example above. So this set is given a name which allows it to be easily reused across dataelements. The downside being that it creates an extra layer of indirection to get at particular dimensional elements. This results in the SQL efficiency issue you refer to below.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective.

Agreed. As I said above, I don’t think the current model is broken, but it has an extra indirection level.

As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

It should certainly be possible to implement this UI interface with the current datamodel. step (1) would involve selecting a dimension from the dimensionset associated with the dataelement. I think this is currently lacking.

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

This model is also supported in SDMX ie. composing hierarchies of dataelements. I suppose it comes down to naming. Our model allows a single dataelement which contains multiple dimensions within it. The hierarchical model has multiple dataelements which can be grouped hierarchically for the purpose of aggregation. In the end the effect is the same. Possibly this is similar to what we do with dataelement groups.

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

Not really. The ReportSet would have no real significance from your perspective - rather it would be a way to define the set of (ragged) datasets which form a particular input form.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

Having the existing two fixed or mandatory dimensions is not necessarily a bad thing. But we must also be able to aggregate across the others (currently hidden behind the categorycombo veil). If the day goes well I’ll give you a stylesheet later for picking them out of a dxf file. But probably we do want to push this unravelling into the API.

In the meantime (since you are out docbook champion) I’ve attached a simple (and incomplete) stylesheet for making a docbook rendition of dxf metadata. I’ve been using this to try and make it easier to look at different databases and how they have implemented dimensions.

Run xsltproc (or some other xslt processor) on dxf2docbook.xsl with a dxf metadata file as as a source file.

Regards
Bob

dxf2docbook.xsl (4.51 KB)

···

2009/9/25 Jason Pickering jason.p.pickering@gmail.com

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering > > > >> > jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

olatitle · 25 September 2009 11:20

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

Aren’t your diseases already in the data element name? While data element is one of many dimensions e.g. in a pivot table it is more than just a dimension in the DHIS model I guess, so it is important to be consistent breaking up "name of what you capture (e.g. 1.4 data element names) into data element + categories. Diseases are normally what is repeated across many datasets so I would use that as the Data element. But that is a design consideration I guess.

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}
OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc).
I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

Hi Jason,

Thanks for all these examples and suggestions. I have been waiting for users to request dimensional data analysis for a long time, as it is something that has been neglected in DHIS so far. There is functionality to capture data linked to categoryoptioncombos (Male, <5), (Female,<5),(Male,>5),(Female,>5), but there is no functionality yet (although blueprints exist) to use these dimensions in report tables, or in data mart in general. Currently you can get the total for the data element or the flattened out value (data element + categoryoptioncombo, as above. After discussing with Bob, and I know he has investigated the model in detail I understand that it is possible to bring categories and category options into e.g. a pivot table or report table and analyse data elements across these, e.g. data element + male, data element + <5, etc. So there must (I hope) be something in the database that you’re not seeing that can help you pull this into a pivot table. You SHOULD be able to bring in categories as your pivot fields (dimensions), and for each of them list their category options (dimensional elements). In my example there should be at least three dimensions (with options): data element name (Malaria, Typhoid, Tuberculosis, gender (male, female, age (<5,>5).

At least according to Bob this is possible from the API, I hope also from the database directly. We want to make this available in data mart tables or report tables as soon as possible I think.

There are some minor changes to the model that are needed in order to get here. Bob mentioned datasets. To be able to use the multidimensional model for anything else than data entry in a custom form we need to make it a constraint that one data set can only have 1 categorycombo. I guess this will lead to more datasets than we have today in most places at least, and it breaks the 1-1 relationship between data set and data entry form that we have today. We need to look at forms as a separate object that can hold 1 or more data sets. This can e.g. allow for a data entry form with multiple autogenerated tables based on the datasets’ categorycombo (like we have today for datasets that actually have only 1 categoprycombo.

Another important constraint I see is to limit the use of dimensional data elements to use cases where there is a hierarchy of data values that actually match up to a total. E.g. the categorycombo gender+age with datavalues for (Male, <5), (Female,<5),(Male,>5),(Female,>5) match up to a total of all cases for that data element. I know that the model has been used as a tool to shorten data element names without any logical hierarchy where values add up. E.g. BCG + “in stock”, BCG “doses wasted”, BCG + “Doses received”. Although I can see the point of reducing data element names I this exampel is more of a presentation or view problem and this break up can be sorted out in the data entry form design, and should not be in the data element model as it makes no sense for aggregation and data analysis to have those breakups.

I also support a name change of the multidimensional model. I prefer dimensions, dimensional elements(or options), and dimension sets.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

Categorycombos will be assigned to a whole datasets of data elements sharing the dimension set. We should look into some kind of bulk update of data elements in the UI where you can apply one categorycombo to a list of ticked data elements or something.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective. As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

I think data element groups should play a role of more broader dimensions, not linked to how data values are captured, but reflect themes or collections of data that makes sense to look at together. This means groupings across datasets (or across paper forms as we often like to see it). This implies that we probably would not like to use the data element categories and groups together i.e a pivot table, but in stead look at data elements totals (sum of all categories) and groups only since the data elements would not necessarily share categories. This makes sense e.g. when doing correlation beetween antenatal care, deliveries and immunisation, following the life cycle of clients. These data elements do not share age or gender break downs, but still are very useful to analyse together and a group would help to quickly show these data elements together in a table or report.

So for this reason I am a bit reluctant to merge data element categories/dimensions and groups. I know that a lot of data analysis does not require all the break downs of age and gender etc. and its always a battle whether you put them in or not. The way we can easily hide all that detail by just looking at the total of each data element without any categories and still add other useful groupings provides flexibility I think.

Totally agree with Knut that we need to document much better what we mean by data set, data element group, data element category, and what the intended use is. To an outsider they might all mean the same thing, but inside DHIS they serve specific needs.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

From Bob I understand that this will not be a problem as every dataset would define its own “structure” that will be used for exchanging data on that specific set. Bob can fill in…

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

While writing this very long email I noticed that Bob has replied already…

Ola

···

2009/9/25 Jason Pickering jason.p.pickering@gmail.com

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering > > > >> > jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

olatitle · 25 September 2009 11:32

I think I am starting to see Jason’s problem a bit clearer now. A limitation to only be able to pull data from one dataset (one categorycombo) into a report or OLAP cube is not enough. We need this restriction when dealing with raw data, e.g. in data entry and import/export, but for reports and analysis I think we need to come up with a more flexible solution along the lines of what Jason is suggesting. I think the exisiting model can be used though, but more flexibility is needed in the report table UI. I think we could have a OLAP cube builder (or ideally also an Excel pivot table designer/generator) as part of or extension to the report table UI we have today.

When generating cubes/ datamarts with dimensions (data element categories) it should be possible to pick categories freely and completely independent on categorycombos. As long as two data elements share one category it should be possible to ask for a datamart with those two data elements and that shared category with values breaken up by that category’s options. Similar to the table Jason is requesting, Data Element, Category, CategoryOption. This will involve aggregation across the category options, and produce a much more flexible view to the data than what is necessary when entering data in a form.

Ola

···

2009/9/25 Bob Jolliffe bobjolliffe@gmail.com

Hi Jason

2009/9/25 Jason Pickering jason.p.pickering@gmail.com

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

I think you mean datelements can be assigned a single categorycombo? I have found the multi-dimensional model confusing but I am not convinced that it is broken. All of the dimensionality is indeed captured but we seem to be missing the convenience methods in the API and the UI to make use of it.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As I see it, from the reporting/analysis perspective the categorycombo is of little use. It is simply a convenience to indicate the dimensions which exist for the dataelement. Which is why I have been referring to it a dimensionset. The relation between a dimension and its dimensional element is captured through the categoryComboCategoryAssociations.

I think you are right and the whole categorycombo could be done away with by simpling having a one to many relationship between dataelements and dimensions (categories). Though the categoryCombo does seem useful when creating datelements in the user interface. There may be many dataelements which share the same dimensionsets. For example Age/Disease/Patient Status in your example above. So this set is given a name which allows it to be easily reused across dataelements. The downside being that it creates an extra layer of indirection to get at particular dimensional elements. This results in the SQL efficiency issue you refer to below.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective.

Agreed. As I said above, I don’t think the current model is broken, but it has an extra indirection level.

As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

It should certainly be possible to implement this UI interface with the current datamodel. step (1) would involve selecting a dimension from the dimensionset associated with the dataelement. I think this is currently lacking.

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

This model is also supported in SDMX ie. composing hierarchies of dataelements. I suppose it comes down to naming. Our model allows a single dataelement which contains multiple dimensions within it. The hierarchical model has multiple dataelements which can be grouped hierarchically for the purpose of aggregation. In the end the effect is the same. Possibly this is similar to what we do with dataelement groups.

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

Not really. The ReportSet would have no real significance from your perspective - rather it would be a way to define the set of (ragged) datasets which form a particular input form.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

Having the existing two fixed or mandatory dimensions is not necessarily a bad thing. But we must also be able to aggregate across the others (currently hidden behind the categorycombo veil). If the day goes well I’ll give you a stylesheet later for picking them out of a dxf file. But probably we do want to push this unravelling into the API.

In the meantime (since you are out docbook champion) I’ve attached a simple (and incomplete) stylesheet for making a docbook rendition of dxf metadata. I’ve been using this to try and make it easier to look at different databases and how they have implemented dimensions.

Run xsltproc (or some other xslt processor) on dxf2docbook.xsl with a dxf metadata file as as a source file.

Regards
Bob

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering > > > > > > > > >> > jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Knut_Staring · 25 September 2009 11:37

Hi Jason

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

I think you mean datelements can be assigned a single categorycombo? I have found the multi-dimensional model confusing but I am not convinced that it is broken. All of the dimensionality is indeed captured but we seem to be missing the convenience methods in the API and the UI to make use of it.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As I see it, from the reporting/analysis perspective the categorycombo is of little use. It is simply a convenience to indicate the dimensions which exist for the dataelement. Which is why I have been referring to it a dimensionset. The relation between a dimension and its dimensional element is captured through the categoryComboCategoryAssociations.

I think you are right and the whole categorycombo could be done away with by simpling having a one to many relationship between dataelements and dimensions (categories). Though the categoryCombo does seem useful when creating datelements in the user interface. There may be many dataelements which share the same dimensionsets. For example Age/Disease/Patient Status in your example above. So this set is given a name which allows it to be easily reused across dataelements. The downside being that it creates an extra layer of indirection to get at particular dimensional elements. This results in the SQL efficiency issue you refer to below.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective.

Agreed. As I said above, I don’t think the current model is broken, but it has an extra indirection level.

As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

It should certainly be possible to implement this UI interface with the current datamodel. step (1) would involve selecting a dimension from the dimensionset associated with the dataelement. I think this is currently lacking.

I think I am starting to see Jason’s problem a bit clearer now. A limitation to only be able to pull data from one dataset (one categorycombo) into a report or OLAP cube is not enough. We need this restriction when dealing with raw data, e.g. in data entry and import/export, but for reports and analysis I think we need to come up with a more flexible solution along the lines of what Jason is suggesting. I think the exisiting model can be used though, but more flexibility is needed in the report table UI. I think we could have a OLAP cube builder (or ideally also an Excel pivot table designer/generator) as part of or extension to the report table UI we have today.

I am no expert on this, but Mondrian has virtual cubes, meant for combining data from different cubes. As we know, OpenHealth Elements are making use of such functionality. Maybe some discussion with the people behind it would be in order. Jiri will be in Geneva next week (focusing on FluID), perhaps a chat with him (over lunch or dinner?) could be arranged.

Knut

···

On Fri, Sep 25, 2009 at 1:32 PM, Ola Hodne Titlestad olatitle@gmail.com wrote:

2009/9/25 Bob Jolliffe bobjolliffe@gmail.com

2009/9/25 Jason Pickering jason.p.pickering@gmail.com

When generating cubes/ datamarts with dimensions (data element categories) it should be possible to pick categories freely and completely independent on categorycombos. As long as two data elements share one category it should be possible to ask for a datamart with those two data elements and that shared category with values breaken up by that category’s options. Similar to the table Jason is requesting, Data Element, Category, CategoryOption. This will involve aggregation across the category options, and produce a much more flexible view to the data than what is necessary when entering data in a form.

Ola

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

This model is also supported in SDMX ie. composing hierarchies of dataelements. I suppose it comes down to naming. Our model allows a single dataelement which contains multiple dimensions within it. The hierarchical model has multiple dataelements which can be grouped hierarchically for the purpose of aggregation. In the end the effect is the same. Possibly this is similar to what we do with dataelement groups.

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

Not really. The ReportSet would have no real significance from your perspective - rather it would be a way to define the set of (ragged) datasets which form a particular input form.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

Having the existing two fixed or mandatory dimensions is not necessarily a bad thing. But we must also be able to aggregate across the others (currently hidden behind the categorycombo veil). If the day goes well I’ll give you a stylesheet later for picking them out of a dxf file. But probably we do want to push this unravelling into the API.

In the meantime (since you are out docbook champion) I’ve attached a simple (and incomplete) stylesheet for making a docbook rendition of dxf metadata. I’ve been using this to try and make it easier to look at different databases and how they have implemented dimensions.

Run xsltproc (or some other xslt processor) on dxf2docbook.xsl with a dxf metadata file as as a source file.

Regards
Bob

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

–
Cheers,
Knut Staring

bobj · 25 September 2009 12:36

Hi Ola

Comments inline below …

Hi Jason

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

I think you mean datelements can be assigned a single categorycombo? I have found the multi-dimensional model confusing but I am not convinced that it is broken. All of the dimensionality is indeed captured but we seem to be missing the convenience methods in the API and the UI to make use of it.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As I see it, from the reporting/analysis perspective the categorycombo is of little use. It is simply a convenience to indicate the dimensions which exist for the dataelement. Which is why I have been referring to it a dimensionset. The relation between a dimension and its dimensional element is captured through the categoryComboCategoryAssociations.

I think you are right and the whole categorycombo could be done away with by simpling having a one to many relationship between dataelements and dimensions (categories). Though the categoryCombo does seem useful when creating datelements in the user interface. There may be many dataelements which share the same dimensionsets. For example Age/Disease/Patient Status in your example above. So this set is given a name which allows it to be easily reused across dataelements. The downside being that it creates an extra layer of indirection to get at particular dimensional elements. This results in the SQL efficiency issue you refer to below.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective.

Agreed. As I said above, I don’t think the current model is broken, but it has an extra indirection level.

As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

It should certainly be possible to implement this UI interface with the current datamodel. step (1) would involve selecting a dimension from the dimensionset associated with the dataelement. I think this is currently lacking.

I think I am starting to see Jason’s problem a bit clearer now. A limitation to only be able to pull data from one dataset (one categorycombo) into a report or OLAP cube is not enough.

We need this restriction when dealing with raw data, e.g. in data entry and import/export, but for reports and analysis I think we need to come up with a more flexible solution along the lines of what Jason is suggesting. I think the exisiting model can be used though, but more flexibility is needed in the report table UI.

Yes. I have not really looked at this UI in any detail yet but I am sure that is correct. Unfortunately my current priority is getting the sdmx stuff working but hopefully some of it will feed back to this problem.

I think we could have a OLAP cube builder (or ideally also an Excel pivot table designer/generator) as part of or extension to the report table UI we have today.

Just as exploding dimensions is required to generate sdmx, the same technique can be applied to generate xlsx xml streams to build pivot tables on the fly. I have a good ooxml guide book from Microsoft which I will do some bedside reading on.

When generating cubes/ datamarts with dimensions (data element categories) it should be possible to pick categories freely and completely independent on categorycombos. As long as two data elements share one category it should be possible to ask for a datamart with those two data elements and that shared category with values breaken up by that category’s options. Similar to the table Jason is requesting, Data Element, Category, CategoryOption. This will involve aggregation across the category options, and produce a much more flexible view to the data than what is necessary when entering data in a form.

Agreed. As I said earlier the CategoryCombo is useful on the dataentry side but not useful for reporting and we need to have UI components to see through it to the underlying dimensions.

Cheers
Bob

···

2009/9/25 Ola Hodne Titlestad olatitle@gmail.com

2009/9/25 Bob Jolliffe bobjolliffe@gmail.com

2009/9/25 Jason Pickering jason.p.pickering@gmail.com

Ola

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

This model is also supported in SDMX ie. composing hierarchies of dataelements. I suppose it comes down to naming. Our model allows a single dataelement which contains multiple dimensions within it. The hierarchical model has multiple dataelements which can be grouped hierarchically for the purpose of aggregation. In the end the effect is the same. Possibly this is similar to what we do with dataelement groups.

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

Not really. The ReportSet would have no real significance from your perspective - rather it would be a way to define the set of (ragged) datasets which form a particular input form.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

Having the existing two fixed or mandatory dimensions is not necessarily a bad thing. But we must also be able to aggregate across the others (currently hidden behind the categorycombo veil). If the day goes well I’ll give you a stylesheet later for picking them out of a dxf file. But probably we do want to push this unravelling into the API.

In the meantime (since you are out docbook champion) I’ve attached a simple (and incomplete) stylesheet for making a docbook rendition of dxf metadata. I’ve been using this to try and make it easier to look at different databases and how they have implemented dimensions.

Run xsltproc (or some other xslt processor) on dxf2docbook.xsl with a dxf metadata file as as a source file.

Regards
Bob

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

jason · 26 September 2009 06:23

The realtionship is one-to-many. A categorycombo can be assigned for many dataelements. But a dataelement can have only one categorycombo.

Thanks for clarifying, but this is exactly what I meant. Sorry if it was not clear.

···

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

Hi there.

My basic issue with the category/category combo is that it appears to be a one-to-one relationship with data elements. If I look at the data model, there is a one-to-one relationship between dataelement and categorycomboid. For a given category combo, you can have multiple options. So, you can establish a relationship for a given data element and a group of category options.

Let me try and describe the issue. We have a set of data elements related to malaria for this example. We would like to be able to pivot the data on other dimensions dimensions (Data element, age, disease, patient status). Obviously there are other dimensions that are pivotable (orgunit, period, dataset)

The data elements look like this. I have put the dimensions in square brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, Deaths}

Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}

IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, OPD}

OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I have had for all diseases, or how many OPD cases I have had for each disease. How can I do this with the existing data model? It is not obvious to me because there is no relationship between dimensional elements (categoryoptions) to each other. Category options can be related through a cateogry combination, but since data elements can only be assigned a single category option, the dimensionality is broken once it gets time to pull the data into a pivot table.

In the incomplete example that I gave yesterday, I established a one-to-many relationship between a data element and a dimension. If I understand the current data model, I would have to create a separate categorycombo for each of these data elements, and assign this categorycombo to the data element. Now, I might be able to unfold the dimensions using the categories and categorycombos. I it is not apparent how the dimensional elements correspond themselves to a particular dimension, as there is no relation for this in the database as I can see it.

As (Johan pointed out a few mails ago, if I understand him correctly) is different categorycombo’s can be created for individual data elements, and assigned to these elements. However, this seems to be 1) incredibly inefficient and 2) does not establish any relationship between dimensional elements and dimensions. Perhaps it is there, and maybe it has been done in SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a one-to-many relationship between a data element, a dimension (category) and dimensional element (category combo) would be much more effieicnet, and highly usable from an SQL perspective. As I mentioned in my mail, I am not sure how easy this would be to implement in a procedural language like Java, but I assume it should be possible to either do it this way, or rewrite my Postgres proprietary query in standard SQL (which there are ways to do with ANSI SQL). This would require modification to the data model (similar to the table I provided yesterday) and modification to the UI to allow users to 1) select a dimension (category) 2) Select a dimensional element for the given dimension. This would populate the table with a dataelementid, a dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently been implemented for organizational units. Users can define a hierarchy for organizational units, and then assign them to categories/dimension/organizational group sets, decide whether the groups are compulsory and exlusive, and then assign a particular organizational unit to a particular group (which is analogous to a dimensional element). Organizational group sets define the dimension, and one-to-one assignment of an organizational unit to a particular organizational group defines which dimensional element the organizational unit is a member of. These dimensions can then be used in PivotTable analyses, where the orgunitgroupsets become dimensions, and orgunitgroups become dimensional elements…

I beleive that data elements are no different than organizational units. They should be able to be grouped into some sort of hierarchy and pivoted on any dimension. Data elements groups establish a one-to-many relationship between data elements and a data element group, but there is no concept of how data element groups relate to each other.

I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of multidimensionality if we really wanted to get it right. It would see to me that the DHIS datamodel and associated aggregation methods have been hardwired into aggregation across time (period) and geography (orgunit). What we can do with PivotTables and (and OLAP) is to aggregate across any possible dimension, slicing as you mention ,on any dimension . I am not sure this will be so simple to implement but I think there is a way to do it, without major modifications.

I am not sure it solves the SDMX issue. There are potential issues related to “ragged” dimensions and how these get handled. Some data elements might have three dimensions, while others may have more. I have not thought about this in detail, but know it is an issue with cross-tab queries in SQL. You normally have to know how many dimensions you are working with in order to perform a cross-tab, but there are dynamic solutions. Perhaps this could be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further. Once I get the SQL from SL, I will see if perhaps it has been done already, and that I am just writing long emails for nothing.

Regards,

Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Jason and Johan

I’m really pleased to see you having this discussion as I have been

grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar. I have reached
similar conclusions regarding nomenclature:

category = dimension

categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement. I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx. So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last

dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded. So data values look like:

<datavalue name="TB test given" uid="44344 ...44" gender="Female"

age=“0-5” value=“38” />
…

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work

well.

One other nomenclatures issue which has surfaced as a result is what we call
a “dataset”. In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.

If we view all dataelements as having just the three “dimensions” then all
is well, but if we explode the actual dimensions then we have an issue. In
the sdmx model a dataset consists only of dataelements with the same

dimensionset. After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets. But that is an aside

from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API. It occurs
to me that for a multidimensional dataelement we might benefit from some

utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions. Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 johansa@ifi.uio.no

Jason,
I will leave to others to comment the code, but I have a few comments…

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to

each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here, malaria
attendance is broken down into various dimensions/category by patient

type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could

create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

In the Sierra Leone db, Edem and Romain set up views that pulled the

categories through into a “Category” pivot field, which you can then use
to get what you want. Simply tick the categories (see below) you want to
see, and group them together in excel. Maybe Edem and Romain can help

further here.

Dimension ? Category
Dimensional element ? Category option ? Category combo ( I think)

The right symbol disappeared from my reply-mail here, but some

clarification:

Crosstab Dimension (age AND gender) = Category combo
Dimension (age, gender) = Category
Dimensional element (inpatient, outpatient, death, under1, 1-5, and over

= Category option

So by assigning a DE the category combo of “gender_age”, you get 9
dimensional elements, 3 category options (in category age) by 3 category
options (in category gender)

Johan

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(

test_dataelementid integer NOT NULL,
test_dataelementcategoryid integer NOT NULL,
test_dataelementcategorycomboid integer NOT NULL,
CONSTRAINT pk_testdataelementcategory PRIMARY KEY

(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is a real simple table which references a data element, a

data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign

dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252

348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values
correspond

to?

“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Age”;“Under
1”
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5

Years";“Age”;“Age 1-5”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;“Age”;“Over
5”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;"Patient

status";“OPD”
“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years”;“Patient status”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Patient

status";“OPD”

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
SELECT dataelement.name, dataelementcategory.name AS dimension,

dataelementcategoryoption.name AS dimension_element
FROM dataelement
JOIN test_dataelementcategorycombo ON

test_dataelementcategorycombo.test_dataelementid =

dataelement.dataelementid
JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
JOIN dataelementcategoryoption ON

test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers

that I populated in the the test_dataelementcategorycombo table I
created above. This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to

transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
(
'SELECT name, dimension, dimension_element FROM

vw_dataelements_dimensions ORDER BY 1,2,3’,
‘SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
ASC’
)
as

(
name character varying(230),
age character varying(160),
status character varying(160)
);

which returns this record set

“OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years”;“Age
1-5”;“OPD”
“OPD 1st Attendance Clinical Case of Malaria Over 5 Years”;"Over

5";“OPD”
“OPD 1st Attendance Clinical Case of Malaria Under 1 Year”;“Under
1”;“OPD”

OK, admittedly, I cheated a bit and used the crosstab function of

Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use

with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table. I would not have any duplicated values and would have columns

like data element name, period, orgunit, age, patient status and of
course the value of the data element. I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you

have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need

here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user

interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to

this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

I am not sure if this really solves all of the issues surrounding

multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory

functionality of orgunits). Any thoughts on this?

Best regards,
Jason

On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering > > > > > > > > >> > jason.p.pickering@gmail.com wrote:

On Wed, Sep 16, 2009 at 10:13 PM, johansa@ifi.uio.no wrote:

However, there does seem to be the ability to assign dimensions,
there
does
not seem to be the ability to assign particular elements within

those

dimensions to a particular DHIS data element.

Just some more clarification here: you can make category combos which

you
assign to data elements. However, it is not possible to assign just
specific parts of a category combo (only some of the category options)
to

a data element.

Yes, this was exactly what I wanted. Assigning different categories
would
seem to break the dimensionality.

Then you must make a specific category (as the only one in
or part of a new category combo) with just those options. It can be

hell;
in Tajikistan there were way over 20 categories I think, at least 10
just
on various age groups.

Johan

This was my fear.

I will need to do some testing and see. I still fear it is not exactly

the
intended functionality.

Basically, I think I need something akin to the exclusive/compulsory
groups
that are in place for organizational units, but instead, for arbitrary

dimensions. I will give a try and see what happens.

Thanks,
Jason

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Johan_Ivar_Saebo · 28 September 2009 12:12

Jason,

found some old stuff.. A bit late maybe, but this might be what you wanted
(last week):

View: pivotsource_routinedata_ou4_categoryoption

DROP VIEW pivotsource_routinedata_ou4_categoryoption;

CREATE OR REPLACE VIEW pivotsource_routinedata_ou4_categoryoption AS
SELECT organisationunit_1.name AS orgunit1, organisationunit_1.shortname
AS ou1,
organisationunit_2.name AS orgunit2, organisationunit_2.shortname AS ou2,
organisationunit_3.name AS orgunit3, organisationunit_3.shortname AS ou3,
organisationunit_4.name AS orgunit4, organisationunit_4.shortname AS ou4,
dataelementgroup.name AS degroup, dataelement.name AS dataelement,
dataelement.shortname AS deshort, categoryoptioncomboname,
periodtype.name AS periodtype,
to_char(period.startdate::timestamp with time zone, 'YYYY'::text) AS year,
to_char(period.startdate::timestamp with time zone, 'Mon'::text) AS month,
(rtrim(to_char(period.startdate::timestamp with time zone, 'Mon'::text))

'-'::text) || to_char(period.startdate::timestamp with time zone,

'YY'::text) AS period,
  aggregateddatavalue.value, aggregateddatavalue.level
   FROM dataelementgroupmembers dataelementgroupmembers, dataelement
dataelement,
   categoryoptioncomboname,
    dataelementgroup dataelementgroup, period period, periodtype periodtype,
     aggregateddatavalue aggregateddatavalue, orgunitstructure
orgunitstructure,
     organisationunit organisationunit_1, organisationunit
organisationunit_2,
     organisationunit organisationunit_3, organisationunit organisationunit_4
  WHERE dataelementgroupmembers.dataelementid = dataelement.dataelementid AND
  dataelementgroupmembers.dataelementgroupid =
dataelementgroup.dataelementgroupid AND
  period.periodtypeid = periodtype.periodtypeid AND periodtype.name::text
= 'Monthly'::text AND
  dataelement.dataelementid = aggregateddatavalue.dataelementid AND
  period.periodid = aggregateddatavalue.periodid AND
  aggregateddatavalue.organisationunitid =
orgunitstructure.organisationunitid AND
  orgunitstructure.idlevel1 = organisationunit_1.organisationunitid AND
  orgunitstructure.idlevel2 = organisationunit_2.organisationunitid AND
  orgunitstructure.idlevel3 = organisationunit_3.organisationunitid AND
  orgunitstructure.idlevel4 = organisationunit_4.organisationunitid AND
  aggregateddatavalue.categoryoptioncomboid=categoryoptioncomboname.categoryoptioncomboid
AND
  aggregateddatavalue.level = 4
  ORDER BY period.startdate;

ALTER TABLE pivotsource_routinedata_ou4_categoryoption OWNER TO dhis;

Johan

···

The realtionship is one-to-many. A categorycombo can be assigned for
many
dataelements. But a dataelement can have only one categorycombo.

Thanks for clarifying, but this is exactly what I meant. Sorry if it
was
not clear.

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering < >> jason.p.pickering@gmail.com> wrote:

Hi there.

My basic issue with the category/category combo is that it appears to
be a
one-to-one relationship with data elements. If I look at the data
model,
there is a one-to-one relationship between dataelement and
categorycomboid.
For a given category combo, you can have multiple options. So, you can
establish a relationship for a given data element and a group of
category
options.

Let me try and describe the issue. We have a set of data elements
related
to malaria for this example. We would like to be able to pivot the data
on
other dimensions dimensions (Data element, age, disease, patient
status).
Obviously there are other dimensions that are pivotable (orgunit,
period,
dataset)

The data elements look like this. I have put the dimensions in square
brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases,
Deaths}
Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases,
Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria
Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases,
IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
Deaths}
IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages,
Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria
Cases, OPD}
OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria
Cases,
OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria
Cases,
OPD}

OK, I hope this is pretty clear. Obviously, there are more data
elements
(Typhoid, Yellow fever, etc). I might want to know how many Under 1
deaths I
have had for all diseases, or how many OPD cases I have had for each
disease. How can I do this with the existing data model? It is not
obvious
to me because there is no relationship between dimensional elements
(categoryoptions) to each other. Category options can be related
through a
cateogry combination, but since data elements can only be assigned a
single
category option, the dimensionality is broken once it gets time to pull
the
data into a pivot table.

In the incomplete example that I gave yesterday, I established a
one-to-many relationship between a data element and a dimension. If I
understand the current data model, I would have to create a separate
categorycombo for each of these data elements, and assign this
categorycombo to the data element. Now, I might be able to unfold the
dimensions using the categories and categorycombos. I it is not
apparent how
the dimensional elements correspond themselves to a particular
dimension, as
there is no relation for this in the database as I can see it.

As (Johan pointed out a few mails ago, if I understand him correctly)
is
different categorycombo's can be created for individual data elements,
and
assigned to these elements. However, this seems to be 1) incredibly
inefficient and 2) does not establish any relationship between
dimensional
elements and dimensions. Perhaps it is there, and maybe it has been
done in
SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a
one-to-many relationship between a data element, a dimension (category)
and
dimensional element (category combo) would be much more effieicnet, and
highly usable from an SQL perspective. As I mentioned in my mail, I am
not
sure how easy this would be to implement in a procedural language like
Java,
but I assume it should be possible to either do it this way, or rewrite
my
Postgres proprietary query in standard SQL (which there are ways to do
with
ANSI SQL). This would require modification to the data model (similar
to
the table I provided yesterday) and modification to the UI to allow
users to
1) select a dimension (category) 2) Select a dimensional element for
the
given dimension. This would populate the table with a dataelementid, a
dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently
been implemented for organizational units. Users can define a hierarchy
for
organizational units, and then assign them to
categories/dimension/organizational group sets, decide whether the
groups
are compulsory and exlusive, and then assign a particular
organizational
unit to a particular group (which is analogous to a dimensional
element).
Organizational group sets define the dimension, and one-to-one
assignment of
an organizational unit to a particular organizational group defines
which
dimensional element the organizational unit is a member of. These
dimensions
can then be used in PivotTable analyses, where the orgunitgroupsets
become
dimensions, and orgunitgroups become dimensional elements..

I beleive that data elements are no different than organizational
units.
They should be able to be grouped into some sort of hierarchy and
pivoted on
any dimension. Data elements groups establish a one-to-many
relationship
between data elements and a data element group, but there is no concept
of
how data element groups relate to each other.
I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of
multidimensionality if we really wanted to get it right. It would see
to me
that the DHIS datamodel and associated aggregation methods have been
hardwired into aggregation across time (period) and geography
(orgunit).
What we can do with PivotTables and (and OLAP) is to aggregate across
any
possible dimension, slicing as you mention ,on any dimension . I am not
sure
this will be so simple to implement but I think there is a way to do
it,
without major modifications.

I am not sure it solves the SDMX issue. There are potential issues
related
to "ragged" dimensions and how these get handled. Some data elements
might
have three dimensions, while others may have more. I have not thought
about
this in detail, but know it is an issue with cross-tab queries in SQL.
You
normally have to know how many dimensions you are working with in order
to
perform a cross-tab, but there are dynamic solutions. Perhaps this
could be
dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point
further. Once I get the SQL from SL, I will see if perhaps it has been
done
already, and that I am just writing long emails for nothing.

Regards,
Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolliffe@gmail.com> >>> wrote:
> Hi Jason and Johan
>
> I'm really pleased to see you having this discussion as I have been
> grappling with a similar issue which involves unravelling categories,
> category options and combos into something more familiar. I have
reached
> similar conclusions regarding nomenclature:
>
> category = dimension
> categorycombo - I have been calling a dimension set (it bears a
strong,
and
> useful, resemblance to xslt:attribute-set)
> category option - I like your suggestion of DimensionalElement. I am
going
> to start calling it that too.
>
> In my case I need to export (and import data) into a standard format
called
> sdmx. So whereas in the DHIS2 native DXF we export datavalues with
> effectively three dimensions (source, period, categorycombooption)
the
last
> dimension is a sort of uber-dimension. Like a peppercorn or a
cardamon
> seed, when you break it open it explodes its rich complexity of
dimensions.
>
> In sdmx we need the dimensions exploded. So data values look like:
>
> <dataset>
> <datavalue name="TB test given" uid="44344 ...44" gender="Male"
age="0-5"
> value="32" />
> <datavalue name="TB test given" uid="44344 ...44" gender="Female"
> age="0-5" value="38" />
> ..
> </dataset>
>
> My approach to unpicking the dimensions from the dxf file is to
transform it
> with an xslt transformation which is still incomplete but seems to
work
> well.
>
> One other nomenclatures issue which has surfaced as a result is what
we
call
> a "dataset". In DHIS2, if I understand correctly, a dataset
corresponds
> roughly to all the dataelements which might occur on a datacollection
form.
> If we view all dataelements as having just the three "dimensions"
then
all
> is well, but if we explode the actual dimensions then we have an
issue.
In
> the sdmx model a dataset consists only of dataelements with the same
> dimensionset. After discussing this with Ola we have reached the
conclusion
> that we need another level of grouping, primarily for the UI - eg
FormSet or
> ReportSet which allows us to group related datasets. But that is an
aside
> from what you are talking about.
>
> I know that you guys can do magic with sql, but it seems that we
should
try
> to capture some of this and place it down in the datamodel API. It
occurs
> to me that for a multidimensional dataelement we might benefit from
some
> utility methods to retrieve slices and dices which might assist in
> constructing the pivot tables around dimensions. Does this sound
like
the
> right thing to do.
>
> Regards
> Bob
>
> 2009/9/24 <johansa@ifi.uio.no>
>>
>> Jason,
>> I will leave to others to comment the code, but I have a few
comments...
>>
>> > I have done a bit more thinking on this, and would like to offer
some
>> > more examples up for discussion.
>> >
>> > Basically, we have a lot of data elements that are somehow related
to
>> > each other, similar to my kooky example in my original mail. I
assume
>> > this is fairly common throughout other HMIS systems. Here,
malaria
>> > attendance is broken down into various dimensions/category by
patient
>> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5
and
>> > over 5). But say you want to be able to pivot to look at
outpatient,
>> > inpatient and deaths totals (i.e. summed up by age). Well, you
could
>> > create a separate data element for this, but it sure would be nice
to
>> > be able to Pivot the data somehow.
>>
>> In the Sierra Leone db, Edem and Romain set up views that pulled the
>> categories through into a "Category" pivot field, which you can then
use
>> to get what you want. Simply tick the categories (see below) you
want
to
>> see, and group them together in excel. Maybe Edem and Romain can
help
>> further here.
>>
>>
>> > Dimension ? Category
>> > Dimensional element ? Category option ? Category combo ( I think)
>>
>> The right symbol disappeared from my reply-mail here, but some
>> clarification:
>>
>> Crosstab Dimension (age AND gender) = Category combo
>> Dimension (age, gender) = Category
>> Dimensional element (inpatient, outpatient, death, under1, 1-5, and
over
>> 5) = Category option
>>
>> So by assigning a DE the category combo of "gender_age", you get 9
>> dimensional elements, 3 category options (in category age) by 3
category
>> options (in category gender)
>>
>> Johan
>>
>>
>>
>>
>> > Anyway, here is the helper table I created.
>> >
>> > CREATE TABLE test_dataelementcategorycombo
>> > (
>> > test_dataelementid integer NOT NULL,
>> > test_dataelementcategoryid integer NOT NULL,
>> > test_dataelementcategorycomboid integer NOT NULL,
>> > CONSTRAINT pk_testdataelementcategory PRIMARY KEY
>> > (test_dataelementid, test_dataelementcategoryid,
>> > test_dataelementcategorycomboid)
>> > )
>> > WITH (OIDS=FALSE);
>> >
>> > So this is a real simple table which references a data element, a
>> > data element category, and a data element combo. The reference to
a
>> > data element category may be redundant, but anyway, lets leave it
in
>> > for now.
>> >
>> > I populated the table with some data, which will be used to assign
>> > dimensions to data elements. It looks like this in my DB, which
looks
>> > like this.
>> >
>> > 309;25250;25251
>> > 309;25257;25255
>> > 348;25250;25252
>> > 348;25257;25255
>> > 455;25250;25253
>> > 455;25257;25255
>> >
>> > but of course this is meaningless to you. What do these values
>> > correspond
>> > to?
>> >
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1
Year";"Age";"Under
>> > 1"
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> > Years";"Age";"Age 1-5"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5
Years";"Age";"Over
>> > 5"
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1
Year";"Patient
>> > status";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> > Years";"Patient status";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5
Years";"Patient
>> > status";"OPD"
>> >
>> > which can be produced by the following view.
>> >
>> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
>> > SELECT dataelement.name, dataelementcategory.name AS dimension,
>> > dataelementcategoryoption.name AS dimension_element
>> > FROM dataelement
>> > JOIN test_dataelementcategorycombo ON
>> > test_dataelementcategorycombo.test_dataelementid =
>> > dataelement.dataelementid
>> > JOIN dataelementcategory ON dataelementcategory.categoryid =
>> > test_dataelementcategorycombo.test_dataelementcategoryid
>> > JOIN dataelementcategoryoption ON
>> > test_dataelementcategorycombo.test_dataelementcategorycomboid =
>> > dataelementcategoryoption.categoryoptionid;
>> >
>> > So, that view just provides a human readable view of those
integers
>> > that I populated in the the test_dataelementcategorycombo table I
>> > created above. This table just assigns particular data elements
to
>> > different category options (dimensional elements).
>> >
>> > OK, so far so good, but the problem now is, how to use this with
the
>> > aggregatedatavalue table? If we try and join this table directly,
we
>> > will have issues with duplicates in the pivot table, so we need to
>> > transform the data slightly.
>> >
>> > This should do the trick.
>> >
>> > SELECT * FROM crosstab
>> > (
>> > 'SELECT name, dimension, dimension_element FROM
>> > vw_dataelements_dimensions ORDER BY 1,2,3',
>> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER
BY 1
>> > ASC'
>> > )
>> > as
>> > (
>> > name character varying(230),
>> > age character varying(160),
>> > status character varying(160)
>> > );
>> >
>> >
>> > which returns this record set
>> >
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years";"Age
>> > 1-5";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over
>> > 5";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under
>> > 1";"OPD"
>> >
>> >
>> > OK, admittedly, I cheated a bit and used the crosstab function of
>> > Postgresql, but I assume that this query could be rewritten with a
few
>> > more lines of code in standard SQL or some procedural language
like
>> > Java. Now, this record set looks like something that I can almost
use
>> > with the aggregateddatavalue table simply by joining up the table
on
>> > the appropriate dataelementid and pulling everything into a pivot
>> > table. I would not have any duplicated values and would have
columns
>> > like data element name, period, orgunit, age, patient status and
of
>> > course the value of the data element. I hope that part is pretty
>> > clear. Just join up that table to the aggregateddata table, and
you
>> > have pretty much what is needed to pull the data directly into a
>> > PivotTable for further analysis.
>> >
>> > This is not a complete example, but it is very close to what I
need
>> > here ,and I think this type of functionality would be much more
useful
>> > than the current data element categories functionality. Basically,
all
>> > that would be required, at least initially, would be another user
>> > interface screen to allow the definition of which category(ies)
and
>> > category options a data element is a member of. The rest could ,in
the
>> > first instance be executed with custom SQL (obviously, I am
partial
to
>> > this language and hobbled by the fact that I do not know Java),
but
>> > eventually this would need to be implemented somehow in Java.
>> >
>> > I am not sure if this really solves all of the issues surrounding
>> > multidimensional analysis of data elements, but it seems to solve
the
>> > issues that I am having by trying to assign some sort of
dimensional
>> > hierarchy to data elements (similar to the exclusive/compulsory
>> > functionality of orgunits). Any thoughts on this?
>> >
>> > Best regards,
>> > Jason
>> >
>> >
>> >
>> >
>> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering >>> >> > <jason.p.pickering@gmail.com> wrote:
>> >>
>> >>
>> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@ifi.uio.no> wrote:
>> >>>
>> >>> >> However, there does seem to be the ability to assign
dimensions,
>> >>> there
>> >>> >> does
>> >>> >> not seem to be the ability to assign particular elements
within
>> >>> those
>> >>> >> dimensions to a particular DHIS data element.
>> >>>
>> >>>
>> >>> Just some more clarification here: you can make category combos
which
>> >>> you
>> >>> assign to data elements. However, it is not possible to assign
just
>> >>> specific parts of a category combo (only some of the category
options)
>> >>> to
>> >>> a data element.
>> >>
>> >> Yes, this was exactly what I wanted. Assigning different
categories
>> >> would
>> >> seem to break the dimensionality.
>> >>
>> >>>
>> >>> Then you must make a specific category (as the only one in
>> >>> or part of a new category combo) with just those options. It can
be
>> >>> hell;
>> >>> in Tajikistan there were way over 20 categories I think, at
least
10
>> >>> just
>> >>> on various age groups.
>> >>>
>> >>> Johan
>> >>>
>> >>
>> >> This was my fear.
>> >>
>> >> I will need to do some testing and see. I still fear it is not
exactly
>> >> the
>> >> intended functionality.
>> >>
>> >> Basically, I think I need something akin to the
exclusive/compulsory
>> >> groups
>> >> that are in place for organizational units, but instead, for
arbitrary
>> >> dimensions. I will give a try and see what happens.
>> >>
>> >> Thanks,
>> >> Jason
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> _______________________________________________
>> Mailing list:
https://launchpad.net/~dhis2-devs<https://launchpad.net/~dhis2-devs>
>> Post to : dhis2-devs@lists.launchpad.net
>> Unsubscribe :
https://launchpad.net/~dhis2-devs<https://launchpad.net/~dhis2-devs>
>> More help : https://help.launchpad.net/ListHelp
>
>

_______________________________________________
Mailing list:
https://launchpad.net/~dhis2-devs<https://launchpad.net/~dhis2-devs>
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe :
https://launchpad.net/~dhis2-devs<https://launchpad.net/~dhis2-devs>
More help : https://help.launchpad.net/ListHelp

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

jason · 28 September 2009 13:00

The link that you sent seems reasonable, but this is not what has been
implemented in DHIS 2 at this point as far as I can tell. In the
current model (at least through the UI), I have created two categories
(Age and Patient status). I then created a categorycombo
Age_Patientstatus (Age and Patient status). I do this because
conceptually, Age is a dimension, and each of the member of the
category should correspond to a dimensional element (Under 1, 1-5 and
Over 5 for Age). This is necessary because in the SQL view, I need to
have a single column for each dimension, populated with the
appropriate dimensional elements.

query did not work for me but looking at the source of the query, I
assume this is what is supposed to happen.

If I then assign these categorycombos to my data elements, I already
know it is not going to work, because I have no idea which one of the
categorycombooptions is applicable to a particular data element. I
suppose this is why I would need to create a categorycombo with
exactly one option in each category, which again, is not desired. Each
category should be able to have multiple options.

Now, I do see some light.

Now, looking at the current database, when I generate the resource
table, this is what i get back in categoryoptioncombo name
25270;25260;"(Over 5,IPD,)"
25271;25261;"(Over 5,Deaths,)"
25272;25262;"(Over 5,OPD,)"
25273;25263;"(Under 1,IPD,)"
25274;25264;"(Under 1,Deaths,)"
25275;25265;"(Under 1,OPD,)"
25277;25267;"(Age 1-5,Deaths,)"
25278;25268;"(Age 1-5,OPD,)"
;;""
25276;25266;"(Age 1-5,IPD,)"

Now, this looks very much like what I need in my PivotTable source
query, which I think is what the query that Johan just sent is
supposed to provide. (The query did not work, but I assume this is
what it is meant to happen).

The problem is now, I have no idea (at least directly) that the first
set of values corresponds to Age and the second set of values
corresponds to Patient status.

If I could assign a data element a categoryoptioncomboid (the second
number in that result set above) instead of a categorycomboid (as is
the case now) I think I would be able to produce the result set that I
actually want. However, by assigning a data element the
categorycomboid , I can only tell which dimensions the data element
has, but not which particular dimensional elements it possesses.

So, perhaps you are right that there is no need for any changes to the
data model, but rather the assignment that I mention above.

Johan, thanks for the query. I will see if I can get it to work.

Best regards,
Jason

···

On Fri, Sep 25, 2009 at 12:07 PM, Knut Staring <knutst@gmail.com> wrote:

http://208.76.222.114/confluence/display/RandD/General+multi-dimensional+model

On Fri, Sep 25, 2009 at 11:55 AM, Abyot Gizaw <abyota@gmail.com> wrote:

The one-to-one relationship mentioned between dataelement and
categorycombo is not correct !

The realtionship is one-to-many. A categorycombo can be assigned for many
dataelements. But a dataelement can have only one categorycombo.

Thank you
Abyot.

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering >> <jason.p.pickering@gmail.com> wrote:

Hi there.

My basic issue with the category/category combo is that it appears to be
a one-to-one relationship with data elements. If I look at the data model,
there is a one-to-one relationship between dataelement and categorycomboid.
For a given category combo, you can have multiple options. So, you can
establish a relationship for a given data element and a group of category
options.

Let me try and describe the issue. We have a set of data elements related
to malaria for this example. We would like to be able to pivot the data on
other dimensions dimensions (Data element, age, disease, patient status).
Obviously there are other dimensions that are pivotable (orgunit, period,
dataset)

The data elements look like this. I have put the dimensions in square
brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases,
Deaths}
Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria
Cases, IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases,
IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
Deaths}
IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages,
Malaria Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria
Cases, OPD}
OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria
Cases, OPD}

OK, I hope this is pretty clear. Obviously, there are more data elements
(Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I
have had for all diseases, or how many OPD cases I have had for each
disease. How can I do this with the existing data model? It is not obvious
to me because there is no relationship between dimensional elements
(categoryoptions) to each other. Category options can be related through a
cateogry combination, but since data elements can only be assigned a single
category option, the dimensionality is broken once it gets time to pull the
data into a pivot table.

In the incomplete example that I gave yesterday, I established a
one-to-many relationship between a data element and a dimension. If I
understand the current data model, I would have to create a separate
categorycombo for each of these data elements, and assign this
categorycombo to the data element. Now, I might be able to unfold the
dimensions using the categories and categorycombos. I it is not apparent how
the dimensional elements correspond themselves to a particular dimension, as
there is no relation for this in the database as I can see it.

As (Johan pointed out a few mails ago, if I understand him correctly)
is different categorycombo's can be created for individual data elements,
and assigned to these elements. However, this seems to be 1) incredibly
inefficient and 2) does not establish any relationship between dimensional
elements and dimensions. Perhaps it is there, and maybe it has been done in
SL, but the SQL is not apparent to me at all.

It would appear to me, looking from an SQL perspective, that a
one-to-many relationship between a data element, a dimension (category) and
dimensional element (category combo) would be much more effieicnet, and
highly usable from an SQL perspective. As I mentioned in my mail, I am not
sure how easy this would be to implement in a procedural language like Java,
but I assume it should be possible to either do it this way, or rewrite my
Postgres proprietary query in standard SQL (which there are ways to do with
ANSI SQL). This would require modification to the data model (similar to
the table I provided yesterday) and modification to the UI to allow users to
1) select a dimension (category) 2) Select a dimensional element for the
given dimension. This would populate the table with a dataelementid, a
dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently
been implemented for organizational units. Users can define a hierarchy for
organizational units, and then assign them to
categories/dimension/organizational group sets, decide whether the groups
are compulsory and exlusive, and then assign a particular organizational
unit to a particular group (which is analogous to a dimensional element).
Organizational group sets define the dimension, and one-to-one assignment of
an organizational unit to a particular organizational group defines which
dimensional element the organizational unit is a member of. These dimensions
can then be used in PivotTable analyses, where the orgunitgroupsets become
dimensions, and orgunitgroups become dimensional elements..

I beleive that data elements are no different than organizational units.
They should be able to be grouped into some sort of hierarchy and pivoted on
any dimension. Data elements groups establish a one-to-many relationship
between data elements and a data element group, but there is no concept of
how data element groups relate to each other.
I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of
multidimensionality if we really wanted to get it right. It would see to me
that the DHIS datamodel and associated aggregation methods have been
hardwired into aggregation across time (period) and geography (orgunit).
What we can do with PivotTables and (and OLAP) is to aggregate across any
possible dimension, slicing as you mention ,on any dimension . I am not sure
this will be so simple to implement but I think there is a way to do it,
without major modifications.

I am not sure it solves the SDMX issue. There are potential issues
related to "ragged" dimensions and how these get handled. Some data elements
might have three dimensions, while others may have more. I have not thought
about this in detail, but know it is an issue with cross-tab queries in SQL.
You normally have to know how many dimensions you are working with in order
to perform a cross-tab, but there are dynamic solutions. Perhaps this could
be dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point
further. Once I get the SQL from SL, I will see if perhaps it has been done
already, and that I am just writing long emails for nothing.

Regards,
Jason

On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolliffe@gmail.com> >>> wrote:
> Hi Jason and Johan
>
> I'm really pleased to see you having this discussion as I have been
> grappling with a similar issue which involves unravelling categories,
> category options and combos into something more familiar. I have
> reached
> similar conclusions regarding nomenclature:
>
> category = dimension
> categorycombo - I have been calling a dimension set (it bears a strong,
> and
> useful, resemblance to xslt:attribute-set)
> category option - I like your suggestion of DimensionalElement. I am
> going
> to start calling it that too.
>
> In my case I need to export (and import data) into a standard format
> called
> sdmx. So whereas in the DHIS2 native DXF we export datavalues with
> effectively three dimensions (source, period, categorycombooption) the
> last
> dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
> seed, when you break it open it explodes its rich complexity of
> dimensions.
>
> In sdmx we need the dimensions exploded. So data values look like:
>
> <dataset>
> <datavalue name="TB test given" uid="44344 ...44" gender="Male"
> age="0-5"
> value="32" />
> <datavalue name="TB test given" uid="44344 ...44" gender="Female"
> age="0-5" value="38" />
> ..
> </dataset>
>
> My approach to unpicking the dimensions from the dxf file is to
> transform it
> with an xslt transformation which is still incomplete but seems to work
> well.
>
> One other nomenclatures issue which has surfaced as a result is what we
> call
> a "dataset". In DHIS2, if I understand correctly, a dataset
> corresponds
> roughly to all the dataelements which might occur on a datacollection
> form.
> If we view all dataelements as having just the three "dimensions" then
> all
> is well, but if we explode the actual dimensions then we have an
> issue. In
> the sdmx model a dataset consists only of dataelements with the same
> dimensionset. After discussing this with Ola we have reached the
> conclusion
> that we need another level of grouping, primarily for the UI - eg
> FormSet or
> ReportSet which allows us to group related datasets. But that is an
> aside
> from what you are talking about.
>
> I know that you guys can do magic with sql, but it seems that we should
> try
> to capture some of this and place it down in the datamodel API. It
> occurs
> to me that for a multidimensional dataelement we might benefit from
> some
> utility methods to retrieve slices and dices which might assist in
> constructing the pivot tables around dimensions. Does this sound like
> the
> right thing to do.
>
> Regards
> Bob
>
> 2009/9/24 <johansa@ifi.uio.no>
>>
>> Jason,
>> I will leave to others to comment the code, but I have a few
>> comments...
>>
>> > I have done a bit more thinking on this, and would like to offer
>> > some
>> > more examples up for discussion.
>> >
>> > Basically, we have a lot of data elements that are somehow related
>> > to
>> > each other, similar to my kooky example in my original mail. I
>> > assume
>> > this is fairly common throughout other HMIS systems. Here, malaria
>> > attendance is broken down into various dimensions/category by
>> > patient
>> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5
>> > and
>> > over 5). But say you want to be able to pivot to look at outpatient,
>> > inpatient and deaths totals (i.e. summed up by age). Well, you could
>> > create a separate data element for this, but it sure would be nice
>> > to
>> > be able to Pivot the data somehow.
>>
>> In the Sierra Leone db, Edem and Romain set up views that pulled the
>> categories through into a "Category" pivot field, which you can then
>> use
>> to get what you want. Simply tick the categories (see below) you want
>> to
>> see, and group them together in excel. Maybe Edem and Romain can help
>> further here.
>>
>>
>> > Dimension ? Category
>> > Dimensional element ? Category option ? Category combo ( I think)
>>
>> The right symbol disappeared from my reply-mail here, but some
>> clarification:
>>
>> Crosstab Dimension (age AND gender) = Category combo
>> Dimension (age, gender) = Category
>> Dimensional element (inpatient, outpatient, death, under1, 1-5, and
>> over
>> 5) = Category option
>>
>> So by assigning a DE the category combo of "gender_age", you get 9
>> dimensional elements, 3 category options (in category age) by 3
>> category
>> options (in category gender)
>>
>> Johan
>>
>>
>>
>>
>> > Anyway, here is the helper table I created.
>> >
>> > CREATE TABLE test_dataelementcategorycombo
>> > (
>> > test_dataelementid integer NOT NULL,
>> > test_dataelementcategoryid integer NOT NULL,
>> > test_dataelementcategorycomboid integer NOT NULL,
>> > CONSTRAINT pk_testdataelementcategory PRIMARY KEY
>> > (test_dataelementid, test_dataelementcategoryid,
>> > test_dataelementcategorycomboid)
>> > )
>> > WITH (OIDS=FALSE);
>> >
>> > So this is a real simple table which references a data element, a
>> > data element category, and a data element combo. The reference to a
>> > data element category may be redundant, but anyway, lets leave it in
>> > for now.
>> >
>> > I populated the table with some data, which will be used to assign
>> > dimensions to data elements. It looks like this in my DB, which
>> > looks
>> > like this.
>> >
>> > 309;25250;25251
>> > 309;25257;25255
>> > 348;25250;25252
>> > 348;25257;25255
>> > 455;25250;25253
>> > 455;25257;25255
>> >
>> > but of course this is meaningless to you. What do these values
>> > correspond
>> > to?
>> >
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1
>> > Year";"Age";"Under
>> > 1"
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> > Years";"Age";"Age 1-5"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5
>> > Years";"Age";"Over
>> > 5"
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
>> > status";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> > Years";"Patient status";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
>> > status";"OPD"
>> >
>> > which can be produced by the following view.
>> >
>> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
>> > SELECT dataelement.name, dataelementcategory.name AS dimension,
>> > dataelementcategoryoption.name AS dimension_element
>> > FROM dataelement
>> > JOIN test_dataelementcategorycombo ON
>> > test_dataelementcategorycombo.test_dataelementid =
>> > dataelement.dataelementid
>> > JOIN dataelementcategory ON dataelementcategory.categoryid =
>> > test_dataelementcategorycombo.test_dataelementcategoryid
>> > JOIN dataelementcategoryoption ON
>> > test_dataelementcategorycombo.test_dataelementcategorycomboid =
>> > dataelementcategoryoption.categoryoptionid;
>> >
>> > So, that view just provides a human readable view of those integers
>> > that I populated in the the test_dataelementcategorycombo table I
>> > created above. This table just assigns particular data elements to
>> > different category options (dimensional elements).
>> >
>> > OK, so far so good, but the problem now is, how to use this with the
>> > aggregatedatavalue table? If we try and join this table directly, we
>> > will have issues with duplicates in the pivot table, so we need to
>> > transform the data slightly.
>> >
>> > This should do the trick.
>> >
>> > SELECT * FROM crosstab
>> > (
>> > 'SELECT name, dimension, dimension_element FROM
>> > vw_dataelements_dimensions ORDER BY 1,2,3',
>> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY
>> > 1
>> > ASC'
>> > )
>> > as
>> > (
>> > name character varying(230),
>> > age character varying(160),
>> > status character varying(160)
>> > );
>> >
>> >
>> > which returns this record set
>> >
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> > Years";"Age
>> > 1-5";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over
>> > 5";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under
>> > 1";"OPD"
>> >
>> >
>> > OK, admittedly, I cheated a bit and used the crosstab function of
>> > Postgresql, but I assume that this query could be rewritten with a
>> > few
>> > more lines of code in standard SQL or some procedural language like
>> > Java. Now, this record set looks like something that I can almost
>> > use
>> > with the aggregateddatavalue table simply by joining up the table on
>> > the appropriate dataelementid and pulling everything into a pivot
>> > table. I would not have any duplicated values and would have
>> > columns
>> > like data element name, period, orgunit, age, patient status and of
>> > course the value of the data element. I hope that part is pretty
>> > clear. Just join up that table to the aggregateddata table, and you
>> > have pretty much what is needed to pull the data directly into a
>> > PivotTable for further analysis.
>> >
>> > This is not a complete example, but it is very close to what I need
>> > here ,and I think this type of functionality would be much more
>> > useful
>> > than the current data element categories functionality. Basically,
>> > all
>> > that would be required, at least initially, would be another user
>> > interface screen to allow the definition of which category(ies) and
>> > category options a data element is a member of. The rest could ,in
>> > the
>> > first instance be executed with custom SQL (obviously, I am partial
>> > to
>> > this language and hobbled by the fact that I do not know Java), but
>> > eventually this would need to be implemented somehow in Java.
>> >
>> > I am not sure if this really solves all of the issues surrounding
>> > multidimensional analysis of data elements, but it seems to solve
>> > the
>> > issues that I am having by trying to assign some sort of dimensional
>> > hierarchy to data elements (similar to the exclusive/compulsory
>> > functionality of orgunits). Any thoughts on this?
>> >
>> > Best regards,
>> > Jason
>> >
>> >
>> >
>> >
>> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering >>> >> > <jason.p.pickering@gmail.com> wrote:
>> >>
>> >>
>> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@ifi.uio.no> wrote:
>> >>>
>> >>> >> However, there does seem to be the ability to assign
>> >>> >> dimensions,
>> >>> there
>> >>> >> does
>> >>> >> not seem to be the ability to assign particular elements within
>> >>> those
>> >>> >> dimensions to a particular DHIS data element.
>> >>>
>> >>>
>> >>> Just some more clarification here: you can make category combos
>> >>> which
>> >>> you
>> >>> assign to data elements. However, it is not possible to assign
>> >>> just
>> >>> specific parts of a category combo (only some of the category
>> >>> options)
>> >>> to
>> >>> a data element.
>> >>
>> >> Yes, this was exactly what I wanted. Assigning different categories
>> >> would
>> >> seem to break the dimensionality.
>> >>
>> >>>
>> >>> Then you must make a specific category (as the only one in
>> >>> or part of a new category combo) with just those options. It can
>> >>> be
>> >>> hell;
>> >>> in Tajikistan there were way over 20 categories I think, at least
>> >>> 10
>> >>> just
>> >>> on various age groups.
>> >>>
>> >>> Johan
>> >>>
>> >>
>> >> This was my fear.
>> >>
>> >> I will need to do some testing and see. I still fear it is not
>> >> exactly
>> >> the
>> >> intended functionality.
>> >>
>> >> Basically, I think I need something akin to the
>> >> exclusive/compulsory
>> >> groups
>> >> that are in place for organizational units, but instead, for
>> >> arbitrary
>> >> dimensions. I will give a try and see what happens.
>> >>
>> >> Thanks,
>> >> Jason
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to : dhis2-devs@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help : https://help.launchpad.net/ListHelp
>
>

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

--
Cheers,
Knut Staring