On categories and dimensions and zooks

Johan_Ivar_Saebo · 1 October 2009 10:36

I don't agree with this, and I think the example I just made above
strengthens that. There are dimensions that are needed in data entry to be
able to break up a data element (age, gender, etc.), and there are other
dimensions that are broader groupings of data like type of diseases that
you do not need to know about in order to register data about diseases.

Some semantic musings...

Adjectives related to the general description of a noun should be DE groups.
Adjectives related to a specific event should be categoryoptions.

Some can be both; Sexually transmitted might be a description (attribute)
of the disease HIV, but Sexually transmitted might also be a description
of the special case of Freddie Mercury's tick in the tally sheet (in
contrast to blood transfusion).

So, in other words, there are dimensions to an indefinite noun (DE
groups), and dimensions to a definite noun (categoryoptions), that are
intrinsically different in use. DHIS should support both, in one way or
another.

Johan

jason · 1 October 2009 11:13

but in the end getting confused which one is the dataelement which one is
the dimension. Well the MD model can handle such a breakup I guess but the
point is not that.

Well me too, and that is why I started all of this. My data entry
screen (e.g. a paper form) collects
OPD 1st Attendance Clinical Case of Malaria Under 1

with different patient status (OPD, IP, Deaths), different classes of
diagnosis (Confirmed, Clinical) and different age groups.

However, in the end what the district health officers want to know is
as follows:

How many cases of OPD, IP and Deaths of Malaria have there been, with
possible slices of those dimensions (by age, comparing clinical and
confirmed etc). By defining appropriate categories, I can see now how
I can possibly get this through a DHIS report table, the datamart,
or some other custom SQL query.

But, the problem arises when there is some higher order
dimensionality, that has not been defined in the categories. Such as,

All cases of vector-borne diseases

This category was not part of my original categories and might include
(as I stated in my example) Malaria, Leishmaniasis, Dengue, etc
aggregated together. If I wanted to get this, I suppose I could create
a data element group for this. But what happens when I need more than
one data element group, such as Communicable diseases, which might
include vector borne, water borne, etc.

The point is, what users should do is I guess to first define what they need
from that functionality - what kind of data are they going to collect? what
does their dataentry screen look like?

No, I totally disagree. For me the data entry screen is only a
necessary artifact to answer the questions that need to be answered.
How many cases of under 1 clinical malaria have we had? What is the case
fatality ratio for confirmed malaria cases? There are so many
possibilities.

But how different
is the analysis going to be from our input formats?

Very. I think I have given enough examples to support this.

Anyways for me a dimension is just an attribute to a dataelement. So before
talking about a dimension first we need to have a dataelement and
(logically) we can't mix the two!

I completely agree with you, and this is why I am not comfortable with
the current implementation. Dimensions are simply additional metadata
added to a given observation, or measure. If you take a look at the
OpenHealth data models, a measure has dimensions. DHIS appears to be
built around three compulsory dimensions, which all measures should
have: 1) What: Data element, which is essentially a description of
what the observation can be classified as 2) When: A point in time
(period) 3) Where: An organizational unit, or place. All other
dimensions are simply bling that need to be added for the purpose of
analysis. However the current model has to some degree mixed up data
entry with analysis, and thus the conundrum that I find myself in.
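
A minimal sketch of that view of a measure, assuming hypothetical names throughout (the `Observation` class and the dimension keys below are illustrative only and are not taken from DHIS or the OpenHealth models). The three compulsory dimensions are required fields, and everything else is optional metadata attached for analysis:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Observation:
    """One reported value. Hypothetical structure, not the DHIS schema."""
    data_element: str   # what
    period: str         # when
    org_unit: str       # where
    value: int
    # Optional analysis dimensions; the keys are assumptions for illustration.
    dimensions: Dict[str, str] = field(default_factory=dict)


# A value with extra dimensions attached.
obs = Observation(
    data_element="Malaria",
    period="2009-09",
    org_unit="Facility A",
    value=12,
    dimensions={
        "patient_status": "OPD",
        "diagnosis": "Clinical",
        "age_group": "Under 1",
    },
)

# A value with only the three compulsory dimensions is still valid.
bare = Observation("Malaria", "2009-09", "Facility A", 3)
```

The point of the sketch is only that the optional dimensions sit beside the measure rather than inside the data element name.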

Regards,
Jason

Abyot_Gizaw · 1 October 2009 11:44

Yes I also agree on the difference between DataEntry screens and the analysis. What I wanted to say is … we have to be very careful in defining our input formats so that we don’t face a problem during analysis. We need to settle for some sort of atomicity in both the input form and analysis.

Our analysis shouldn’t go further down to a level where we haven’t captured (or have a break up) during data collection. But we can do analysis by aggregating/rearranging the atomic units.
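
As a rough illustration of aggregating the atomic units, here is a sketch that assumes each captured value is stored as one row per combination of dimensions. The records, the dimension names and the `roll_up` helper are all hypothetical and only show the idea:

```python
from collections import defaultdict

# Hypothetical atomic records: one row per combination captured at data entry.
records = [
    {"disease": "Malaria", "status": "OPD", "diagnosis": "Clinical",
     "age": "Under 1", "value": 5},
    {"disease": "Malaria", "status": "OPD", "diagnosis": "Confirmed",
     "age": "Under 1", "value": 3},
    {"disease": "Malaria", "status": "IP", "diagnosis": "Clinical",
     "age": "1-4", "value": 2},
    {"disease": "Malaria", "status": "Deaths", "diagnosis": "Confirmed",
     "age": "1-4", "value": 1},
]


def roll_up(rows, by):
    """Sum 'value' over every dimension that is not listed in `by`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[dim] for dim in by)
        totals[key] += row["value"]
    return dict(totals)


by_status = roll_up(records, ["status"])
by_status_and_diagnosis = roll_up(records, ["status", "diagnosis"])
grand_total = roll_up(records, [])
```

Any slice can be produced this way as long as the dimension was captured at entry; a dimension that was never recorded cannot be recovered by aggregation.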

I don’t think we can provide district health officers

"How many cases of OPD, IP and Deaths of Malaria have there been, with
possible slices of those dimensions (by age, comparing clinical and
confirmed etc)." while our data entry screen has just a single entry field for “OPD 1st Attendance Clinical Case of Malaria Under 1”

Thank you
Abyot.

jason · 1 October 2009 11:56

I don't think we can provide district health officers

"How many cases of OPD, IP and Deaths of Malaria have there been, with
possible slices of those dimensions (by age, comparing clinical and
confirmed etc)." while our data entry screen has just a single entry field
for "OPD 1st Attendance Clinical Case of Malaria Under 1"

Yes, we cannot, at least not without resorting to dodgy SQL and naming
conventions to unravel the "dimensions" that are hiding inside of
names. Granted, the category combos can help, but they do not go far
enough. But we do need to figure out how we can do this.

I am attaching a few files. My apologies for this, but they are quite
small. I have attached a sample data entry form, which is used at the
facility level. You can see, this mirrors many of my examples
throughout this thread.

I have also included in this file a PivotTable that I created to
answer the question above. It is not complete, but it does separate
OPD, IP and deaths into separate columns. I have scrambled the data
with random values, but have left values that were "blank" from the
data entry, but showed up in the query.

Now, I should be able to do this table by using category combos I
guess pretty easily. What I cannot do is then to aggregate by dimensions
that are not present in the category combos themselves, at least, but
I guess this is the functionality that Ola describes.

Hope these two examples may help to make this a bit more concrete.

Regards,
Jason

zm_samples.7z (42 KB)

Johan_Ivar_Saebo · 1 October 2009 12:12

A general comment to this:

Are the scenarios provided in this thread really a requested
functionality? I can see that it's possible to come up with an infinite
number of possible ways to look at data, but to look at "all communicable
diseases", is something you don't do that often, I believe. Unless of
course, you want some useless data. The few times you want it, you can
combine the DE groups "vector-borne", "sexually transmitted" etc.. in
pivots or with SQL.

The design principle should be needs-focused, not possible-focused, though
I sense there are great discussions to be had regarding database design
and good coding etc. But be careful about creating a demand that does not
exist.

jason · 1 October 2009 12:21

Hi Johan,
They are definitely real, at least here. That does not mean they are
real elsewhere. But the examples I just provided have come from data review
and analysis workshops conducted in the field here, which are a
primary means of district health officers being able to conduct
planning and performance review. They need to know what the total
number of outpatient attendances is, regardless of disease, so they
can plan staffing levels. They need to know prevalence of vector borne
diseases to plan spraying campaigns and bed-net campaigns. I did not
come up with these examples trivially, but have heard this time and
time again from the DHIOs here.

But I do not agree with you about your last point. Looking at
prevalence of communicable versus non-communicable disease incidence
is important, at least this is what people at WHO have told me.
Perhaps it is not, but I see no reason to doubt. I agree, there are
infinite possibilities, and that is the point of a PivotTable. The
system should enable the analysis in such a way as makes sense to the
people using the data.

I am not trying to be academic at all, but am rather trying to
distill into code, functionality and specifications what the
requirements from this country are. Perhaps they are not applicable
elsewhere, in which case, we need to decide whether it is something
general that should be implemented in the main branch, or specific and
should be implemented here only.

Regards,
Jason

Abyot_Gizaw · 1 October 2009 12:38

Now, I should be able to do this table by using category combos I
guess pretty easy. What I cannot do is then to aggregate by dimensions
that are not present in the category combos themselves, at least, but
I guess this is the functionality that Ola describes.

Can you explain this further? Because if I got you correctly - you are saying you couldn’t do aggregation by dimensions which are not included in the category combos. Then for me… you are trying to do something else which was not envisioned during data collection. If a dimension cannot be constructed from the category combos (and corresponding options) then this dimension shouldn’t be asked for during analysis.

I am only convinced of one thing from this whole discussion - that we need to extend the API so that we can do aggregation of different combinations for the available (collected) atomic dimension elements. As far as the MD model … I can take your comments for renaming some of the objects - but I don’t see a limitation in the model. But again everything is open here - and anyone can take it further. I am just sharing my views as I was the central person in the MD issue.

Thank you
Abyot.

Johan_Ivar_Saebo · 1 October 2009 12:44

Don't trust people from WHO. You might even encounter a militant
gender-health specialist, who will say that ALL diseases must be split
into genders, and this ratio is the centre of the universe (I'm not
kidding, I met one in Botswana). That is not to say that your examples are
not valid, they do make sense. But this is perfectly possible with the
current design, as far as my experience is. My point is that I can easily
come up with some examples that would require a different data model, but
that would not make sense.

Vector-borne diseases can be a DE group. If data elements have a category
inpatient/outpatient, you can easily get only outpatient data from the
pivot table (or, as many other countries: have two headcount elements for
each form: Total inpatients, Total outpatients). Communicable and
non-communicable diseases can also be a DE group, and if you don't want
DEs to belong to two groups, you can lump them together with a query or in
the pivot table. I think some countries, and some specialists (who are
often too interested in details in their field) have some requirements
which are not "one-click-at-a-button"-ready, but that this can always be
solved with categories and DE groups in the pivot tables. My input to
this, in sum: I've not yet encountered any requirements (with DHIS2: in
Tajikistan and Sierra Leone) that could not be solved with
categoryoptions, categories, categorycombos, DE groups, and Org.Unit
groups (rural/urban, private/public, type of clinic), and combinations of
data (indicators, or Diarrhoea and Safe Water crossexamined, for example).
Johan

olatitle · 1 October 2009 12:53

Agree that we need to keep it simple and focus on the 80% of the use cases.

I would still say that having data element group sets would simplify a lot when setting up pivot tables.

The restriction on only one group per data element does not make sense. I remember it was allowed to assign multiple groups in the early days of 1.4, and that was based on real needs. What happened is that there was a design flaw that led to duplication in the pivot tables due to multiple rows per data element, orgunit, period (one per group). For orgunit groups this is taken care of by having group sets such as OrgUnitType, RuralUrban etc. and that is what we need for data elements and indicators as well.

And, I would also add that simplifying the multidimensional data element approach from a user perspective would really improve the usability of DHIS 2. That goes for both creating dimensions and assigning them to data elements, and to be able to use these dimensions in reports. Being able to tweak the system to get what you want is not good enough in the long run, these things should be delivered to the users out of the box.
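
To make the duplication problem concrete, here is a small sketch under stated assumptions: the values, group names and helper functions are hypothetical and do not reflect the actual DHIS tables. It contrasts a flat group membership join, where a data element in two groups yields two rows, with group sets, where each set contributes one column and each data element keeps a single row:

```python
# Hypothetical data values for one orgunit and period, keyed by data element.
values = {"Malaria": 10, "Dengue": 4, "Cholera": 6}

# Flat groups: a data element may be a member of several groups.
groups = {
    "Vector-borne": ["Malaria", "Dengue"],
    "Water-borne": ["Cholera"],
    "Communicable": ["Malaria", "Dengue", "Cholera"],
}


def flat_join(values, groups):
    """One row per (group, data element), so shared members are repeated."""
    return [
        (group, de, values[de])
        for group, members in groups.items()
        for de in members
    ]


# Group sets: within one set, every data element has exactly one group.
group_sets = {
    "Transmission": {
        "Malaria": "Vector-borne",
        "Dengue": "Vector-borne",
        "Cholera": "Water-borne",
    },
    "Communicability": {
        "Malaria": "Communicable",
        "Dengue": "Communicable",
        "Cholera": "Communicable",
    },
}


def group_set_rows(values, group_sets):
    """One row per data element; each group set becomes its own column."""
    return [
        {"data_element": de, "value": value,
         **{name: members[de] for name, members in group_sets.items()}}
        for de, value in values.items()
    ]


flat_total = sum(value for _, _, value in flat_join(values, groups))
set_total = sum(row["value"] for row in group_set_rows(values, group_sets))
```

With the flat join the grand total is doubled because every disease is counted once per group it belongs to, while the group set layout keeps the true total and still lets a pivot table slice by either set.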

Ola

jason · 1 October 2009 13:07

Abyot..

Can you explain me this further? Because if I got you correct - you are
saying you couldn't do aggregation by dimensions which are not included in
the category combos.

I think it is clear that I can create data elements for the data entry
sheet I sent using the existing category combo functionality. This seems
straightforward, except we are using 1.4 here as the data collection
tool, and 2 for analysis and presentation purposes. That is another
complication, but anyway, let's assume it is possible, and it certainly
seems to be the case.

Then for me... you are trying to do something else
which is not envisioned during data collection. If a dimension can not be
constructed from the category combos (and corresponding options) then this
dimension shouldn't be asked during analysis.

I must completely disagree with you. As I stated previously, a DHIO
asked me two weeks ago, "How do I calculate the total incidence of
vector borne diseases?" Well, I said, You can create an indicator for
this, and add every vector borne disease to it and divide it by the
population. Seems reasonable. A bit of work, but it seems possible.
However, it gets stickier when you think about beginning to slice and
dice the data. Would an indicator need to be created for every
possible combination of diseases, age groups that we could possibly
dream up? If so, then we can completely do away with the PivotTables,
as we can just define an indicator for every possible combination that
we can dream up! This does not seem practical.
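
A quick back-of-the-envelope count shows how fast this grows. The option lists below are assumptions for illustration only (in particular the age groups and disease groups are hypothetical); the sketch simply enumerates every fully specified combination that would need its own indicator:

```python
from itertools import product

# Hypothetical dimension options, for illustration only.
disease_groups = ["Vector-borne", "Water-borne", "Sexually transmitted"]
statuses = ["OPD", "IP", "Deaths"]
diagnoses = ["Confirmed", "Clinical"]
age_groups = ["Under 1", "1-4", "5 and over"]

# One indicator per fully specified combination: 3 * 3 * 2 * 3.
combinations = list(product(disease_groups, statuses, diagnoses, age_groups))
indicator_count = len(combinations)
```

Even with these few options that is 54 indicators, before counting any subtotals, and every new option multiplies the number again.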

Another solution, as Johan just points out, is that we can use
different data element groups for this purpose. This seems possible as
well I guess. DHIS 1.4 and 2 will allow you to do this, but it does
result in a data integrity violation as entries in the resulting
PivotTables will be duplicated. Sure we can lump them together, but
which table tells me that two data element groups should be lumped
together? The problem is the lack of data element group sets (like
organizational unit group sets). If there was, then I think everything
would be fine (at least I think).

I am only convinced of one thing from this whole discussion - that we need
to extend the API so that we can do aggregation of different combinations
for the available (collected) atomic dimension elements. As far as the MD
model ..... I can take your comments for renaming some of the objects - but
I don't see a limitation in the model. But again everything is open here -
and anyone can take it further. I am just sharing my views as I was the
central person in the MD issue.

Well, my problem is that in my case, the data elements have not been
created as multidimensional from the beginning, thus the need to group
them into dimensions afterwards. I agree, as the categories have been
implemented now, it seems totally possible to create a data entry form
for the Excel sheet I have sent you. No problems there, it is just in
the analysis that we are missing something.

In reply to Johan,

> Vector-borne diseases can be a DE group.

But why can't it be a category as well? This is a matter that is still
not clear. Where should the categories stop and data element group
sets take over? Anyway, this is more of an academic question, and not
entirely relevant. It would seem that categories and category
combos should be used for data entry, and that data element group sets
should be used for analysis.

> If data elements have a category
> inpatient/outpatient, you can easily get only outpatient data from the
> pivot table (or, as in many other countries, have two headcount elements for
> each form: Total inpatients, Total outpatients). Communicable and
> non-communicable diseases can also be a DE group, and if you don't want
> DEs to belong to two groups, you can lump them together with a query or in
> the pivot table.

How can I lump them together in a PivotTable if I cannot determine
that they are in a group set together? I can hardwire it in the query
with regular expressions to pull out all instances of OPD, IP and
Deaths in data elements. I can create a supporting table and populate
it with data elements, but how will it be maintained and accessible
through the UI? I have done both of these things, but I do not think
it is part of the current functionality.
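
For reference, the regular-expression workaround looks roughly like the sketch below. It assumes that the patient status is always the first word of the data element name; apart from the first name, which appears earlier in this thread, the names are hypothetical.

```python
import re

# Assumed naming convention: the patient status is the first word of the name.
STATUS_PATTERN = re.compile(r"^(OPD|IP|Deaths)\b")


def patient_status(data_element_name):
    """Return the status prefix of a flat data element name, or None."""
    match = STATUS_PATTERN.match(data_element_name)
    return match.group(1) if match else None


names = [
    "OPD 1st Attendance Clinical Case of Malaria Under 1",
    "IP Confirmed Case of Malaria Under 5",
    "Deaths Clinical Case of Malaria Over 5",
    "IPT 2nd Dose",
]
for name in names:
    print(patient_status(name), "<-", name)
```

It works only as long as everybody follows the naming convention, and the pattern lives in the query rather than in metadata that can be maintained through the UI.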

Anyway, to summarize this, as it seems we may be nearing some consensus.

1) Current functionality for category-category combos is sufficient
for the purposes of multidimensional data entry, and provides enough
functionality for analysis of data based on the dimensions defined
during the data element creation.
2) Current functionality for grouping of data elements by dimensions
for analysis is insufficient and needs to be extended to include the
concept of data element group sets.

Capiche?

Regards,
Jason


bobj · 1 October 2009 13:41

Jason,

> 1) Current functionality for category-category combos is sufficient
> for the purposes of multidimensional data entry, and provides enough
> functionality for analysis of data based on the dimensions defined
> during the data element creation.

What seems to be missing is the ability to isolate a category in, for
example, a report table. So if there is an Age-Sex categorycombo on
Malaria, you cannot isolate Malaria by Sex without also bringing in Age.
This may not be a problem. If it is, we can probably solve it through
the API.
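
Isolating one category amounts to summing over all the others. A minimal sketch of that, with invented figures and a hypothetical `isolate` helper that is not part of the DHIS API:

```python
from collections import defaultdict

# Hypothetical values for "Malaria", keyed by (Age, Sex) category option combo.
malaria = {
    ("Under 5", "Male"): 40,
    ("Under 5", "Female"): 35,
    ("Over 5", "Male"): 20,
    ("Over 5", "Female"): 25,
}
CATEGORY_ORDER = ("Age", "Sex")


def isolate(values, category, category_order=CATEGORY_ORDER):
    """Sum values over every category except the one named."""
    index = category_order.index(category)
    totals = defaultdict(int)
    for combo, value in values.items():
        totals[combo[index]] += value
    return dict(totals)


print(isolate(malaria, "Sex"))  # {'Male': 60, 'Female': 60}
print(isolate(malaria, "Age"))  # {'Under 5': 75, 'Over 5': 45}
```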

> 2) Current functionality for grouping of data elements by dimensions
> for analysis is insufficient and needs to be extended to include the
> concept of data element group sets.

Funny. Here we seem to have the opposite problem to the one above. If I
understand you right, you are saying that, effectively, we cannot create
multiple dimensions by composition because data elements can only be
members of one group.

And presumably this also applies to indicators.

Regards
Bob


olatitle · 1 October 2009 14:20

> Funny. Here we seem to have the opposite problem to the one above. If I
> understand you right, you are saying that, effectively, we cannot create
> multiple dimensions by composition because data elements can only be
> members of one group.
>
> And presumably this also applies to indicators.

Yes. That is why I so eagerly pushed data element/indicator group sets yesterday. We need it for exactly this purpose.

Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Johan_Ivar_Saebo · 1 October 2009 14:38

> In reply to Johan,
>
> > Vector-borne diseases can be a DE group.
>
> But why can't it be a category as well? This is a matter that is still
> not clear. Where should the categories stop and data element group
> sets take over? Anyway, this is more of an academic question, and not
> entirely relevant. It would seem that categories and category
> combos should be used for data entry, and that data element group sets
> should be used for analysis.

It can, if you would collect data on

Malaria, vector borne
Malaria, non-vector borne

But you would not do that. Instead, malaria could belong to a DE group
("vector borne"), and we could have DE group sets ("transmission"), when
the DE group set functionality is in place.

The categories are for creating sub-elements of data elements. That is,
they let us enter data for dimensions of data elements. In this example, we
don't want to enter non-vector borne malaria, and thus the DE Malaria
could belong to the non-vector borne DE group, but not have this as
categoryoption/dimension. (Of course, given that I'm not mixing vector and
non-vector borne now, but the principle is the same)

I referred earlier to dimensions of an event, and dimensions of a data
element (definition). Dimensions of an event are categoryoptions.
Dimensions of the data definition are DE groups.
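
A minimal sketch of that distinction, using hypothetical class and field names rather than the actual DHIS object model:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    # Dimensions of the definition: DE groups, shared by every value.
    groups: frozenset = frozenset()


@dataclass(frozen=True)
class DataValue:
    data_element: DataElement
    # Dimensions of the event: category options chosen at data entry.
    category_options: tuple
    value: int


malaria = DataElement("Malaria", frozenset({"Vector borne"}))
values = [
    DataValue(malaria, ("Under 5", "Male"), 40),
    DataValue(malaria, ("Over 5", "Female"), 25),
]

for v in values:
    print(v.category_options, v.value, sorted(v.data_element.groups))
```

The category options differ from one value to the next, while the group membership is stored once on the data element and is the same for all of its values.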

> 1) Current functionality for category-category combos is sufficient
> for the purposes of multidimensional data entry, and provides enough
> functionality for analysis of data based on the dimensions defined
> during the data element creation.
> 2) Current functionality for grouping of data elements by dimensions
> for analysis is insufficient and needs to be extended to include the
> concept of data element group sets.
>
> Capiche?

Io capisco tutti. We need the DE group sets, just like for orgunits. The
rest should cover all the examples I've seen so far.

jason · 1 October 2009 15:08

Here is my suggestion, based on all the mails above, so that we can
develop all of this into a blueprint/specification.

1) Categories and category options should be left essentially as is.
2) Data element group sets should allow data elements to be assigned
to multiple data element groups. Within a given group set, a data
element (both multi-dimensional and non-multidimensional) should be
assigned to one particular data element group only.
3) A data element group set would be something like "Age", "Sex",
"Disease transmission method". Each data element group set would
contain multiple data element groups, such as "Under 1, Under 5, Over
5", "Male, Female", "Vector borne, water borne".
4) Data element group sets should be exclusive. Data element groups
can only belong to one data element group set.
5) Multidimensional elements will automatically be imparted an
intrinsic level of dimensionality via the Categories, ipso facto.
Category options would essentially be transformed to a data element
group at the stage of analysis.
6) Non-multidimensional elements should be able to be assigned a
particular member of a CategoryOption, post facto, so that
multi-dimensional and non-multidimensional elements would be able to
coexist peacefully.
7) A particular measure's full dimensionality would result from the
combination of prime dimensions:
   a) OrgUnitGroupSets (already implemented). Each OrgUnitGroupSet
would impart a column in a cross-tab query, providing classification
by Type, Urban/Rural, Ownership as examples.
   b) Period. There are currently (as far as I know) no
PeriodGroupSets, but usually, grouping can take place through the data
type itself, so that values can be grouped into quarters, bi-monthly
groups, etc. Period would impart two columns in a cross-tab query
(StartDate and EndDate).
   c) The dimensions resulting from the assignment of the data
element to "Categories" and "DataElementGroupSets". Each Category and
each DataElementGroup assigned to the data element would impart a
single column in a crosstab query.
   d) Other dimensions that are already implemented, such as Dataset.
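
As a rough illustration of the list above, the sketch below builds one cross-tab row with a column per dimension. All names and metadata here are hypothetical, and the function is not an existing DHIS query.

```python
# Hypothetical metadata: group set -> member -> group.
orgunit_group_sets = {
    "Type": {"Clinic A": "Health centre"},
    "Ownership": {"Clinic A": "Public"},
}
de_group_sets = {"Transmission": {"Malaria": "Vector borne"}}
CATEGORIES = ("Age", "Sex")


def crosstab_row(orgunit, start, end, data_element, category_options, value):
    """Build one row with a column for each dimension of the measure."""
    row = {"OrgUnit": orgunit, "StartDate": start, "EndDate": end,
           "DataElement": data_element}
    # a) one column per orgunit group set
    for group_set, members in orgunit_group_sets.items():
        row[group_set] = members.get(orgunit)
    # c) one column per data element group set and per category
    for group_set, members in de_group_sets.items():
        row[group_set] = members.get(data_element)
    row.update(zip(CATEGORIES, category_options))
    row["Value"] = value
    return row


row = crosstab_row("Clinic A", "2009-09-01", "2009-09-30",
                   "Malaria", ("Under 5", "Male"), 40)
print(list(row))
```

A pivot table built on such rows could then slice by any of the columns, whether the column came from a category or from a group set.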

Do these points capture the requirements, based on everything we have
discussed in this monster thread?

Please feel free to comment, especially the developers. Lars has been
quiet and said it is possible, so I suppose he is just coding already.

Johan_Ivar_Saebo · 1 October 2009 15:16

> Here is my suggestion, based on all the mails above, so that we can
> develop all of this into a blueprint/specification.
>
> 1) Categories and category options should be left essentially as is.
> 2) Data element group sets should allow data elements to be assigned
> to multiple data element groups. Data elements (both multi-dimensional
> and non-multidimensional) should be able to be assigned a particular
> (only one) data element group.
> 3) A data element group set would be something like "Age", "Sex",
> "Disease transmission method". Each data element group set would
> contain multiple data element groups, such as "Under 1, Under 5, Over
> 5", "Male, Female", "Vector borne, water borne".

No. "Malaria" cannot be male or female. Diarrhoea cannot be 15 years old.
But a person with malaria can be male or female, and a 15 year old person
can have diarrhoea.

The data element groups and group sets are metadata. The categories are
event-specific data.

Johan


Knut_Staring · 1 October 2009 15:18

> No. "Malaria" cannot be male or female. Diarrhoea cannot be 15 years old.
> But a person with malaria can be male or female, and a 15 year old person
> can have diarrhoea.
>
> The data element groups and group sets are metadata. The categories are
> event-specific data.

Not sure what you are trying to say here. We are of course always talking
about the number of cases, even though we do not specify "Cases of Malaria"?

–
Cheers,
Knut Staring

olatitle · 1 October 2009 15:26

In an ideal world we should be consistent in how we use categories and
groups, I agree. And I tried to put forward some best practices using
Jason's example earlier today. But there is nothing wrong with adding
group sets like Age or Gender. I mean, it doesn't break anything, right?
In the pivot table it will just be another column describing the data,
and you can choose whether to use it or not, and whether to create it in
the first place or not. In Jason's case, and in any other country where
1.4 is the data collection tool at the lower levels, the categories
cannot be used (all data elements are flat), so all dimensionality will
have to come from these group sets and groups. That is why they have to
break with the DHIS 2 best practice.

Ola


Johan_Ivar_Saebo · 1 October 2009 15:46

Dear all,
some clarifications and suggestions:

> Here is my suggestion, based on all the mails above, so that we can
> develop all of this into a blueprint/specification.
>
> 1) Categories and category options should be left essentially as is.

Agree, unless there is a better way of coding or naming dimensionality of
events.

> 2) Data element group sets should allow data elements to be assigned
> to multiple data element groups. Data elements (both multi-dimensional
> and non-multidimensional) should be able to be assigned a particular
> (only one) data element group.

Yes, agree. This is how it is for orgunits. An orgunit can belong to the
group "public facility" or "private facility" in the group set "ownership",
for example, but not to both.
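
The "only one group per group set" rule is easy to check mechanically. A minimal sketch, with a hypothetical group set layout and helper name:

```python
# Hypothetical group set: group -> member data elements.
transmission = {
    "Vector borne": {"Malaria", "Dengue"},
    "Water borne": {"Cholera"},
}


def exclusivity_violations(group_set):
    """Return members that appear in more than one group of a group set."""
    seen = {}
    for group, members in group_set.items():
        for member in members:
            seen.setdefault(member, []).append(group)
    return {m: sorted(groups) for m, groups in seen.items() if len(groups) > 1}


print(exclusivity_violations(transmission))  # {}

# Putting Malaria in a second group of the same set breaks the rule.
broken = {**transmission, "Water borne": {"Cholera", "Malaria"}}
print(exclusivity_violations(broken))
```

The same check would apply unchanged to orgunit group sets such as "ownership".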

> 3) A data element group set would be something like "Age", "Sex",
> "Disease transmission method". Each data element group set would
> contain multiple data element groups, such as "Under 1, Under 5, Over
> 5", "Male, Female", "Vector borne, water borne".

> > No. "Malaria" cannot be male or female. Diarrhoea cannot be 15 years
> > old. But a person with malaria can be male or female, and a 15 year old
> > person can have diarrhoea.
> >
> > The data element groups and group sets are metadata. The categories are
> > event-specific data.
>
> Not sure what you are trying to say here. We are of course always talking
> about number of cases, even though we do not specify "Cases of Malaria"?

We are not talking about cases when we talk about data element groups. We
are talking about metadata, that apply to ALL uses of that data element.
So we can have 5000 cases of malaria, from all kinds of ages and genders
(all of them!), but they would all share the metadata of Malaria = vector
borne, which has nothing to do with the individual cases.

So DE groups are metadata. I have no idea if there is anything wrong with
using the same code and name for both metadata and event-data, but for me
they are different. If you have age as DE group set, you cannot enter
different ages for that data element. You will have to make another data
element, assigned to another group.
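
To make the consequence concrete, here is a small sketch of the two ways of carrying Age, with invented figures and hypothetical data element names. Both give the same totals by age, but the group set route needs one data element per age band.

```python
# Option A: Age as a category (event data): one data element, many options.
category_values = {("Malaria", "Under 5"): 75, ("Malaria", "Over 5"): 45}

# Option B: Age as a DE group set (metadata): one data element per age band.
flat_values = {"Malaria Under 5": 75, "Malaria Over 5": 45}
age_group_of = {"Malaria Under 5": "Under 5", "Malaria Over 5": "Over 5"}


def totals_by_age_from_categories(values):
    """Sum values by the Age category option of each entry."""
    totals = {}
    for (_, age), value in values.items():
        totals[age] = totals.get(age, 0) + value
    return totals


def totals_by_age_from_groups(values, group_of):
    """Sum values by the Age group each flat data element belongs to."""
    totals = {}
    for data_element, value in values.items():
        age = group_of[data_element]
        totals[age] = totals.get(age, 0) + value
    return totals


print(totals_by_age_from_categories(category_values))
print(totals_by_age_from_groups(flat_values, age_group_of))
print(len({de for de, _ in category_values}), len(flat_values))  # 1 2
```

Option B is what a flat data collection tool forces, since there the age band can only live in the data element itself.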

> 4) Data element group sets should be exclusive. Data element groups
> can only belong to one data element group set.

Agree

> 5) Multidimensional elements will automatically be imparted an
> intrinsic level of dimensionality via the Categories, ipso facto.
> Category options would essentially be transformed to a data element
> group at the stage of analysis.

Not sure about this. Category options may (or may not) be aggregated to a
data element at the stage of analysis. In a pivot table, you could have
fields with both DE groups and Categories, and thus they would serve the
same purpose of pivoting and filtering.

> 6) Non-multidimensional elements should be able to be assigned a
> particular member of a CategoryOption, post facto, so that
> multi-dimensional and non-multidimensional elements would be able to
> coexist peacefully.

Yes. The categorycombo "default" fills this role now, but the name is not
perfect. We could instead choose "none".

> 5) A particular measure's full dimensionality would result from the
> combination of prime dimensions:
> a) OrgUnitGroupSets (already implemented). Each OrgUnitGroupSet
> would impart a column in a cross-tab query, providing classification
> by Type, Urban/Rural, Ownership as examples.
> b) Period. There are currently (as far as I know) no
> PeriodGroupSets, but usually, grouping can take place through the data
> type itself, so that values can be grouped into quarters, bi-monthly
> groups, etc. Period would impart two columns in a cross-tab query
> (StartDate and EndDate).
> c) The dimensions resulting from the assignment of the data
> element to "Categories" and "DataElementGroupSets". Each Category and
> each DataElementGroup assigned to the data element would impart a
> single column in a crosstab query.
> d) Other dimensions that are already implemented, such as Dataset.
>
> Do these points capture the requirements, based on everything we have
> discussed in this monster thread?

I think so

Johan

···

On Thu, Oct 1, 2009 at 5:16 PM, <johansa@ifi.uio.no> wrote:

> Please feel free to comment, especially the developers. Lars has been
> quiet and said it is possible, so I suppose he is just coding already.
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>

--
Cheers,
Knut Staring

Johan_Ivar_Saebo · 1 October 2009 16:02

> Here is my suggestion, based on all the mails above, so that we can
> develop all of this into a blueprint/specification.
>
> 1) Categories and category options should be left essentially as is.
> 2) Data element group sets should allow data elements to be assigned
> to multiple data element groups. Data elements (both multi-dimensional
> and non-multidimensional) should be able to be assigned a particular
> (only one) data element group.
> 3) A data element group set would be something like "Age", "Sex",
> "Disease transmission method". Each data element group set would
> contain multiple data element groups, such as "Under 1, Under 5, Over
> 5", "Male, Female", "Vector borne, water borne".

No. "Malaria" cannot be male or female. Diarrhoea cannot be 15 years old.
But a person with malaria can be male or female, and a 15 year old person
can have diarrhoea.

The data element groups and group sets are metadata. The categories are
event-specific data.

In an ideal world we should be consistent on how we use categories and
groups, I agree. And I tried to put forward some best practices using
Jason's example earlier today. But there is nothing wrong with adding group
sets like Age or Gender, I mean it doesn't break anything, right? In the
pivot table it will just be another column describing the data, and you can
choose whether to use it or not, and whether to create it in the first place
or not. In Jason's case and in any other country where 1.4 is the data
collection tool at the lower levels, the categories cannot be used (and all
data elements are flat) so all dimensionality will have to come from these
group sets and groups. That is why they have to break with the DHIS 2 best
practice.

Ola
---------

No, there is nothing wrong with doing it that way. But if you use DE groups
to differentiate by gender, you would have to make two data elements,
Disease Male and Disease Female, and assign each to its own group. When you
have tables with many columns, and many data elements, it pays off to do it
the category way.
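
As a back-of-the-envelope sketch of why it pays off, the snippet below counts how many data elements each approach needs. The disease list and the category options are only examples, and the function names are invented for this illustration.

```python
from itertools import product

diseases = ["Malaria", "Diarrhoea"]
categories = {
    "Sex": ["Male", "Female"],
    "Age": ["Under 1", "Under 5", "Over 5"],
}


def flat_data_elements(diseases, categories):
    """Group way: one data element per disease per combination of options."""
    combos = list(product(*categories.values()))
    return [f"{disease} {' '.join(combo)}" for disease in diseases for combo in combos]


def category_option_combos(categories):
    """Category way: the option combinations are defined once and reused."""
    return [dict(zip(categories, combo)) for combo in product(*categories.values())]


flat = flat_data_elements(diseases, categories)
combos = category_option_combos(categories)

print(len(flat))                    # data elements needed the group way
print(len(diseases), len(combos))   # data elements and shared combos the category way
```

With two diseases the group way already needs twelve data elements, while the category way keeps two data elements and one shared set of six option combinations.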

Johan

···

2009/10/1 Knut Staring <knutst@gmail.com>

On Thu, Oct 1, 2009 at 5:16 PM, <johansa@ifi.uio.no> wrote:

Not sure what you are trying to say here. We are of course always talking
about number of cases, even though we do not specify "Cases of Malaria"?

Johan

> 4) Data element group sets should be exclusive. Data element groups
> can only belong to one data element group set.
> 3) Multidimensional elements will automatically be imparted an
> intrinsic level of dimensionality via the Categories, ipso facto.
> Category options would essentially be transformed to a data element
> group at the stage of analysis.
> 4) Non-multidimensional elements should be able to be assigned a
> particular member of a CategoryOption, post facto, so that
> multi-dimensional and non-multidimensional elements would be able to
> coexist peacefully.
> 5) A particular measure's full dimensionality would result from the
> combination of prime dimensions:
> a) OrgUnitGroupSets (already implemented). Each OrgUnitGroupSet
> would impart a column in a cross-tab query, providing classification
> by Type, Urban/Rural, Ownership as examples.
> b) Period. There are currently (as far as I know) no
> PeriodGroupSets, but usually, grouping can take place through the data
> type itself, so that values can be grouped into quarters, bi-monthly
> groups, etc. Period would impart two columns in a cross-tab query
> (StartDate and EndDate).
> c) The dimensions resulting from the assignment of the data
> element to "Categories" and "DataElementGroupSets". Each Category and
> each DataElementGroup assigned to the data element would impart a
> single column in a crosstab query.
> d) Other dimensions that are already implemented, such as Dataset.
>
> Do these points capture the requirements, based on everything we have
> discussed in this monster thread?
>
> Please feel free to comment, especially the developers. Lars has been
> quiet and said it is possible, so I suppose he is just coding already.
>
>
> _______________________________________________
> Mailing list:
https://launchpad.net/~dhis2-devs<https://launchpad.net/~dhis2-devs>
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe :
https://launchpad.net/~dhis2-devs<https://launchpad.net/~dhis2-devs>
> More help : https://help.launchpad.net/ListHelp
>

_______________________________________________
Mailing list:
https://launchpad.net/~dhis2-devs<https://launchpad.net/~dhis2-devs>
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe :
https://launchpad.net/~dhis2-devs<https://launchpad.net/~dhis2-devs>
More help : https://help.launchpad.net/ListHelp

--
Cheers,
Knut Staring

jason · 1 October 2009 17:39

Hi Johan,
Thanks for this. It seems that we have agreement on most points.

We are not talking about cases when we talk about data element groups. We
are talking about metadata, that apply to ALL uses of that data element.
So we can have 5000 cases of malaria, from all kinds of ages and genders
(all of them!), but they would all share the metadata of Malaria = vector
borne, which has nothing to do with the individual cases.

So DE groups are metadata. I have no idea if there is anything wrong with
using the same code and name for both metadata and event-data, but for me
they are different. If you have age as DE group set, you cannot enter
different ages for that data element. You will have to make another data
element, assigned to another group.

In my view of it, it is ALL metadata about a measure, a number, or
some other value (perhaps a true/false) that occurs. Everything else,
orgunits, periods, data element names, data element groups,
categories... all the dimensions that one wants to see in a PivotTable
or filter out in a report, they are all metadata about the "data
element" or "measure", i.e. what gets put in the value in the DHIS
database.

There are certain pieces of these metadata that have a one-to-one
relationship with the value. Values can only occur at a certain point
in time (period tells us when), at a certain place (orgunit tells us
where) and for a certain observation (data element tells us how).
Since we are only dealing with aggregate data, we do not care about
the who. We also do not really care about the exact place, the exact
doctor that was seen, or the exact point in time. OpenMRS may, but
DHIS2 does not. These dimensions (and all the others that are part of
systems like OpenMRS) get folded into some larger dimension like "month",
even though a particular event occurred at a given point in time.
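
A small sketch of that folding, under the assumption that individual events arrive as (date, facility, diagnosis) tuples. The event list, the facility names and the `fold` function are invented for illustration and say nothing about how OpenMRS or DHIS2 actually store data.

```python
from collections import Counter
from datetime import date

# Hypothetical patient-level events: exact day, facility, diagnosis.
events = [
    (date(2009, 10, 1), "Clinic A", "Malaria"),
    (date(2009, 10, 17), "Clinic A", "Malaria"),
    (date(2009, 10, 17), "Clinic B", "Malaria"),
    (date(2009, 11, 2), "Clinic A", "Malaria"),
]


def fold(events):
    """Count events per (month, orgunit, data element); the exact day is dropped."""
    return Counter(
        (day.strftime("%Y-%m"), orgunit, data_element)
        for day, orgunit, data_element in events
    )


aggregate = fold(events)
```

Each key of `aggregate` is one combination of period, orgunit and data element, and the count is the value stored for it.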

I simply cannot see the difference between a category and a data set.
For me they are one and the same conceptually, as they essentially
assign a certain type to a number of measures. Categories, Data sets,
OrgUnits, Periods, they are all dimensions from an analysis
perspective. Sometimes, I may want to use them, other times, I may
want to completely fold them up and ignore them. If we need
to semantically separate them for convenience purposes (e.g. the data
entry screen), that is fine. But when it gets to the analysis, I want to
slice, dice and fold these different dimensions (whether they are
called categories or data element groups makes no difference). How the
measures are grouped is simply metadata for me, which makes me feel
that categories and data element groups are essentially the same
beast.

I think if there are "best practices" for DHIS2, as Ola mentions, then
we need to specify them in great detail. It is obvious that you can
use the "flat" model of DHIS 1.4 to obtain essentially the same data
set without DHIS2 categories, albeit rather painfully. I would not
dare to show the query that I constructed to "unfold" the dimensions
that were inside of DHIS 1.4 data element names, but it is possible. A
set of relations would make it a lot easier, and some Java code to
allow me to press a button would be the icing on the cake. Hopefully
we are saying the same thing here.
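
To give a flavour of what "unfolding" means, here is a toy sketch. It is not the query mentioned above. The `DIMENSIONS` lookup table is an assumption built from the option names used in this thread, and the `unfold` function is hypothetical. Real DHIS 1.4 names are unlikely to be this regular.

```python
import re

# Assumed lookup: dimension -> options that may appear in a flat name.
DIMENSIONS = {
    "patient_status": ["OPD", "IP", "Deaths"],
    "diagnosis": ["Confirmed", "Clinical"],
    "age_group": ["Under 1", "Under 5", "Over 5"],
}


def unfold(name):
    """Return the first matching option per dimension found in a flat name."""
    found = {}
    for dimension, options in DIMENSIONS.items():
        for option in options:
            # Whole-word match, so "IP" does not match inside another word.
            if re.search(r"\b" + re.escape(option) + r"\b", name):
                found[dimension] = option
                break
    return found


unfolded = unfold("OPD 1st Attendance Clinical Case of Malaria Under 1")
```

A set of relations mapping each flat data element to its options would replace the string matching entirely, which is the easier route described above.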

Enough email. My head hurts.

JPP