On categories and dimensions and zooks

Knut_Staring · 2 October 2009 11:34

Hi guys,

For this discussion, it would make sense for you to take a look at the “disaggregation” descriptors in the Global Health Observatory.

Go to http://extranet.who.int/gho/

Click on World Health Statistics and then select Mortality and burden of disease. You should now see a table. Click on the header, where it says

Maternal mortality ratio (per 100 000 live births)

This will pull in the definition from the Indicator and Metadata Registry (IMR) using a web service (planned to become SDMX later).

Click on “More” in the upper right-hand corner and scroll down to Disaggregation to see the dimensional breakdown.

Knut


2009/10/1 Jason Pickering jason.p.pickering@gmail.com

Hi Johan,

Thanks for this. It seems that we have agreement on most points.

We are not talking about cases when we talk about data element groups. We are talking about metadata that apply to ALL uses of that data element. So we can have 5000 cases of malaria, from all kinds of ages and genders (all of them!), but they would all share the metadata of Malaria = vector borne, which has nothing to do with the individual cases.

So DE groups are metadata. I have no idea if there is anything wrong with using the same code and name for both metadata and event data, but for me they are different. If you have age as a DE group set, you cannot enter different ages for that data element. You will have to make another data element, assigned to another group.
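A minimal Java sketch of the distinction drawn above, using hypothetical DataElement and DataValue records rather than the DHIS2 domain model: group membership is stored once on the data element and therefore applies to every value, while an age group is stored with each individual value.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the DHIS2 domain model.
final class GroupVsCategorySketch {

    // Group membership is stored on the data element, so it describes
    // every value ever recorded for that element.
    record DataElement(String name, Set<String> groups) {}

    // A per-value disaggregation (here an age group) is stored with each value.
    record DataValue(DataElement element, String ageGroup, int value) {}

    // Every value inherits the element's groups, whatever its age group.
    static boolean allValuesInGroup(List<DataValue> values, String group) {
        return values.stream().allMatch(v -> v.element().groups().contains(group));
    }

    // Totals per age group are possible only because age is recorded per value.
    static int totalForAgeGroup(List<DataValue> values, String ageGroup) {
        return values.stream()
                .filter(v -> v.ageGroup().equals(ageGroup))
                .mapToInt(DataValue::value)
                .sum();
    }

    public static void main(String[] args) {
        DataElement malaria = new DataElement("Malaria cases", Set.of("Vector borne"));
        List<DataValue> values = List.of(
                new DataValue(malaria, "<5", 3000),
                new DataValue(malaria, "5+", 2000));
        System.out.println(allValuesInGroup(values, "Vector borne")); // true
        System.out.println(totalForAgeGroup(values, "<5"));           // 3000
    }
}
```

Putting age into a group instead would fix it for the whole element, which is exactly why a second data element would be needed.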

In my view, it is ALL metadata about a measure, a number, or some other value (perhaps a true/false) that occurs. Everything else (orgunits, periods, data element names, data element groups, categories… all the dimensions that one wants to see in a PivotTable or filter out in a report) is metadata about the “data element” or “measure”, or, in the DHIS database, about what gets put in the value.

There are certain pieces of these metadata that have a one-to-one relationship with the value. Values can only occur at a certain point in time (period tells us when), at a certain place (orgunit tells us where) and for a certain observation (data element tells us how). Since we are only dealing with aggregate data, we do not care about the who. We also do not really care about the exact place, the exact doctor that was seen, or the exact point in time. OpenMRS may, but DHIS2 does not. These dimensions (and all the others that are part of systems like OpenMRS) get folded into some larger dimension like “month”, even though a particular event occurred at a given point in time.
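The folding of exact dates into months can be sketched in Java as below. This is an illustration only, assuming a hypothetical Event record that has already dropped the patient and the doctor; it counts events per data element, org unit and calendar month (java.time.YearMonth).

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: folding individual events into monthly aggregates.
final class MonthlyFoldSketch {

    // One event as a patient-level system might record it; the "who"
    // (patient, doctor) has already been dropped.
    record Event(String dataElement, String orgUnit, LocalDate date) {}

    // The aggregate grain: what, where, and which month.
    record AggregateKey(String dataElement, String orgUnit, YearMonth month) {}

    // Count events per (data element, org unit, month); the exact day is folded away.
    static Map<AggregateKey, Integer> foldToMonths(List<Event> events) {
        Map<AggregateKey, Integer> counts = new HashMap<>();
        for (Event e : events) {
            AggregateKey key = new AggregateKey(e.dataElement(), e.orgUnit(), YearMonth.from(e.date()));
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("Malaria cases", "Clinic A", LocalDate.of(2009, 9, 3)),
                new Event("Malaria cases", "Clinic A", LocalDate.of(2009, 9, 28)),
                new Event("Malaria cases", "Clinic A", LocalDate.of(2009, 10, 1)));
        Map<AggregateKey, Integer> monthly = foldToMonths(events);
        System.out.println(monthly.get(new AggregateKey("Malaria cases", "Clinic A", YearMonth.of(2009, 9)))); // 2
    }
}
```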

I simply cannot see the difference between a category and a data set. For me they are one and the same conceptually, as they essentially assign a certain type to a number of measures. Categories, data sets, OrgUnits, Periods: they are all dimensions from an analysis perspective. Sometimes I may want to use them; other times I may want to completely fold them up and ignore them. If we need to separate them semantically for convenience (e.g. on the data entry screen), that is fine. But when it comes to analysis, I want to slice, dice and fold these different dimensions (whether they are called categories or data element groups makes no difference). How the measures are grouped is simply metadata for me, which makes me feel that categories and data element groups are essentially the same beast.
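A minimal sketch of treating every dimension alike for analysis, assuming each value is tagged with a hypothetical map from dimension name to option. The fold routine neither knows nor cares whether "sex" came from a category or "group" from a data element group; the dimension names and figures are invented for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: one roll-up routine for every kind of dimension.
final class SliceAndFoldSketch {

    // A value tagged with its coordinates. The keys might come from categories,
    // data element groups, org units or periods; fold() treats them all alike.
    // Assumes every cell carries every dimension name used in a fold.
    record Cell(Map<String, String> coordinates, int value) {}

    // Keep the dimensions in `keep` and sum over ("fold up") all the others.
    static Map<Map<String, String>, Integer> fold(List<Cell> cells, Set<String> keep) {
        Map<Map<String, String>, Integer> result = new HashMap<>();
        for (Cell cell : cells) {
            Map<String, String> key = new HashMap<>();
            for (String dimension : keep) {
                key.put(dimension, cell.coordinates().get(dimension));
            }
            result.merge(key, cell.value(), Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell(Map.of("dataElement", "Malaria cases", "sex", "female", "group", "Vector borne"), 10),
                new Cell(Map.of("dataElement", "Malaria cases", "sex", "male", "group", "Vector borne"), 7),
                new Cell(Map.of("dataElement", "Dengue cases", "sex", "female", "group", "Vector borne"), 2));
        System.out.println(fold(cells, Set.of("sex")));   // totals by sex (a category)
        System.out.println(fold(cells, Set.of("group"))); // totals by group (a data element group)
    }
}
```

Folding with an empty set of kept dimensions collapses everything into a single grand total.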

I think if there are “best practices” for DHIS2, as Ola mentions, then we need to specify them in great detail. It is obvious that you can use the “flat” model of DHIS 1.4 to obtain essentially the same data set without DHIS2 categories, albeit rather painfully. I would not dare to show the query that I constructed to “unfold” the dimensions that were embedded in DHIS 1.4 data element names, but it is possible. A set of relations would make it a lot easier, and some Java code to let me do it at the press of a button would be the icing on the cake. Hopefully we are saying the same thing here.
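One possible shape for the “set of relations” mentioned above, sketched in Java under the assumption that each flat DHIS 1.4 data element name is mapped by hand to a base element plus age and sex options. All names are hypothetical, and this is not the query Jason describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of unfolding DHIS 1.4-style flat data element names.
final class UnfoldFlatNamesSketch {

    // One row of the relation: a flat name mapped to a base element and its options.
    record Mapping(String baseElement, String ageGroup, String sex) {}

    // A value as stored against a flat data element name.
    record FlatValue(String flatName, int value) {}

    // The same value with its dimensions made explicit.
    record UnfoldedValue(String baseElement, String ageGroup, String sex, int value) {}

    static List<UnfoldedValue> unfold(List<FlatValue> flat, Map<String, Mapping> relation) {
        List<UnfoldedValue> unfolded = new ArrayList<>();
        for (FlatValue f : flat) {
            Mapping m = relation.get(f.flatName());
            if (m == null) {
                continue; // unmapped name: skipped here; a real migration would probably report it
            }
            unfolded.add(new UnfoldedValue(m.baseElement(), m.ageGroup(), m.sex(), f.value()));
        }
        return unfolded;
    }

    public static void main(String[] args) {
        Map<String, Mapping> relation = Map.of(
                "Malaria cases <5 male", new Mapping("Malaria cases", "<5", "male"),
                "Malaria cases <5 female", new Mapping("Malaria cases", "<5", "female"));
        List<FlatValue> flat = List.of(
                new FlatValue("Malaria cases <5 male", 4),
                new FlatValue("Malaria cases <5 female", 6));
        unfold(flat, relation).forEach(System.out::println);
    }
}
```

Keeping the mapping as data rather than parsing the names means odd naming conventions only need fixing in one table.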

Enough email. My head hurts.

JPP

–
Cheers,
Knut Staring

Lars · 4 October 2009 22:06

Big thanks to all for illuminating the pros and cons of the current multidimensional model. It was designed in 2006 basically to support ICD-based data entry, and we must admit that Bob is at least partially right when saying that output could have been given better thought. Anyway, it seems to be working out not too badly either.

I like Bob’s suggestion for simplifying the model, and it would apparently make querying easier and improve the user interface. I have a few concerns:

- Feasibility. The Category-related model is integrated into 9 out of 11 service projects in DHIS 2. Re-factoring and testing all this would take months.
- Backwards compatibility. Lots of databases and data-entry forms exist in the field. Conversion must be managed.
- Suitability for the data-entry module. It seems likely that the CategoryCombo class can be “emulated” through the API.
- Does it cut tables to change from m-n to 1-n? Using join tables to represent 1-n associations is preferred by many as it keeps the domain model cleaner.

If people say we can live with the current model I’d say we do just that. Anyway Bob’s suggestion should be documented and looked at again later. I think the point about “input without output is statistical m…” is valid. At least we will need to focus more on how to make “the goodness float up”.

Re the data element / indicator group set I think this is something we can do without risk. It won’t change the existing model and won’t break anything and wouldn’t take too long to implement. Will start on it on Wednesday. A minor comment here is that I believe we should keep the exclusiveness and compulsory-ness of the group set optional (…eh) like we have it for organisation unit group sets today.
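A small sketch of what optional exclusiveness and compulsoriness could mean for a data element group set, modelled loosely on the description above. The Group and GroupSet records and their two flags are assumptions for illustration, not the DHIS2 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the DHIS2 API.
final class GroupSetRulesSketch {

    record Group(String name, Set<String> members) {}

    // Both flags are optional constraints, as for organisation unit group sets.
    record GroupSet(String name, List<Group> groups, boolean exclusive, boolean compulsory) {

        // Lists rule violations for one member (e.g. a data element name).
        List<String> violations(String member) {
            long count = groups.stream().filter(g -> g.members().contains(member)).count();
            List<String> problems = new ArrayList<>();
            if (exclusive && count > 1) {
                problems.add(member + " is in " + count + " groups of exclusive set " + name);
            }
            if (compulsory && count == 0) {
                problems.add(member + " is in no group of compulsory set " + name);
            }
            return problems;
        }
    }

    public static void main(String[] args) {
        Group vector = new Group("Vector borne", Set.of("Malaria cases", "Dengue cases"));
        // Malaria is deliberately misfiled here to trigger the exclusivity check.
        Group water = new Group("Water borne", Set.of("Cholera cases", "Malaria cases"));
        GroupSet transmission = new GroupSet("Transmission", List.of(vector, water), true, true);
        System.out.println(transmission.violations("Malaria cases")); // exclusivity violation
        System.out.println(transmission.violations("Measles cases")); // compulsory violation
        System.out.println(transmission.violations("Cholera cases")); // []
    }
}
```

With both flags set to false the same group set accepts any membership, which is the "optional" behaviour suggested above.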

Finally I hope people who are troubled about the lack of documentation would use Jason’s instructions and convert some of this newly discovered wisdom into… documentation.

cheers

Lars

jason · 5 October 2009 07:06

Since I started all of this, I feel compelled to write one last mail here.

I think Lars's suggestions are the way to go. They are practical, ensure
compatibility with existing systems, and are certainly achievable
pretty quickly. Having the data element group sets functionality would
be a major step forward in producing useful, flexible outputs for
end-users.

I think some of the limitations, quirks, and advantages of the
current model have been highlighted. I suspect we need to look deeper
at the details during the documentation process. I will start writing
up something in DocBook format this week, commit it to the
documentation branch, but will need the input of at least Johan,
Ola, Bob and others to make the document complete. Once the data
element group sets have been implemented, we can fill in the rest.
As I have made clear, we are dealing with a hybrid system here in
Zambia, with 1.4 and 2 running side-by-side. I can write up this use
case, but cannot add anything about the "pure" DHIS2 system, where the
lack of the data element group sets may not be such an issue.

Thanks Lars for prioritizing this. It will be a big step forward here
once implemented.

Regards,
Jason


bobj · 5 October 2009 12:43

Hi,

Big thanks to all for illuminating the pros and cons of the current multidimensional model. It was designed in 2006 basically to support ICD-based data entry, and we must admit that Bob is at least partially right when saying that output could have been given better thought. Anyway, it seems to be working out not too badly either.

I like Bob’s suggestion for simplifying the model, and it would apparently make querying easier and improve the user interface. I have a few concerns:

Feasibility. The Category-related model is integrated into 9 out of 11 service projects in DHIS 2. Re-factoring and testing all this would take months.

Backwards compatibility. Lots of databases and data-entry forms exist in the field. Conversion must be managed.

I reached the same conclusion :-(. I think there is still some small rationalisation that can be done, but the model is already deeply coupled with many parts of the system. Having said that, I have a suggestion related to the refactoring of dimensions and dataelementgroups below.

Suitability for the data-entry module. It seems likely that the CategoryCombo class can be “emulated” through the API.

Not sure exactly what you mean by this… but I guess probably. I suspect the work that most needs to be done on the CategoryCombo class in the API is to provide “unpicking” methods to be able to conveniently access the underlying categories (dimensions).
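A sketch of what such “unpicking” could look like, assuming a hypothetical CategoryCombo that holds an ordered list of categories and generates its option combinations as their cartesian product. Given one option combination, optionFor() recovers the option it holds for a named category. None of these names are the actual DHIS2 classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the DHIS2 category classes.
final class CategoryComboSketch {

    record Category(String name, List<String> options) {}

    // An option combo holds one option per category, in the combo's category order.
    record OptionCombo(List<String> options) {}

    record CategoryCombo(List<Category> categories) {

        // All option combos: the cartesian product of the categories' options.
        List<OptionCombo> optionCombos() {
            List<List<String>> acc = new ArrayList<>();
            acc.add(List.of());
            for (Category c : categories) {
                List<List<String>> next = new ArrayList<>();
                for (List<String> prefix : acc) {
                    for (String option : c.options()) {
                        List<String> extended = new ArrayList<>(prefix);
                        extended.add(option);
                        next.add(extended);
                    }
                }
                acc = next;
            }
            return acc.stream().map(OptionCombo::new).toList();
        }

        // "Unpicking": the option an option combo holds for one named category.
        String optionFor(OptionCombo combo, String categoryName) {
            for (int i = 0; i < categories.size(); i++) {
                if (categories.get(i).name().equals(categoryName)) {
                    return combo.options().get(i);
                }
            }
            throw new IllegalArgumentException("No category " + categoryName + " in this combo");
        }
    }

    public static void main(String[] args) {
        Category sex = new Category("Sex", List.of("male", "female"));
        Category age = new Category("Age", List.of("<5", "5+"));
        CategoryCombo combo = new CategoryCombo(List.of(sex, age));
        for (OptionCombo optionCombo : combo.optionCombos()) {
            System.out.println(optionCombo.options() + " -> age " + combo.optionFor(optionCombo, "Age"));
        }
    }
}
```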

Does it cut tables to change from m-n to 1-n? Using join tables to represent 1-n associations is preferred by many as it keeps the domain model cleaner.

My proposal improved the situation by making a 1-n relation of category to categoryOptions. This would certainly be more efficient, but it doesn’t meet the use case where a categoryOption might participate in different categories.
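The trade-off can be sketched with hypothetical records. In the 1-n shape each option names its single owning category (a foreign key on the option, in relational terms); in the m-n shape one option object can be listed by several categories, which in a database needs a join table.

```java
import java.util.List;

// Hypothetical sketch contrasting 1-n and m-n category options.
final class CategoryOptionSharingSketch {

    // 1-n shape: the option names its single owning category.
    record OwnedOption(String name, String category) {}

    // m-n shape: the option stands alone and categories refer to it.
    record SharedOption(String name) {}

    record Category(String name, List<SharedOption> options) {}

    // Names of the categories that list a given option.
    static List<String> categoriesUsing(SharedOption option, List<Category> categories) {
        return categories.stream()
                .filter(c -> c.options().contains(option))
                .map(Category::name)
                .toList();
    }

    public static void main(String[] args) {
        // Under 1-n, "<5" can belong to one category only.
        OwnedOption owned = new OwnedOption("<5", "Malaria age group");
        System.out.println(owned);

        // Under m-n, the same "<5" option is reused by two categories.
        SharedOption under5 = new SharedOption("<5");
        Category malariaAge = new Category("Malaria age group", List.of(under5, new SharedOption("5+")));
        Category nutritionAge = new Category("Nutrition age group", List.of(new SharedOption("<1"), under5));
        System.out.println(categoriesUsing(under5, List.of(malariaAge, nutritionAge)));
    }
}
```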

If people say we can live with the current model I’d say we do just that. Anyway Bob’s suggestion should be documented and looked at again later. I think the point about “input without output is statistical m…” is valid. At least we will need to focus more on how to make “the goodness float up”.

I think we can only know whether we can live with the current model once the API methods which seem theoretically possible are implemented. My concern is that if we provide an alternative to multidimensional (MD) analysis through extending the groupset idea, then we have no justification in recommending that implementors implement MD dataelements. Convenience of UI is not enough if in the process we enter data which we can’t unpack. What will happen is that implementors with an eye on analysis will ignore the MD notion entirely, because it creates difficulties for them and they have a ready analysis solution with groups and groupsets.

Re the data element / indicator group set I think this is something we can do without risk. It won’t change the existing model and won’t break anything and wouldn’t take too long to implement. Will start on it on Wednesday. A minor comment here is that I believe we should keep the exclusiveness and compulsory-ness of the group set optional (…eh) like we have it for organisation unit group sets today.

Lars I think this is the correct response to what is clearly a very real need. But I want to suggest that we approach it as follows:

- We create two new abstract classes, Dimension and DimensionOption.
- DataElement should be extended with methods to retrieve Dimensions (fold/unfold, whatever the gathered requirements are). These are the methods which would be used in reportable design.
- Both Category and Group should in some way implement Dimension. In both cases I think the underlying structures, however imperfect, allow for this symmetry. If this is difficult for Categories initially we can throw unImplemented() for now, but we will have provided the structural guidance towards harmonising the two.
- We might need a DimensionSet class or perhaps just a Set<Dimension> getDimensions() member function of DataElement.

The point here is that if we have dimensions to a dataelement then from the reporting/analysis perspective it can be made invisible how those dimensions are implemented. Instinctively I feel it should simply be possible to retrieve datavalues from a dimension or crosstabs of dimensions.
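A minimal Java sketch of this proposal, under stated assumptions: Dimension and DimensionOption are written as interfaces rather than the abstract classes suggested above (so existing classes could implement them without changing their superclass), DataElementGroup and DataElementGroupSet are hypothetical, and reporting code sees only the Set<Dimension> returned by getDimensions().

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the Dimension proposal; not existing DHIS2 classes.
final class DimensionSketch {

    interface DimensionOption {
        String getName();
    }

    interface Dimension {
        String getName();
        List<? extends DimensionOption> getDimensionOptions();
    }

    record CategoryOption(String name) implements DimensionOption {
        public String getName() { return name; }
    }

    // Disaggregation fixed at data entry time.
    record Category(String name, List<CategoryOption> options) implements Dimension {
        public String getName() { return name; }
        public List<CategoryOption> getDimensionOptions() { return options; }
    }

    // A group of data elements acting as one option of a group set.
    record DataElementGroup(String name, Set<String> members) implements DimensionOption {
        public String getName() { return name; }
    }

    // Grouping applied after the fact.
    record DataElementGroupSet(String name, List<DataElementGroup> groups) implements Dimension {
        public String getName() { return name; }
        public List<DataElementGroup> getDimensionOptions() { return groups; }
    }

    // Reporting code only sees Dimensions, not how each one is implemented.
    record DataElement(String name, List<Category> categories, List<DataElementGroupSet> groupSets) {
        Set<Dimension> getDimensions() {
            Set<Dimension> dimensions = new LinkedHashSet<>(categories);
            dimensions.addAll(groupSets);
            return dimensions;
        }
    }

    public static void main(String[] args) {
        Category sex = new Category("Sex", List.of(new CategoryOption("male"), new CategoryOption("female")));
        DataElementGroupSet transmission = new DataElementGroupSet("Transmission",
                List.of(new DataElementGroup("Vector borne", Set.of("Malaria cases"))));
        DataElement malaria = new DataElement("Malaria cases", List.of(sex), List.of(transmission));
        for (Dimension d : malaria.getDimensions()) {
            List<String> options = d.getDimensionOptions().stream().map(DimensionOption::getName).toList();
            System.out.println(d.getName() + " -> " + options);
        }
    }
}
```

The loop in main never checks whether a dimension is a category or a group set, which is the "sweep it under the carpet" point made below.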

One missing piece of the puzzle (or required symmetry) is that I don’t think we currently name a dataelement which has beneath it a dataElementGroup or set of groups. But I suspect this could be implemented relatively easily.

Whereas the above might look like it is complicating the picture, I think it can in fact considerably simplify it in the long run. The correct starting point is to gather the requirements of what methods a Dimension should have. If there were to be a Dimension class and we knew nothing of implementation details, what would Jason and Ola and others really require of that class? Then we do the dirty work in the concrete implementations. Otherwise known as the sweep-it-under-the-carpet pattern, or what others might call encapsulation.

Regards
Bob


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

jason · 5 October 2009 14:57

Hi Bob,
I know essentially nothing about Java, so this may be a rather fluffy,
philosophical email. However, I will offer some comments from a
relational standpoint, which I think overlap with yours but are
clearly less technical.

There is significant overlap in terms of what the OrgUnitGroupSets,
DataElementGroupSets and, if it were implemented, PeriodGroupSets are
supposed to accomplish. They establish some sort of hierarchy and
grouping. A set of districts belongs to a province. Days belong to
weeks. Certain data values were recorded for children under 5 with
malaria.

It would seem there are two separate, but not competing, requirements
for the data element group sets: one for data entry, and the other for
analysis. As I indicated in my earlier email, my gut feeling is that
there is no difference conceptually between a "category" and a "data
element set". There may be differences in the implementation of the
classes, but conceptually, both seem to be only a way of lumping data
elements together into some type of hierarchical relation. Whether
these are a separate or single row in a database is of little concern
to the end users.

Those of you who have followed the OpenHealth functional prototype
have seen that what was attempted there was to create a union
between multidimensional data entry and multidimensional analysis. It
was not entirely successful, but the point was clear. Sometimes you
need to be able to enter data for multiple organizational units for a
single data element (think of population indicators entered at the
national level and then distributed to districts, a requirement here).
In other cases (and this is the one that DHIS has catered to), what is
needed is the entry of multiple data elements for a single
organizational unit for a single time period. The same could be said
about entering a set of data over multiple time periods for multiple
organizational units for a single data element.
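The point that different entry forms are just different slices of the same data can be sketched as below. It assumes a hypothetical store keyed by (data element, org unit, period): the DHIS-style form fixes the org unit and period, while the population-style form fixes the data element and period. Period codes, names and figures are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: two entry forms writing into one (what, where, when) store.
final class EntrySliceSketch {

    // The three coordinates every aggregate value has: what, where, when.
    record Key(String dataElement, String orgUnit, String period) {}

    private final Map<Key, Integer> store = new HashMap<>();

    // DHIS-style form: many data elements, one org unit, one period.
    void enterForOrgUnit(String orgUnit, String period, Map<String, Integer> byDataElement) {
        byDataElement.forEach((de, v) -> store.put(new Key(de, orgUnit, period), v));
    }

    // Population-style form: one data element, many org units, one period.
    void enterForDataElement(String dataElement, String period, Map<String, Integer> byOrgUnit) {
        byOrgUnit.forEach((ou, v) -> store.put(new Key(dataElement, ou, period), v));
    }

    Integer get(String dataElement, String orgUnit, String period) {
        return store.get(new Key(dataElement, orgUnit, period));
    }

    public static void main(String[] args) {
        EntrySliceSketch db = new EntrySliceSketch();
        db.enterForOrgUnit("Clinic A", "200909", Map.of("Malaria cases", 12, "ANC visits", 30));
        db.enterForDataElement("Population", "2009", Map.of("District X", 150000, "District Y", 90000));
        System.out.println(db.get("Malaria cases", "Clinic A", "200909")); // 12
        System.out.println(db.get("Population", "District Y", "2009"));    // 90000
    }
}
```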

I realize this may be asking too much, but is there a way that this
Dimension class could somehow be used to implement common methods
across Periods, OrgUnits and Data elements? These three concepts are
distinct and central to DHIS and data in general (when, where, what).
The rules governing how these dimensions work internally are also
distinct. For instance, there are seven days in a week, three months
in a quarter, etc. for Periods. I suppose this is/can be implemented
in code to tell the aggregation engine what to do. The similarities,
with hierarchies and exclusivity being the two that come to mind, are
clear to me.
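For Periods, the JDK's java.time already encodes rules like these. A short sketch rolling a day up to its ISO 8601 week and its quarter, assuming Monday-based ISO weeks (other week definitions would need different rules); the "2009W41" / "2009Q4" label format is an arbitrary choice for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

// Sketch of period roll-up rules using java.time.
final class PeriodRollupSketch {

    // ISO week label. The week-based year can differ from the calendar
    // year around New Year, so it is read separately.
    static String isoWeek(LocalDate day) {
        int weekYear = day.get(IsoFields.WEEK_BASED_YEAR);
        int week = day.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return weekYear + "W" + week;
    }

    // Quarter label: three months per quarter.
    static String quarter(LocalDate day) {
        return day.getYear() + "Q" + day.get(IsoFields.QUARTER_OF_YEAR);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2009, 10, 5);
        System.out.println(isoWeek(day));                      // 2009W41
        System.out.println(quarter(day));                      // 2009Q4
        System.out.println(isoWeek(LocalDate.of(2010, 1, 1))); // 2009W53
    }
}
```

The last line shows why a naive "year + week" rule is not enough: 1 January 2010 still falls in the final ISO week of 2009.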

I am not sure if it can be done. But I wonder if there is any way that
the existing categories class(es) could be used for two purposes:
1) Creation of multidimensional data elements for the purpose of data
entry, etc.
2) Grouping of non-multidimensional elements into a multidimensional
data element after the fact.

Perhaps this is not easily possible, but as Bob highlights,
implementers will be left with a choice, and it is not clear to me
which one is preferable. Prima facie, I would say use
non-multidimensional data elements. What happens when the
dimensionality of a multidimensional data element changes? Is it
possible to change this after it has been created and data entered for
it? I am not sure; perhaps it is. However, if I were given a choice
that was flexible, and allowed me to arbitrarily assign independent
data elements to group sets, either for the purpose of data entry or
analysis, this might be the route that I would choose.
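Option 2), grouping flat data elements after the fact, might look like this: two hypothetical group sets (age and sex) each assign a flat data element to one group, and the totals are cross-tabulated by those groups. Names and numbers are invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: flat data elements grouped after the fact.
final class GroupSetCrosstabSketch {

    // Cross-tabulates flat totals by two group sets. Each group set maps a
    // data element name to the one group it belongs to in that set.
    static Map<String, Map<String, Integer>> crosstab(Map<String, Integer> totals,
                                                      Map<String, String> rowGroupSet,
                                                      Map<String, String> columnGroupSet) {
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        totals.forEach((element, value) -> {
            String row = rowGroupSet.get(element);
            String column = columnGroupSet.get(element);
            if (row == null || column == null) {
                return; // not assigned in one of the group sets, so left out
            }
            table.computeIfAbsent(row, r -> new TreeMap<>()).merge(column, value, Integer::sum);
        });
        return table;
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = Map.of(
                "Malaria <5 male", 4, "Malaria <5 female", 6,
                "Malaria 5+ male", 9, "Malaria 5+ female", 11);
        Map<String, String> age = Map.of(
                "Malaria <5 male", "<5", "Malaria <5 female", "<5",
                "Malaria 5+ male", "5+", "Malaria 5+ female", "5+");
        Map<String, String> sex = Map.of(
                "Malaria <5 male", "male", "Malaria <5 female", "female",
                "Malaria 5+ male", "male", "Malaria 5+ female", "female");
        System.out.println(crosstab(totals, age, sex));
    }
}
```

Because the group sets are plain data, reassigning an element changes the analysis without touching the stored values, which is the flexibility Jason is asking about.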

I had better stop here, before I continue down my sophomoric pathway.

Regards,
Jason


bobj · 5 October 2009 16:54

Jason,


I realize this may be asking too much, but is there a way that this Dimension class could somehow be used to implement common methods across Periods, OrgUnits and Data elements? These three concepts are distinct and central to DHIS and data in general (when, where, what). The rules governing how these dimensions work internally are also distinct. For instance, there are seven days in a week, three months in a quarter, etc. for Periods. I suppose this is/can be implemented in code to tell the aggregation engine what to do. The similarities, with hierarchies and exclusivity being the two that come to mind, are clear to me.

Mapping out these similarities is exactly what the process of good object-oriented design is about. Unfortunately it is often (always?) only through extended use that the patterns emerge.

I think with a bit of thought what you refer to above is certainly possible. If, for a start, we just think of returning DimensionOptions, in Java terms a method with the signature

SortedSet<DimensionOption> Dimension::getDimensionOptions()

might behave polymorphically as follows:

(i) an OrgUnit (implementing Dimension) would return a set of OrgUnits
(ii) a YearlyPeriod might return a set of MonthlyPeriods, or quarters, or … this one might require parameterization (as would the previous one if you wanted to unfold 1, 2 or 3 levels deep)
(iii) a category would return a set of categoryOptions
(iv) a DataSetGroup might return a set of unidimensional DataElements

etc. It's maybe a bit more complicated, but not much.

A DataElement should in turn be able to return a set of Dimensions (a mixed bag of categories and datasetgroups of types (iii) and (iv)).
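Items (i) to (iii) can be sketched with a generic Dimension<O> interface whose getDimensionOptions takes a levels parameter, reflecting the parameterization mentioned in (ii). All class and org unit names are hypothetical; YearlyPeriod uses java.time.YearMonth for its months and, like Category, only supports unfolding one level in this sketch.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of polymorphic, parameterized unfolding.
final class UnfoldSketch {

    interface Dimension<O> {
        // Options found `levels` steps below this dimension.
        List<O> getDimensionOptions(int levels);
    }

    // (i) An org unit unfolds into its descendants, `levels` levels down.
    record OrgUnit(String name, List<OrgUnit> children) implements Dimension<OrgUnit> {
        public List<OrgUnit> getDimensionOptions(int levels) {
            List<OrgUnit> current = List.of(this);
            for (int i = 0; i < levels; i++) {
                List<OrgUnit> next = new ArrayList<>();
                for (OrgUnit unit : current) {
                    next.addAll(unit.children());
                }
                current = next;
            }
            return current;
        }
    }

    // (ii) A year unfolds into its twelve months; other splits
    // (quarters, weeks) are left out of this sketch.
    record YearlyPeriod(int year) implements Dimension<YearMonth> {
        public List<YearMonth> getDimensionOptions(int levels) {
            if (levels != 1) {
                throw new UnsupportedOperationException("only year -> months is sketched");
            }
            List<YearMonth> months = new ArrayList<>();
            for (int month = 1; month <= 12; month++) {
                months.add(YearMonth.of(year, month));
            }
            return months;
        }
    }

    // (iii) A category has a single level of options.
    record Category(String name, List<String> options) implements Dimension<String> {
        public List<String> getDimensionOptions(int levels) {
            if (levels != 1) {
                throw new UnsupportedOperationException("a category has one level of options");
            }
            return options;
        }
    }

    public static void main(String[] args) {
        OrgUnit district = new OrgUnit("District X",
                List.of(new OrgUnit("Clinic A", List.of()), new OrgUnit("Clinic B", List.of())));
        OrgUnit province = new OrgUnit("Province P", List.of(district));
        List<Dimension<?>> dimensions = List.of(
                province, new YearlyPeriod(2009), new Category("Sex", List.of("male", "female")));
        for (Dimension<?> d : dimensions) {
            System.out.println(d.getDimensionOptions(1).size()); // 1, then 12, then 2
        }
        System.out.println(province.getDimensionOptions(2).size()); // 2 clinics
    }
}
```

The mixed List<Dimension<?>> in main is the "mixed bag" a DataElement might return.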

And because all of these things implement the Dimension/DimensionSet relationship they might be able to be tabulated as equals in the output. From the UI you should be able to select Dimensions from OrgUnits, Periods and DataElements.

None of this requires much refactoring of existing models. The information is already mostly there. We just need to be able to map out the useful relationships.

Cheers
Bob

···

2009/10/5 Jason Pickering jason.p.pickering@gmail.com

I am not sure if it can be done. But if there was anyway that the

existing categories class(es) could be used for two purposes

Creation of multidimensional data elements for the purpose of data

entry, etc.

Grouping of non-multidimensional elements into a multidimensional

data element after the fact.

Perhaps it is not possible, easily to do this, but as Bob highlights,

implementers will be left with a choice, and it is not clear to me

which one is preferable. Prima facie, I would say use

non-multidimensional data elements. What happens when the

dimensionality of a multi-dimensional data element changes? Is it

possible to change this after it has been created and data entered for

it? I am not sure, perhaps it is. However, if I was given a choice

that was flexible, and allowed me to arbitrarily assign independent

data elements to group sets, either for the purpose of data entry or

analysis, this might be the route that I would choose.

I better stop here, before I continue down my sophmoric pathway.

Regards,

Jason

2009/10/5 Bob Jolliffe bobjolliffe@gmail.com:

Hi,

2009/10/4 Lars Helge Øverland larshelge@gmail.com

Big thanks to all for illuminating the pros and cons of the current

multidimensional model. It was designed in 2006 basically to support the ICD

based dataentry, and we must admit that Bob is at least partially right when

saying that output could have been given better thought. Anyway it is not

working out too bad either it seems.

I like Bob’s suggestion for simplifying the model and it would apparently

made querying easier and improve the user interface. I have a few concerns:

Feasibility. The Category-related model is integrated into 9 out of 11

service projects in DHIS 2. Re-factoring and testing all this would take

months.

Backwards compatibility. Lots of databases and data-entry forms exist in

the field. Conversion must be managed.

I reached the same conclusion :-(. I think there is still some small

rationalisation can be done, but the model is already deeply coupled with

many parts of the system. Having said that I have a suggestion related to

the refactoring of dimensions and dataelementgroups below.

Suitability for the data-entry module. It seems likely that the

CategoryCombo class can be “emulated” through the API.

Not sure what exactly what you mean by this … but I guess probably. I

suspect the work that most needs to be done on the CategoryCombo class in

the API is to provide “unpicking” methods to be able to conveniently access

the underlying categories (dimensions).

Does it cut tables to change from m-n to 1-n? Using join tables to

represent 1-n associations is preferred by many as it keeps the domain model

cleaner.

My proposal improved the situation by making a 1-n relation of category to

categoryOptions. This would certainly be more efficient but doesn’t meet

the use case where a categorOption might participate in different

categories.

If people say we can live with the current model I’d say we do just that.

Anyway Bob’s suggestion should be documented and looked at again later. I

think the point about “input without output is statistical m…” is valid.

At least we will need to focus more on how to make “the goodness float up”.

I think we can only know whether we can live with the current model once the

api methods which seem theoretically possible are implemented. My concern

is that if we provide an alternative to MD analysis through extending the

groupset idea then we have no justification in recommending that

implementors implement MD dataelements. Convenience of UI is not enough if

in the process we enter data which we can’t unpack. What will happen is

that implementors with an eye on analysis will ignore the MD notion entirely

because it creates difficulties for them and they have a ready analysis

solution with groups and groupsets.

Re the data element / indicator group set I think this is something we can

do without risk. It won’t change the existing model and won’t break anything

and wouldn’t take too long to implement. Will start on it on Wednesday. A

minor comment here is that I believe we should keep the exclusiveness and

compulsory-ness of the group set optional (…eh) like we have it for

organisation unit group sets today.

Lars I think this is the correct response to what is clearly a very real

need. But I want to suggest that we approach it as follows:

We create two new abstract classes, Dimension and DimensionOption.

DataElement should be extended with methods to retrieve Dimensions -

fold/unfold whatever the gathered requirements are. These are the methods

which would be used in reportable design.

Both Category and Group should in some way implement Dimension. In both

cases I think the underlying structures, however imperfect, allow for this

symmetry. If this is difficult for Categories initially we can throw

unImplemented() for now but we will have provided the structural guidance

towards harmonising the two.

We might need a DimensionSet class or perhaps just a Set

getDimensions() member function of DataElement.

The point here is that if we have dimensions to a dataelement then from the

reporting/analysis perspective it can be made invisible how those dimensions

are implemented. Instinctively I feel it should simply be possible to

retrieve datavalues from a dimension or crosstabs of dimensions.
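A minimal Java sketch of the shape Bob describes, under stated assumptions: Bob proposes abstract classes, but interfaces are used here so that simple records can implement them. Category, CategoryOption, DataElementGroup, DataElementGroupSet and DataElement below are hypothetical, stripped-down stand-ins, not the real DHIS 2 classes. The point is only that reporting code iterating over getDimensions() never needs to know whether a dimension is a Category or a group set; crosstabs are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstractions (interfaces here; Bob suggests abstract classes).
interface DimensionOption {
    String getName();
}

interface Dimension {
    String getName();
    List<? extends DimensionOption> getDimensionOptions();
}

// Simplified stand-ins for the category model (data entry side).
record CategoryOption(String name) implements DimensionOption {
    public String getName() { return name; }
}

record Category(String name, List<CategoryOption> options) implements Dimension {
    public String getName() { return name; }
    public List<CategoryOption> getDimensionOptions() { return options; }
}

// Simplified stand-ins for groups and group sets (analysis side).
record DataElementGroup(String name) implements DimensionOption {
    public String getName() { return name; }
}

record DataElementGroupSet(String name, List<DataElementGroup> groups) implements Dimension {
    public String getName() { return name; }
    public List<DataElementGroup> getDimensionOptions() { return groups; }
}

record DataElement(String name, List<Category> categories, List<DataElementGroupSet> groupSets) {
    // The "mixed bag": callers only see Dimension, not how each one is implemented.
    List<Dimension> getDimensions() {
        List<Dimension> dimensions = new ArrayList<>(categories);
        dimensions.addAll(groupSets);
        return dimensions;
    }
}

public class DimensionSketch {
    public static void main(String[] args) {
        Category age = new Category("Age", List.of(new CategoryOption("<5"), new CategoryOption(">5")));
        DataElementGroupSet disease = new DataElementGroupSet("Disease",
                List.of(new DataElementGroup("Malaria"), new DataElementGroup("Typhoid")));
        DataElement deaths = new DataElement("Deaths", List.of(age), List.of(disease));

        for (Dimension d : deaths.getDimensions()) {
            List<String> options = d.getDimensionOptions().stream().map(DimensionOption::getName).toList();
            System.out.println(d.getName() + ": " + options);
        }
    }
}
```

The dirty work Bob mentions would live inside the concrete implementations of getDimensionOptions(); the loop in main is the kind of code a report would contain.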

One missing piece of the puzzle (or required symmetry) is that I don’t think

currently we name a dataelement which has beneath it a dataElementGroup or

set of groups. But I suspect this could be implemented relatively easily.

While the above might look like it is complicating the picture, I think in

fact it can considerably simplify it in the long run. The correct starting

point is to gather the requirements of what methods a Dimension should

have. If there were to be a Dimension class and we knew nothing of

implementation details, what would Jason and Ola and others really require

of that class? Then we do the dirty work in the concrete implementations.

Otherwise known as the sweep-it-under-the-carpet pattern, or what others

might call encapsulation.

Regards

Bob

Finally I hope people who are troubled about the lack of documentation

would use Jason’s instructions and convert some of this newly discovered

wisdom into… documentation.

cheers

Lars

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


bobj · 5 October 2009 16:59

A passing thought before I get the dinner on …

One thing about Dimensions is that they often need to be ordered for display purposes. Months would be a good example but there are others. I have a sense that we have in a number of places created a number of containers and then fall over ourselves making classes to order them for display purposes when we might have got away with using an ordered container to start with. Haven’t looked too closely but it’s a hunch …
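Bob's hunch can be illustrated with standard JDK collections. In this small sketch, a TreeSet keeps periods that have a natural order (java.time.YearMonth is Comparable) sorted as they are added, and a LinkedHashSet keeps category options in the order they were defined. Which container suits which dimension is an assumption for illustration, not a description of existing DHIS 2 code.

```java
import java.time.YearMonth;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class OrderedDimensionSketch {
    // Periods have a natural order, so a TreeSet keeps them in calendar order
    // regardless of the order in which they are added.
    static SortedSet<YearMonth> months(YearMonth... added) {
        return new TreeSet<>(Arrays.asList(added));
    }

    // Category options have no natural order; a LinkedHashSet keeps
    // the order in which the user defined them (e.g. "<1", "1-5", ">5").
    static Set<String> categoryOptions(String... definedOrder) {
        return new LinkedHashSet<>(Arrays.asList(definedOrder));
    }

    public static void main(String[] args) {
        System.out.println(months(YearMonth.of(2009, 10), YearMonth.of(2009, 1), YearMonth.of(2009, 5)));
        // prints [2009-01, 2009-05, 2009-10]
        System.out.println(categoryOptions("<1", "1-5", ">5"));
        // prints [<1, 1-5, >5]
    }
}
```

With containers like these, display code can simply iterate; no separate comparator class is needed for the common case.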

Regards
Bob


2009/10/5 Bob Jolliffe bobjolliffe@gmail.com

Jason,

2009/10/5 Jason Pickering jason.p.pickering@gmail.com

Hi Bob,

I know essentially nothing about Java, so this may be a rather fluffy,

philosophical email. However, I will offer, from a relational

standpoint, some comments, which I think overlap with yours, but which

are clearly less technical.

There is significant overlap in terms of what the OrgUnitGroupSets,

DataElementGroupSets and (if it were implemented) PeriodGroupSets are

supposed to accomplish. They establish some sort of hierarchy and

grouping. A set of districts belong to a province. Days belong to

weeks. Certain data values were recorded for children with malaria

under 5.

It would seem there are two separate, but not competing requirements

for the data element group sets. One for data entry, and the other for

analysis. As I indicated in my earlier email, my gut feeling is that

there is no difference conceptually between a “category” and a “data

element set”. There may be differences in the implementation of the

classes, but conceptually, it seems to be only a way of lumping data

elements together into some type of hierarchical relation. Whether

these are separate rows or a single row in a database is of little concern

to the end users.

Those of you that have followed the OpenHealth functional prototype

have seen that what was attempted there was to create a union

between multidimensional data entry, and multidimensional analysis. It

was not entirely successful, but the point was clear. Sometimes you

need to be able to enter data for multiple organizational units for a

single data element (think of population indicators entered at the

national level and then distributed to districts, a requirement here).

In other cases (the one that DHIS has catered to), you need to enter

multiple data elements for a single organizational unit for a

single time period. The same could be said about entering a set of

data over multiple time periods for multiple organizational units for

a single data element.

I realize this may be asking too much, but is there a way that this

Dimension class could somehow be used to implement common methods

across Periods, OrgUnits and Data elements? These three concepts are

distinct and central to DHIS and data in general (when, where, what).

The rules governing how these dimensions work internally are

also distinct. For instance, there are seven days in a week, three

months in a quarter, etc. for Periods. I suppose this is/can be

implemented in code to tell the aggregation engine what to do. The

similarities, hierarchies and exclusivity being the two that come

to mind, are clear to me.

Mapping out these similarities is exactly what the process of good object oriented design is about. Unfortunately it is often (always?) only through extended use that the patterns emerge.

I think with a bit of thought what you refer to above is certainly possible. If we just think for a start of returning DimensionOptions; in java terms a method with signature

SortedSet Dimension::getDimensionOptions()

might behave polymorphically as follows:

(i) an OrgUnit (implementing Dimension) would return a set of OrgUnits
(ii) a YearlyPeriod might return a set of MonthlyPeriods, or quarters or … this one might require parameterization. (As would the previous if you wanted to unfold 1, 2 or 3 levels deep)
(iii) a category would return a set of categoryOptions
(iv) a DataSetGroup might return a set of uniDimensional DataElements

etc. It’s maybe a bit more complicated, but not much.

A DataElement should in turn be able to return a set of Dimensions (a mixed bag of categories and datasetgroups of types (iii) and (iv)).
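The polymorphic behaviour in (i) to (iii) can be sketched in Java as follows. Assumptions: the angle brackets in Bob's signature are taken to have been lost, and each dimension option is simplified to a String label, so getDimensionOptions() returns SortedSet<String>. OrgUnit, YearlyPeriod and Category here are hypothetical records, the boolean on YearlyPeriod stands in for the parameterization Bob mentions, and the org unit names are illustrative only.

```java
import java.time.YearMonth;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical: each dimension unfolds one level into a sorted set of option labels.
interface Dimension {
    SortedSet<String> getDimensionOptions();
}

// (i) An org unit unfolds into the names of its child org units.
record OrgUnit(String name, List<OrgUnit> children) implements Dimension {
    public SortedSet<String> getDimensionOptions() {
        SortedSet<String> names = new TreeSet<>();
        for (OrgUnit child : children) names.add(child.name());
        return names;
    }
}

// (ii) A year unfolds into quarters or months, chosen by a parameter.
record YearlyPeriod(int year, boolean quarters) implements Dimension {
    public SortedSet<String> getDimensionOptions() {
        SortedSet<String> periods = new TreeSet<>();
        if (quarters) {
            for (int q = 1; q <= 4; q++) periods.add(year + "Q" + q);
        } else {
            for (int m = 1; m <= 12; m++) periods.add(YearMonth.of(year, m).toString());
        }
        return periods;
    }
}

// (iii) A category unfolds into its options (alphabetical here; a real
// implementation would likely need an explicit sort order).
record Category(String name, List<String> options) implements Dimension {
    public SortedSet<String> getDimensionOptions() {
        return new TreeSet<>(options);
    }
}

public class PolymorphicDimensionSketch {
    public static void main(String[] args) {
        List<Dimension> dimensions = List.of(
                new OrgUnit("Zambia", List.of(new OrgUnit("Lusaka", List.of()), new OrgUnit("Copperbelt", List.of()))),
                new YearlyPeriod(2009, true),
                new Category("Gender", List.of("Male", "Female")));
        for (Dimension d : dimensions) {
            System.out.println(d.getDimensionOptions());
        }
    }
}
```

Unfolding several levels deep, as Bob notes for (ii), would need a depth parameter that this sketch leaves out.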

And because all of these things implement the Dimension/DimensionSet relationship they might be able to be tabulated as equals in the output. From the UI you should be able to select Dimensions from OrgUnits, Periods and DataElements.

None of this requires much refactoring of existing models. The information is already mostly there. We just need to be able to map out the useful relationships.

Cheers
Bob

I am not sure if it can be done, but it would help if there were any way that the

existing categories class(es) could be used for two purposes:

1. Creation of multidimensional data elements for the purpose of data entry, etc.
2. Grouping of non-multidimensional elements into a multidimensional data element after the fact.

Perhaps it is not easily possible to do this, but as Bob highlights,

implementers will be left with a choice, and it is not clear to me

which one is preferable. Prima facie, I would say use

non-multidimensional data elements. What happens when the

dimensionality of a multi-dimensional data element changes? Is it

possible to change this after it has been created and data entered for

it? I am not sure, perhaps it is. However, if I was given a choice

that was flexible, and allowed me to arbitrarily assign independent

data elements to group sets, either for the purpose of data entry or

analysis, this might be the route that I would choose.

I better stop here, before I continue down my sophomoric pathway.

Regards,

Jason

2009/10/5 Bob Jolliffe bobjolliffe@gmail.com:

Hi,

2009/10/4 Lars Helge Øverland larshelge@gmail.com

Big thanks to all for illuminating the pros and cons of the current

multidimensional model. It was designed in 2006 basically to support the ICD

based dataentry, and we must admit that Bob is at least partially right when

saying that output could have been given better thought. Anyway, it does not

seem to be working out too badly either.

I like Bob’s suggestion for simplifying the model and it would apparently

make querying easier and improve the user interface. I have a few concerns:

Feasibility. The Category-related model is integrated into 9 out of 11

service projects in DHIS 2. Re-factoring and testing all this would take

months.

Backwards compatibility. Lots of databases and data-entry forms exist in

the field. Conversion must be managed.

I reached the same conclusion :-(. I think there is still some small

rationalisation that can be done, but the model is already deeply coupled with

many parts of the system. Having said that I have a suggestion related to

the refactoring of dimensions and dataelementgroups below.

Suitability for the data-entry module. It seems likely that the

CategoryCombo class can be “emulated” through the API.

Not sure exactly what you mean by this … but I guess probably. I

suspect the work that most needs to be done on the CategoryCombo class in

the API is to provide “unpicking” methods to be able to conveniently access

the underlying categories (dimensions).

Does it cut tables to change from m-n to 1-n? Using join tables to

represent 1-n associations is preferred by many as it keeps the domain model

cleaner.

My proposal improved the situation by making a 1-n relation of category to

categoryOptions. This would certainly be more efficient but doesn’t meet

the use case where a categoryOption might participate in different

categories.

If people say we can live with the current model I’d say we do just that.

Anyway Bob’s suggestion should be documented and looked at again later. I

think the point about “input without output is statistical m…” is valid.

At least we will need to focus more on how to make “the goodness float up”.

I think we can only know whether we can live with the current model once the

api methods which seem theoretically possible are implemented. My concern

is that if we provide an alternative to MD analysis through extending the

groupset idea then we have no justification in recommending that

implementors implement MD dataelements. Convenience of UI is not enough if

in the process we enter data which we can’t unpack. What will happen is

that implementors with an eye on analysis will ignore the MD notion entirely

because it creates difficulties for them and they have a ready analysis

solution with groups and groupsets.

Re the data element / indicator group set I think this is something we can

do without risk. It won’t change the existing model and won’t break anything

and wouldn’t take too long to implement. Will start on it on Wednesday. A

minor comment here is that I believe we should keep the exclusiveness and

compulsory-ness of the group set optional (…eh) like we have it for

organisation unit group sets today.

Lars I think this is the correct response to what is clearly a very real

need. But I want to suggest that we approach it as follows:

We create two new abstract classes, Dimension and DimensionOption.

DataElement should be extended with methods to retrieve Dimensions -

fold/unfold whatever the gathered requirements are. These are the methods

which would be used in reportable design.

Both Category and Group should in some way implement Dimension. In both

cases I think the underlying structures, however imperfect, allow for this

symmetry. If this is difficult for Categories initially we can throw

unImplemented() for now but we will have provided the structural guidance

towards harmonising the two.

We might need a DimensionSet class or perhaps just a Set

getDimensions() member function of DataElement.

The point here is that if we have dimensions to a dataelement then from the

reporting/analysis perspective it can be made invisible how those dimensions

are implemented. Instinctively I feel it should simply be possible to

retrieve datavalues from a dimension or crosstabs of dimensions.

One missing piece of the puzzle (or required symmetry) is that I don’t think

currently we name a dataelement which has beneath it a dataElementGroup or

set of groups. But I suspect this could be implemented relatively easily.

While the above might look like it is complicating the picture, I think in

fact it can considerably simplify it in the long run. The correct starting

point is to gather the requirements of what methods a Dimension should

have. If there were to be a Dimension class and we knew nothing of

implementation details, what would Jason and Ola and others really require

of that class? Then we do the dirty work in the concrete implementations.

Otherwise known as the sweep-it-under-the-carpet pattern, or what others

might call encapsulation.

Regards

Bob

Finally I hope people who are troubled about the lack of documentation

would use Jason’s instructions and convert some of this newly discovered

wisdom into… documentation.

cheers

Lars


jason · 6 October 2009 13:22

Hi all. I am forwarding a "side" discussion I was having with Bob on
this topic. I wanted to bounce it off him before I exposed myself too
openly. I have been trying to conceptualize my thoughts on this
topic in words, and thought we should maybe try a drawing as well. It
seems that it has more or less been agreed that data element set
groups should be implemented. The key question for me is how far we
can realistically go, as slipping down the multidimensional slope
might get messy quickly. If we think about a data value, it may be
composed of multiple internal dimensions (as Bob points out below).
DHIS deals with this by the use of "calculated data elements", whereby
users can define essentially any combination of data elements.
Currently, it would seem that the "dimensionality" of a
calculated data element sort of gets lost during a data mart export.
For instance, we can define data elements like "Total cases of
confirmed malaria" and "Total cases of clinical malaria" and then add
them up to get "Total cases of malaria". However, total clinical
cases may be the combination of various other data elements as well,
such as the Under 1, 1-5 and Over 5 age groups. It seems we have a situation
where certain data elements can be defined in the application itself,
and others need to be derived through ad-hoc means, such as an SQL
query or PivotTable.
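To make the "lost dimensionality" point concrete, here is a minimal Java sketch assuming, hypothetically, that a calculated data element keeps its operands (each with a factor) instead of only exporting the computed total. The record and method names are illustrative, not the DHIS 2 API, and the counts in main are made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: a calculated data element remembers which data elements it is
// built from, with a factor for each (1 to add, -1 to subtract, etc.).
record CalculatedDataElement(String name, Map<String, Integer> factors) {

    // The single total that an export of totals alone would keep.
    int total(Map<String, Integer> values) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : factors.entrySet()) {
            sum += e.getValue() * values.getOrDefault(e.getKey(), 0);
        }
        return sum;
    }

    // The per-operand breakdown that would otherwise be lost.
    Map<String, Integer> breakdown(Map<String, Integer> values) {
        Map<String, Integer> parts = new TreeMap<>();
        for (Map.Entry<String, Integer> e : factors.entrySet()) {
            parts.put(e.getKey(), e.getValue() * values.getOrDefault(e.getKey(), 0));
        }
        return parts;
    }
}

public class CalculatedDataElementSketch {
    public static void main(String[] args) {
        CalculatedDataElement totalMalaria = new CalculatedDataElement("Total cases of malaria",
                Map.of("Total cases of confirmed malaria", 1, "Total cases of clinical malaria", 1));
        Map<String, Integer> values = Map.of(
                "Total cases of confirmed malaria", 120,
                "Total cases of clinical malaria", 300);
        System.out.println(totalMalaria.total(values)); // 420
        System.out.println(totalMalaria.breakdown(values));
    }
}
```

Because each operand could itself be a CalculatedDataElement, the same idea extends to the age-group breakdown of clinical cases, which is where the recursion discussed later comes in.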

I have put together a few "mindmaps" with the tool "freemind" which is
freely available for many platforms, to try and conceptualize to some
degree how I see a "data value" or a measure. I think Bob, below
provides a good play-by-play of the diagrams. I have started with the
concept of a data value, which may be composed of several dimensions:
DataElement (what), Period (when), OrganizationalUnit (where), and
Source (how). I faintly remember there being some discussion, as Bob
alludes to, about the "source". I see the source in a slightly
different light than Bob. If you look at a system like DevInfo, their
concept of source is essentially a difference in how the data is
obtained, and relates to its methodology (I think this may be
essentially how). Here in Zambia, there are two sources of
denominators: the official census and facility catchment populations.
The sum of catchment populations for all facilities in a given
district does not necessarily add up to the total official population.
It may be desirable to use this alternative denominator to calculate
things like coverage rates. Another example would be routine-based
data and population-based data. Often, it may be useful to
compare two values for the same indicator, for instance HIV
prevalence among pregnant women. This could be obtained through routine
data, or through population based surveys. Again, I have seen such
graphs here, comparing results from the routine HMIS data and results
from the DHS. So, this type of analysis may be desirable.

I think the dimensions of OrgUnit and Time are pretty straightforward.
Bob highlights a few points here that are quite valid. Each country
has their own sense of what "time" is and how it is implemented in
their HMIS system. Here, daily patient registers get tallied once a
month, and then the management wants to see quarterly figures. The
situation is of course different elsewhere, which highlights the fact
that dimensions need to be 1) hierarchical and 2) flexible. The
implementation of the OrgUnit hierarchy in DHIS is a good example,
which allows countries or organizations total flexibility in
configuration of some type of "place" hierarchy.

I personally think that the "source" dimension is one of lesser
importance. I guess the question is whether DHIS is for routine data
only. It is not a priority for me at least, although it would be a
nice to have.

Data elements seem to be much more complex. As Bob points out, in my
mind, they are intrinsically recursive and may be composed of other
data elements (which has been implemented by calculated data element
with a fixed definition in DHIS). When I started mapping out some of
the data elements here, a couple of things occurred to me, which have
implications on the implementation of "data element set groups".

First, there appear to be "primary data elements", which are things that
actually get recorded and entered into the system. Here, they are
taken from patient registers, tallied and then entered onto an
aggregation form (which I sent along a few emails ago). They might
also come from a medical record system, such as OpenMRS. There may be
many folded and hidden dimensions wrapped up inside of this data
element, but for the aggregate system, we do not really care about
them. In my diagram, these correspond to what DHIS refers to as
"data elements".

Second, there appear to be implicit default operators about how
operations at nodes should be handled. Sometimes, it would appear to
make sense to "sum" data elements (whether they are primary or
derived). Other times, it would not make sense. We have some data
elements on the number of Doctors who have been lost, recruited and
who are on-site. It does not make sense to sum these values up to
arrive at "Total number of doctors". So, it would seem we need a bit
more logic to be built in somehow, or simply leave it up to the users
how certain aggregate data values should be handled.
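One way to "build in a bit more logic" is a per-data-element default operator that is consulted before totalling a node. This Java sketch is a hypothetical design, not existing DHIS 2 behaviour: the AggregationOperator enum and its NONE value are invented for illustration, and a node total is only produced when every child data element is marked as summable.

```java
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical default operator attached to each data element.
enum AggregationOperator { SUM, NONE }

record DataElement(String name, AggregationOperator operator) {}

public class AggregationOperatorSketch {
    // Totals a node in a data element hierarchy, or returns empty when any
    // child is not meant to be summed (e.g. doctors lost / recruited / on-site).
    static OptionalInt total(Map<DataElement, Integer> values) {
        int sum = 0;
        for (Map.Entry<DataElement, Integer> e : values.entrySet()) {
            if (e.getKey().operator() != AggregationOperator.SUM) {
                return OptionalInt.empty(); // leave it to the user or a cross-tab
            }
            sum += e.getValue();
        }
        return OptionalInt.of(sum);
    }

    public static void main(String[] args) {
        Map<DataElement, Integer> malaria = Map.of(
                new DataElement("Malaria <5", AggregationOperator.SUM), 10,
                new DataElement("Malaria >5", AggregationOperator.SUM), 5);
        Map<DataElement, Integer> doctors = Map.of(
                new DataElement("Doctors lost", AggregationOperator.NONE), 2,
                new DataElement("Doctors recruited", AggregationOperator.NONE), 4,
                new DataElement("Doctors on-site", AggregationOperator.NONE), 30);
        System.out.println(total(malaria)); // OptionalInt[15]
        System.out.println(total(doctors)); // OptionalInt.empty
    }
}
```

The alternative Jason mentions, leaving it entirely to users, corresponds to never computing node totals and only returning the individual values.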

Third, there appear to be different levels of dimensionality for
different data element hierarchies. Some data elements may be more
recursive than others, which is fairly typical. This highlights that
the dimensional hierarchy may be dynamic for each branch of a
particular derived data element. This also raises potential issues
with what a crosstab table would look like. What happens if we have 20
dimensions? Well, this would probably be OK, but what if it balloons
somehow to 200?

Fourth, in terms of categories versus data element group sets. I am
thinking that in terms of best practice, perhaps one way of
distinguishing between the two concepts would be 1) Categories provide
a dimensionality construct of data elements that can be aggregated
and should be restricted to relatively few dimensional levels (Age +
Gender) as an example. In my second point, I highlight what could be a
multidimensional data element (Doctors) with three category options
(Recruitments, Losses, On-site) which do not seem to be able to be
aggregated through standard operators, but would probably want to be
visualized in some sort of cross-tab table instead only. 2) Data
element set groups provide grouping of data elements with no preset
aggregation path (at this point either SUM or AVG). Perhaps
eventually, we could define what the default aggregation path would
be, but at this point, we can leave it up to the user to decide how to
handle slicing and dicing in a PivotTable or OLAP engine.

So, in conclusion for this mail, I think that data element set groups
would go a long way to providing some multidimensional analysis
capacity, but it feels like we are missing something to me, especially
as it applies to calculated data elements. Perhaps this is best left
up to analysts to decide, and that we define what the goal-posts are
in terms of what is achievable with the current model and our level of
resources.

Apologies once again for the long mail, but maybe this can be brought
into the eventual documentation on this subject!

Regards,
jason

dhis_mind_maps.zip (2.31 KB)


---------- Forwarded message ----------
From: Bob Jolliffe <bobjolliffe@gmail.com>
Date: Tue, Oct 6, 2009 at 1:10 PM
Subject: Re: [Dhis2-devs] On categories and dimensions and zooks
To: Jason Pickering <jason.p.pickering@gmail.com>

Hi

2009/10/6 Jason Pickering <jason.p.pickering@gmail.com>

I thought I would mail this to you first. I have been trying to
conceptualize my thinking a bit more, and thought a picture may
represent a thousand words. I created these "mind maps" with freemind.
I think it should run on any system.

Yes freemind is cool. I introduced it to Sundeep in Goa and he is now
an avid fan.

Take a look at them and let me know if they make any sense in terms of
our discussion. They are really not complete, and I have purposefully
left out a lot of possibilities, but have tried to give enough
examples to make my points clear. These simple diagrams obviously do
not have the rigor of something like UML, but help me to try and
visualize the concepts a bit more clearly.

Let me know what you think and if this is in line with your thoughts.

First observation is that I see you are *really* talking hierarchical
dataelements here - rather than just a single level of grouping. The
SDMX model also expects these (I'll send you some sample files). I
think if this is a requirement then you should highlight it. I think
it is but then again I have a different brief - how to deal with
importing an sdmx metadata file which has hierarchical indicators.
But this is quite a fundamental paradigm shift which we should
probably look at proposing for a dhis2-ng requirements gathering
exercise.

Are we saying that:
1. each dataelement can be thought of as a composite thing;
2. it might be composed of other dataelements (recursion);
3. it might be composed of "internal" dimensions;
4. it might be associated with simple datavalues.

And it can get more complex:

5. If it doesn't have simple datavalues then it must be possible to
return an aggregated value calculated by summing a slice along any
axis below it.
6. And it should be able to return the slice (or dice) of
datavalues associated with any axes below it.

I don't know if I can draw this.

But assuming we could do all the above then can we generalize these
requirements so that the same requirements can be applied to other
hierarchical entities? I suspect we can. Datavalues are fairly
simple beasts which simply have tags associated with entities in all
of these trees.

One problem with hierarchical models is that relational databases are
really not that clever at representing them. Queries into tree like
structures tend to get needlessly complex for what should be flowing
with the logic of the model. One nice alternative to a relational
database for representing the structural metadata is an xml database
like eXist. These beasts are designed to efficiently and intuitively
handle tree-like data. I can (if I close my eyes) see a situation
where structural metadata is stored in eXist and the grunt work of
datavalue storage is handled in a relational database. The eXist
query would generate an XML output which represented the particular
tree view you required - using identified simple dataelements,
dataelement internal dimensions, sources, and periods. Pulling
datavalues out of the database to match these identifying tags should
not be a complex query.

I have also added an additional "uber-dimension", Source, which is
distinct from the DHIS concept of source, but is more in line with what
DevInfo considers a source. The thinking here would be that we would
eventually like to be able to potentially have two sources of the same
data element, measured through different means, such as population
based surveys and routine data systems.

I am not sure if this is too far away from what the original idea of
dhis2 "source" was - using inheritance an orgunit is just one kind of
source. It was always envisaged that there could be others. Mind you
the others have not materialized as yet, so I recall Lars has been
considering removing the inheritance relationship. It's been there for
a few years and no one has suggested a use yet. Maybe there is an
argument for maintaining it ...

Your hierarchies of periods are not quite so straightforward. In
particular Weeks do not sit neatly under months. In fact they don't
even sit uniformly under years unless you agree to some standard like
ISO8601. Different countries and regions have different conventions
regarding the first day of the week and the first week of the year
which makes for a horrible mess of complexities. You can't reasonably
aggregate weekly data to monthly. But you can aggregate both to
yearly.

Bottom line is not that dissimilar to dataelement hierarchies mind you
- trees of weeks won't necessarily sit neatly in a single period tree
structure. Different trees must be able to co-exist in parallel under
the same root nodes eg. tree of 52 weeks under year plus tree of 12
months under year. Quarters, Decades etc are easier to fit in.
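Bob's point about weeks can be checked with the ISO 8601 support in java.time (IsoFields). In this small sketch the "2009W40" label format is an arbitrary choice; the week calculations are standard JDK API. It shows a week that straddles two months, and a date in calendar year 2010 that belongs to the last ISO week of 2009, which is why weekly data rolls up to a week-based year in its own tree, parallel to the months.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.IsoFields;

public class PeriodTreesSketch {
    // ISO 8601 week label: week-based year + week number, e.g. "2009W40".
    static String isoWeek(LocalDate date) {
        return date.get(IsoFields.WEEK_BASED_YEAR) + "W" + date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    // True when the ISO week (Monday to Sunday) containing the date touches two
    // calendar months, i.e. the week cannot sit under a single month node.
    static boolean weekSpansTwoMonths(LocalDate date) {
        LocalDate monday = date.with(DayOfWeek.MONDAY);
        LocalDate sunday = monday.plusDays(6);
        return monday.getMonth() != sunday.getMonth();
    }

    public static void main(String[] args) {
        LocalDate oct1 = LocalDate.of(2009, 10, 1);
        System.out.println(isoWeek(oct1) + " spans two months: " + weekSpansTwoMonths(oct1));
        // 1 January 2010 falls in the last ISO week of 2009.
        System.out.println(isoWeek(LocalDate.of(2010, 1, 1)));
    }
}
```

Countries using a different first day of the week or first-week rule would need a different WeekFields definition, which is the "horrible mess" Bob describes.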

But all in all I think you are thinking in a similar direction to me.
I just don't fancy descending into RDBMS hell trying to model these
things. Probably we need to plan a design fest ..

Regards
Bob

Lars · 10 October 2009 12:38

Here comes my shot at this issue. I’m gonna use Ola’s example as a basis.

The flat data element names:
“Malaria death <5 year”
“Malaria death >5 year”
“Malaria in OPD 1st attendance <5 year”
“Malaria in OPD 1st attendance >5 year”
“Malaria IP discharge <5 year”
“Malaria IP discharge >5 year”
“Typhoid death <5 year”
“Typhoid death >5 year”
etc.
(OPD is outpatient, meaning patients treated at the clinic; IP is inpatient, meaning patients who were admitted to a hospital).

There are three dimensions in the data elements above, so I define three data element group sets: Disease, Patient Status, and Age.

I also define 7 new data element groups (Malaria, Typhoid, <5, >5, Death, OPD, IP) and assign these groups to the group set they belong to:
Disease (Malaria, Typhoid)
Patient Status (Death, OPD, IP)
Age (<5, >5)

I then assign the data element groups to the data elements
“Malaria death <5 year” assigned to “Malaria”, “Death”, and “<5”.
etc.

All these groupings can exist completely independent of data entry and be changed at any time.

From this I can generate a new resource table for my data analysis (similar to the one we already have for orgunit group sets) that provides:
Data Element Group Set, Data Element Group, Data Element
“Disease”, “Malaria”, “Malaria death <5 year”
“Disease”, “Typhoid”, “Typhoid death <5 year”
“Patient Status”, “Death”, “Malaria death <5 year”
etc.

When joining the above table with an aggregated data value table you can define a pivot table with your three data element group sets as columns (pivot fields) and analyse the data across these three dimensions. The data element name dimension can then be completely hidden in the analysis.
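The resource table and the pivot step can be sketched in a few lines of Java. Assumptions: group sets and their group assignments are held in plain maps, the "aggregated data value table" is a map from data element name to one aggregated value, and the pivot is reduced to summing per group for a single group set. ResourceRow and the method names are hypothetical, and the counts are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical row of the proposed resource table.
record ResourceRow(String groupSet, String group, String dataElement) {}

public class GroupSetResourceTableSketch {
    // Flattens groupSet -> (group -> assigned data elements) into resource table rows.
    static List<ResourceRow> resourceTable(Map<String, Map<String, List<String>>> groupSets) {
        List<ResourceRow> rows = new ArrayList<>();
        groupSets.forEach((set, groups) ->
                groups.forEach((group, elements) ->
                        elements.forEach(de -> rows.add(new ResourceRow(set, group, de)))));
        return rows;
    }

    // The "join" with aggregated values, summed per group of one group set.
    static Map<String, Integer> totalsByGroup(List<ResourceRow> rows, String groupSet,
                                              Map<String, Integer> aggregatedValues) {
        Map<String, Integer> totals = new TreeMap<>();
        for (ResourceRow r : rows) {
            if (r.groupSet().equals(groupSet)) {
                totals.merge(r.group(), aggregatedValues.getOrDefault(r.dataElement(), 0), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> groupSets = Map.of(
                "Disease", Map.of(
                        "Malaria", List.of("Malaria death <5 year", "Malaria death >5 year"),
                        "Typhoid", List.of("Typhoid death <5 year", "Typhoid death >5 year")),
                "Age", Map.of(
                        "<5", List.of("Malaria death <5 year", "Typhoid death <5 year"),
                        ">5", List.of("Malaria death >5 year", "Typhoid death >5 year")));
        Map<String, Integer> values = Map.of(
                "Malaria death <5 year", 7, "Malaria death >5 year", 3,
                "Typhoid death <5 year", 1, "Typhoid death >5 year", 2);

        List<ResourceRow> rows = resourceTable(groupSets);
        System.out.println(totalsByGroup(rows, "Disease", values)); // {Malaria=10, Typhoid=3}
        System.out.println(totalsByGroup(rows, "Age", values));     // {<5=8, >5=5}
    }
}
```

Note that the data element names never appear in the output, matching the observation that the data element name dimension can be hidden in the analysis.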

Some observations:

a) From this we can derive that a GroupSet corresponds to a Dimension and that a Group corresponds to a DimensionOption.

Dimension = GroupSet
DimensionOption = Group

b) The current Category model and the suggested simplified version both generate CategoryOptionCombos/DimensionElementCombinations which are linked to DataValue and constitute all possible combinations of their associated CategoryOptions/DimensionOptions. This means that once those CategoryOptionCombos/DimensionElementCombinations are generated and DataValues are registered for them, they cannot change. Also, once a data entry grid is defined, the underlying model cannot change. According to Ola and Jason we must be able to assign “any dimension to a DataElement” at any time. To me this rules out re-using the same dimensional attributes for data entry and analysis - we must in any case have one set of dimensions for data entry and one set of dimensions for analysis.
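Point b) rests on CategoryOptionCombos being all combinations of the category options, i.e. a Cartesian product. The plain Java product below (not DHIS 2's actual generation code) shows why the combos cannot simply change later: splitting one option produces a different set of combinations, so data values stored against the old combos no longer line up with the new grid.

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryOptionComboSketch {
    // Cartesian product: one option taken from each category, in category order.
    static List<List<String>> combos(List<List<String>> categories) {
        List<List<String>> result = new ArrayList<>();
        result.add(List.of()); // start from a single empty combination
        for (List<String> options : categories) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String option : options) {
                    List<String> combo = new ArrayList<>(prefix);
                    combo.add(option);
                    next.add(combo);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> before = combos(List.of(List.of("Death", "OPD", "IP"), List.of("<5", ">5")));
        System.out.println(before.size() + " " + before);
        // Splitting the age category into three options changes every combo involving age.
        List<List<String>> after = combos(List.of(List.of("Death", "OPD", "IP"), List.of("<1", "1-5", ">5")));
        System.out.println(after.size());
    }
}
```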

c) Ola’s suggested solution supports this. It is powerful in the ability to assign “raw” DataElements to Dimensions/GroupSets through DimensionOptions/Groups, completely independent of which Categories the DataElement was assigned to for data entry. The weakness is that it is based on flat data elements, not Categorized data elements, which we must include if we are to justify the Categorized data entry.

d) The Category model is pretty good at what it currently does - facilitating grid-based dataentry and cutting down on the number of data elements (as well as making the data element naming more elegant).

Based on this I suggest we do the following:

We continue to use the Category model as it is, not for analysis - but for data entry.
Taken from Bob’s suggestion - we phase out the existing Group and replace it with a new DimensionOption object. We introduce a new Dimension object which will work similarly to a GroupSet. We use this model for analysis.
We go for Ola’s mentioned suggestion for analysis, with one exception: Rather than assigning DataElements to a Group/DimensionOption, we assign a combination of DataElement and CategoryOptionCombo (We create a new object for this for every assignment - and remove it for every de-assignment). If we want to see the total, we can assign a DataElement with the “default” CategoryOptionCombo, or create a DimensionOption where the elements make a total when summarized.
We use the same thing for Indicators.

The resource table Ola mentions will then look like this:

Group Set - Group - Data Element - CategoryOptionCombo
“Disease” - “Malaria” - “Malaria” - “(death, <5 year)”
“Disease” - “Typhoid” - “Typhoid” - “(death, >5 year)”
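The key difference from the flat version is that assignments are keyed on a (DataElement, CategoryOptionCombo) pair. In this Java sketch, Operand and OperandRow are hypothetical names for that pair and for a row of the four-column resource table, and the values are invented. Because the pair is the key, the same captured value can sit under both the Disease and the Age group sets without any change to the data entry categories.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: a DataElement paired with one of its CategoryOptionCombos.
record Operand(String dataElement, String categoryOptionCombo) {}

// Hypothetical row of the four-column resource table.
record OperandRow(String groupSet, String group, Operand operand) {}

public class OperandAssignmentSketch {
    // Sums the value of every operand assigned to each group of one group set.
    static Map<String, Integer> totalsByGroup(List<OperandRow> rows, String groupSet,
                                              Map<Operand, Integer> values) {
        Map<String, Integer> totals = new TreeMap<>();
        for (OperandRow row : rows) {
            if (row.groupSet().equals(groupSet)) {
                totals.merge(row.group(), values.getOrDefault(row.operand(), 0), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Operand malariaDeathUnder5 = new Operand("Malaria", "(death, <5 year)");
        Operand malariaDeathOver5 = new Operand("Malaria", "(death, >5 year)");
        Operand typhoidDeathOver5 = new Operand("Typhoid", "(death, >5 year)");

        List<OperandRow> rows = List.of(
                new OperandRow("Disease", "Malaria", malariaDeathUnder5),
                new OperandRow("Disease", "Malaria", malariaDeathOver5),
                new OperandRow("Disease", "Typhoid", typhoidDeathOver5),
                new OperandRow("Age", "<5", malariaDeathUnder5),
                new OperandRow("Age", ">5", malariaDeathOver5),
                new OperandRow("Age", ">5", typhoidDeathOver5));
        Map<Operand, Integer> values = Map.of(malariaDeathUnder5, 7, malariaDeathOver5, 3, typhoidDeathOver5, 2);

        System.out.println(totalsByGroup(rows, "Disease", values)); // {Malaria=10, Typhoid=2}
        System.out.println(totalsByGroup(rows, "Age", values));     // {<5=7, >5=5}
    }
}
```

Creating and deleting OperandRow entries corresponds to the per-assignment objects Lars describes; assigning an operand with the "default" CategoryOptionCombo would give the data element total.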

This way we can assign dimensions as we like without losing the fine granularity of the captured categorized data. We can improve the report table functionality in order to utilize this. This will be feasible with the time and resource constraints we are operating with. It also alleviates the challenge regarding Indicators and SDMX.

Additionally, one could expand the equation from a) to:

Dimension = GroupSet = Category
DimensionOption = Group = CategoryOption

which means there is potential for merging those objects or making them implement a common interface. But I don’t see the value if b) is valid.

Waiting for your replies/slaughter.

Lars

bobj · 12 October 2009 10:18

Hi Lars

I think your suggestion might adequately cover the analysis use case, but there remains a missing piece of the puzzle regarding SDMX export. I am particularly thinking of the challenge Ola and Knut are shortly facing of presenting DHIS as a consumer of WHO MDG Indicator metadata and a producer of SDMX MDG reports. Comments inline below:

Here comes my shot at this issue. I’m gonna use Ola’s example as a basis.

The flat data element names:
“Malaria death <5 year”
“Malaria death >5 year”
“Malaria in OPD 1st attendance <5 year”
“Malaria in OPD 1st attendance >5 year”
“Malaria IP discharge <5 year”
“Malaria IP discharge >5 year”
“Typhoid death <5 year”
“Typhoid death >5 year”
etc.
(OPD is outpatient, meaning patients treated at the clinic; IP is inpatient, meaning patients who were admitted to a hospital.)

There are three dimensions in the data elements above, so I define three data element group sets: Disease, Patient Status, and Age.

I also define 7 new data element groups (Malaria, Typhoid, <5, >5, Death, OPD, IP) and assign each group to the group set it belongs to:
Disease (Malaria, Typhoid)
Patient Status (Death, OPD, IP)
Age (<5, >5)

I then assign the data element groups to the data elements:
“Malaria death <5 year” assigned to “Malaria”, “Death”, and “<5”.
etc.

All these groupings can exist completely independent of data entry and be changed at any time.

From this I can generate a new resource table for my data analysis (similar to the one we already have for orgunit group sets) that provides:
Data Element Group Set, Data Element Group, Data Element
“Disease”, “Malaria”, “Malaria death <5 year”
“Disease”, “Typhoid”, “Typhoid death <5 year”
“Patient Status”, “Death”, “Malaria death <5 year”
etc.

When joining the above table with an aggregated data value table you can define a pivot table with your three data element group sets as columns (pivot fields) and analyse the data across these three dimensions. The data element name dimension can then be completely hidden in the analysis.
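To make the join concrete, here is a hypothetical Java sketch of the pivot step: it joins the resource table rows to aggregated values keyed by data element name and totals them per group within one group set. ResourceRow and totalsByGroup are illustrative names, and the sketch assumes the groups within a group set do not overlap; a data element assigned to two groups of the same set would be counted under both.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One row of the hypothetical resource table: group set, group, data element.
record ResourceRow( String groupSet, String group, String dataElement ) {}

class GroupSetPivot
{
    // Joins resource rows to aggregated values (keyed by data element name) and
    // totals the values per group for a single group set, sorted by group name.
    static Map<String, Double> totalsByGroup( String groupSet, List<ResourceRow> resource, Map<String, Double> values )
    {
        Map<String, Double> totals = new TreeMap<>();
        for ( ResourceRow row : resource )
        {
            Double value = values.get( row.dataElement() );
            if ( row.groupSet().equals( groupSet ) && value != null )
            {
                totals.merge( row.group(), value, Double::sum );
            }
        }
        return totals;
    }
}
```

Calling totalsByGroup once per group set gives one pivot field per dimension, and the data element name never has to appear in the output.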

Some observations:

a) From this we can derive that a GroupSet corresponds to a Dimension and that a Group corresponds to a DimensionOption.

Dimension = GroupSet
DimensionOption = Group

b) The current Category model and the suggested simplified version both generate CategoryOptionCombos/DimensionElementCombinations which are linked to DataValue and constitute all possible combinations of their associated CategoryOptions/DimensionOptions. This means that once those CategoryOptionCombos/DimensionElementCombinations are generated and DataValues are registered for them, they cannot change. Also, once a data entry grid is defined, the underlying model cannot change. According to Ola and Jason we must be able to assign “any dimension to a DataElement” at any time.
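The point that CategoryOptionCombos are fixed follows from how they are generated: they are the Cartesian product of the options of each category. The following hypothetical Java sketch (the class and method names are illustrative, not the DHIS 2 implementation) shows this; adding a fourth patient status would change the set of combinations, which is why data values already stored against the old combinations tie the model down.

```java
import java.util.ArrayList;
import java.util.List;

class OptionCombos
{
    // Returns every combination of one option from each category, in category order.
    static List<List<String>> combinations( List<List<String>> categories )
    {
        List<List<String>> combos = new ArrayList<>();
        combos.add( new ArrayList<>() ); // start with a single empty combination

        for ( List<String> options : categories )
        {
            List<List<String>> next = new ArrayList<>();
            for ( List<String> partial : combos )
            {
                for ( String option : options )
                {
                    List<String> extended = new ArrayList<>( partial );
                    extended.add( option );
                    next.add( extended );
                }
            }
            combos = next;
        }
        return combos;
    }
}
```

With Patient Status (Death, OPD, IP) and Age (<5, >5) as categories, this yields 3 × 2 = 6 combinations, from (Death, <5) to (IP, >5).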

I think this is the snag. In the proposed scheme you are not really assigning dimensions to a dataelement at all. In fact you do the reverse - you assign dataelements to a dimension. I still need to end up with a resulting indicator/dataelement that has a name and has these dimensions. I’ll try a snippet of Patrick’s sample SDMX inline here to illustrate the point (best viewed with a very small font size).

Here is an example of some indicators:

<structure:CodeList id="CL_INDICATOR" agencyID="SDMX-HD" version="1.0" isFinal="false" urn="urn:sdmx:org.sdmx.infomodel.codelist=SDMX-HD:CL_INDICATOR">
    <structure:Name xml:lang="en">Indicator</structure:Name>
    <structure:Code value="0" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].0">
        <structure:Description xml:lang="en">Neonatal mortality rate (per 1000 live births)</structure:Description>
    </structure:Code>
    <structure:Code value="1" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].1">
        <structure:Description xml:lang="en">Number of deaths during first 28 completed days of life</structure:Description>
    </structure:Code>
    <structure:Code value="2" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].2">
        <structure:Description xml:lang="en">1000 live births in a given year</structure:Description>
    </structure:Code>
    <structure:Code value="3" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].3">
        <structure:Description xml:lang="en">Life expectancy at birth</structure:Description>
    </structure:Code>
    <structure:Code value="4" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].4">
        <structure:Description xml:lang="en">Adults aged = 15 years who are obese</structure:Description>
    </structure:Code>
</structure:CodeList>

(The last one strikes me as a bit odd. I would have thought the indicator would be “Number of people who are Obese” and the age stuff would be in a dimension. But anyway … best not to get obsessed with dimensions.)

Here is an example of a dimension:

<structure:CodeList id="CL_GENDER" agencyID="SDMX-HD" version="1.0" isFinal="true" urn="urn:sdmx:org.sdmx.infomodel.codelist=SDMX-HD:CL_GENDER">
  <structure:Name xml:lang="en">Gender</structure:Name>
  <structure:Description xml:lang="en">Gender.</structure:Description>
  <structure:Code value="1" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0].1">
    <structure:Description xml:lang="en">Male</structure:Description>
  </structure:Code>
  <structure:Code value="2" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0].2">
    <structure:Description xml:lang="en">Female</structure:Description>
  </structure:Code>
  <structure:Code value="3" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0].3">
    <structure:Description xml:lang="en">Transgender</structure:Description>
  </structure:Code>
  <structure:Code value="_NA" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0]._NA">
    <structure:Description xml:lang="en">Not Applicable</structure:Description>
  </structure:Code>
  <structure:Code value="_ALL" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0]._ALL">
    <structure:Description xml:lang="en">All</structure:Description>
  </structure:Code>
  <structure:Code value="_UNK" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0]._UNK">
    <structure:Description xml:lang="en">Unknown</structure:Description>
  </structure:Code>
</structure:CodeList>

Note that both the indicator and the dimension are represented by the same element (structure:CodeList). This is not purely coincidental. As far as the DataValue is concerned, the indicator and the dimension are treated the same way - as an attribute. So in this sense the Indicator (like the period and orgunit) is like a compulsory dimension.

    <ns:Series DISEASE="1" PROG="0" GEOGRAPHIC_PLACE_NAME="CH-GE" ORGANIZATION="1" INDICATOR="4" VALUE_TYPE="1" GENDER="_ALL" AGROUP="5" GLOCATION="3" PERIODICITY="4" UNIT="_NA" REPEATS="0">
        <ns:Obs OBS_VALUE="400" TIME_PERIOD="2008" DATE_COLLECT="2009-03-20" />
    </ns:Series>

(Series is just used to group datavalues in a time series. DISEASE might be for example Malaria)
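The observation that the Indicator behaves like just another attribute of the data value can be sketched as follows. This is a hypothetical Java model, not an SDMX library: an observation is a value plus a map from dimension name to code, and the indicator is one more entry in that map. Which dimensions count as compulsory is an assumption here (INDICATOR, TIME_PERIOD and GEOGRAPHIC_PLACE_NAME, standing in for indicator, period and orgunit).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: an observation is a value plus a code for each dimension.
// The indicator is stored exactly like GENDER or DISEASE, as one more key.
record Observation( Map<String, String> codes, double value )
{
    // Assumed compulsory dimensions (indicator, period, orgunit); the names are illustrative.
    static final List<String> COMPULSORY = List.of( "INDICATOR", "TIME_PERIOD", "GEOGRAPHIC_PLACE_NAME" );

    Observation
    {
        for ( String dimension : COMPULSORY )
        {
            if ( !codes.containsKey( dimension ) )
            {
                throw new IllegalArgumentException( "Missing compulsory dimension " + dimension );
            }
        }
        codes = new TreeMap<>( codes ); // sorted copy gives a stable key order
    }

    // Builds a key such as GENDER=_ALL:INDICATOR=4 in dimension-name order.
    String key()
    {
        StringBuilder sb = new StringBuilder();
        codes.forEach( ( dimension, code ) ->
            sb.append( sb.length() == 0 ? "" : ":" ).append( dimension ).append( '=' ).append( code ) );
        return sb.toString();
    }
}
```

Building an observation from part of the series above (INDICATOR="4", GENDER="_ALL", GEOGRAPHIC_PLACE_NAME="CH-GE", TIME_PERIOD="2008") gives the key GENDER=_ALL:GEOGRAPHIC_PLACE_NAME=CH-GE:INDICATOR=4:TIME_PERIOD=2008, while one without INDICATOR is rejected.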

What would (or what could) the Indicator be in our sample scenario? This is where it would be really useful to get hold of the actual MDG indicator definitions that we apparently won’t see till the 20th. Having said that, we can get a pretty good idea of what they will look like from here: http://mdgs.un.org/unsd/mdg/Host.aspx?Content=Indicators/OfficialList.htm.

Anyway, I hope you see my point. Whereas we do need to be able to group indicators/dataelements into dimensions, those dimensions still have to be dimensions of something. Are they dimensions of the Indicator? Almost, but not quite. It’s interesting that the indicator list above makes no mention of dimensions. I think - and I don’t want to confuse things further by bringing in more terminology - they are actually dimensions of the “measure”. In some recent discussions several of us, myself included, thought that dataelement might be equivalent to what some people call a measure. This is not the case, as Jørn quickly and vigorously pointed out. The “measure” is the type of data value (or series of data values), which might be something like “percentage of population” or “proportion of population per 1000”.

And the measure would have dimensions, including compulsory ones like Indicator, Period, OrganisationUnit as well as optional ones like Disease, Gender, Age etc.

But in practice, because the Indicator is a compulsory dimension, a particular instance of a measure (an OBS_VALUE in SDMX) would be associated with a particular Indicator + its other dimensions. So I think, besides the Indicators which make up the dimensions as per the groupset idea, we must also have an Indicator which has these dimensions. A recursion I know.

So, in addition to Lars’ model, I would propose an Indicator (and DataElement) interface as follows:

import java.util.List;

interface MultiDimensionalElement
{
    List<Dimension> getDimensions();
    void setDimensions( List<Dimension> dimensions );

    void addDimension( Dimension dimension );
    // etc.
}

Both Indicator and DataElement would then implement MultiDimensionalElement.

And of course getDimensions() can (and in many or most cases will) return null.
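A minimal Java sketch of how Indicator might implement the proposed interface, assuming a simple Dimension record as a stand-in for the real Dimension object. It follows the suggestion that getDimensions() returns null for elements without dimensions; the list is only created on the first addDimension call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Dimension (GroupSet-like) object in Lars's model.
record Dimension( String name ) {}

interface MultiDimensionalElement
{
    List<Dimension> getDimensions();
    void setDimensions( List<Dimension> dimensions );
    void addDimension( Dimension dimension );
}

// Sketch of an Indicator that returns null until it is given dimensions.
class Indicator implements MultiDimensionalElement
{
    private final String name;
    private List<Dimension> dimensions; // null means "not multidimensional"

    Indicator( String name )
    {
        this.name = name;
    }

    String getName()
    {
        return name;
    }

    @Override
    public List<Dimension> getDimensions()
    {
        return dimensions;
    }

    @Override
    public void setDimensions( List<Dimension> dimensions )
    {
        // Copy so that later addDimension calls never hit an immutable list.
        this.dimensions = dimensions == null ? null : new ArrayList<>( dimensions );
    }

    @Override
    public void addDimension( Dimension dimension )
    {
        if ( dimensions == null )
        {
            dimensions = new ArrayList<>(); // created lazily on first use
        }
        dimensions.add( dimension );
    }
}
```

Returning an empty list instead of null is the more common Java idiom and spares callers a null check; the sketch keeps null only to match the proposal.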

Remaining thoughts:
(i) an Indicator, even a multidimensional one, still needs a value. I suspect in most cases this will be the aggregation of its dimension values. For example, taking MDG indicator number 4.1 (Under-five mortality rate), this will probably have a Gender dimension which we will implement using groups and groupsets, but it will also have an aggregate value.
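For a rate indicator, the aggregate value cannot simply be the average of the per-option rates; the numerators and denominators have to be summed first. The following Java sketch illustrates this under a simplifying assumption: it treats under-five mortality as deaths per 1000 live births (the official MDG 4.1 definition is a probability of dying before age five, which is computed differently), and RateParts and RateAggregation are hypothetical names.

```java
import java.util.Map;

// Hypothetical: numerator and denominator captured for one dimension option (e.g. one gender).
record RateParts( double numerator, double denominator ) {}

class RateAggregation
{
    // Rate per 1000 for a single dimension option.
    static double rate( RateParts parts )
    {
        return 1000 * parts.numerator() / parts.denominator();
    }

    // Aggregate value across all options: sum numerators and denominators, then divide.
    static double aggregateRate( Map<String, RateParts> byOption )
    {
        double numerator = 0;
        double denominator = 0;
        for ( RateParts parts : byOption.values() )
        {
            numerator += parts.numerator();
            denominator += parts.denominator();
        }
        return 1000 * numerator / denominator;
    }
}
```

With made-up figures of 30 deaths per 1000 births for males and 10 per 500 for females, the per-gender rates are 30 and 20, but the aggregate is 40 / 1500 × 1000 ≈ 26.7, not the average of 25.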

(ii) In the medium term, I don’t think it makes any sense to continue supporting two methods of implementing multidimensionality. The revised model of Lars (with additions) should eventually also be usable for the grid data entry requirement. But we can suspend that discussion for now.

Sorry for the long mail. Lars do you think it makes sense to extend your model this way? I know we need to come up with a solution pretty quickly on this.

Regards
Bob

···

2009/10/10 Lars Helge Øverland larshelge@gmail.com


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Knut_Staring · 12 October 2009 13:44

Attached is the format that the IMR web service currently provides (not yet SDMX) for MDG indicators.

IndicatorGetAsXml.xml (4.92 KB)

···

2009/10/12 Bob Jolliffe bobjolliffe@gmail.com


–
Cheers,
Knut Staring

jason · 13 October 2009 09:18

I think Ola's example is a pretty typical one and certainly one that
meets most of the requirements here.

To me this rules out re-using the same dimensional attributes for data
entry and analysis - we must in any case have one set of dimensions for data
entry and one set of dimensions for analysis.

Yes, I agree with this to some extent. I think that categories should
be confined to data entry, which is clearly a requirement, but not
an absolute one. Non-multidimensional data elements could be employed
for the same purpose, it would seem. I would think they may need
special methods like ordering on a screen, which points more to an
implementation of an object than to the object itself.
Multidimensional data elements can be assigned categories for the
purpose of data entry; these could not be deleted, but could perhaps
be added to, once data has been entered. I suppose categories could
be deleted, with the data being deleted as well? That seems
complicated, but maybe it was considered during the original
implementation. I agree they should be able to be assigned additional
DimensionOptions after the fact. CategoryOptions and DimensionOptions
should be invisible to the end-user. Likewise, data elements that have
not been assigned categories for the purpose of data entry should be
able to be assigned DimensionOptions at any point in time. Thus,
CategoryOptions and DimensionOptions could potentially be drawn from
the same data source, but used for different purposes.

c) Ola's suggested solution supports this. It is powerful in the ability to
assign "raw" DataElements to Dimensions/GroupSets through
DimensionOptions/Groups, completely independent of which Categories the
DataElement was assigned to for data entry. The weakness is that it is based
on flat data elements, not Categorized data elements, which we must include
if we are to justify the Categorized data entry.

I think there is clear justification, as Johan has pointed out. It
makes life easier in some cases, but I see a potential problem when it
comes to changing a multidimensional data element, which inevitably
happens all the time. Disaggregations change from time to time, which
will force implementers who have chosen the multidimensional route to
somehow create new data elements with different category options. But
if you want to compare these longitudinally across the change in
CategoryOptions, this could perhaps be done by assigning the correct
DimensionOptions where necessary?

This way we can assign dimensions as we like without losing the fine
granularity of the captured categorized data. We can improve the report
table functionality in order to utilize this. This will be feasible with the
time and resource constraints we are operating with. It also alleviates the
challenge regarding Indicators and SDMX.

Perhaps. I am not familiar yet with how SDMX implements indicators.
However, it is clear from our experience with the OpenHealth prototype
that not all data elements or indicators are intrinsically
multidimensional. Some of the indicators from the UNGASS are
devilishly complex, like the NCPI index. DHIS 2 gets around this
problem with the use of formulas. Take the indicator "Couple year
protection rate", which is defined in my system as [103.16] / 13 +
[104.16] / 4 + [105.16] / 6 + [107.16] * 10 + [106.16] * 5 + [108.16]
* 10 + [109.16] * 10...a lot of data elements divided/multiplied by
certain factors. This would never be multidimensional and SDMX will
have to transport this "indicator logic" with it.

The main use of the DimensionOptions/CategoryOptions for me would be
for ad-hoc analysis, OLAP cubes, and other "analysis" purposes. I
guess SDMX would need to be able to transport multidimensional
indicators (which are defined with multidimensional data elements or
with non-multidimensional data elements that have been assigned
DimensionOptions), but these will not cover all indicators. Usually,
indicators themselves are not going to be sliced and diced with
PivotTables in the same way as data elements, which may be brought
into a PivotTable in order to calculate an ad-hoc indicator. If we
could utilize the DimensionOptions with indicators for this purpose,
it might be very useful for calculating things like "Under 5 malaria
mortality rate" as a slice of the indicator "Malaria mortality rate",
but for some indicators (as illustrated above) this is not going to be
possible.
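Slicing an indicator by a DimensionOption amounts to filtering its numerator and denominator values before dividing. In this hypothetical Java sketch each data value is tagged with its role and its age DimensionOption (TaggedValue and IndicatorSlice are illustrative names); the whole indicator and its under-5 slice come from the same method with different filters.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical data value tagged with its role in the indicator and its age DimensionOption.
record TaggedValue( String role, String ageOption, double value ) {}

class IndicatorSlice
{
    // Computes factor * numerator / denominator over the values accepted by the slice filter.
    static double rate( List<TaggedValue> values, Predicate<TaggedValue> slice, double factor )
    {
        double numerator = 0;
        double denominator = 0;
        for ( TaggedValue v : values )
        {
            if ( !slice.test( v ) )
            {
                continue;
            }
            if ( v.role().equals( "numerator" ) )
            {
                numerator += v.value();
            }
            else
            {
                denominator += v.value();
            }
        }
        return factor * numerator / denominator;
    }
}
```

With made-up figures of 6 under-5 and 4 over-5 malaria deaths against populations of 2000 and 8000, the overall rate per 1000 is 1.0 and the under-5 slice is 3.0. A formula indicator like the couple year protection rate has no such tagged values to filter.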

Best regards,
Jason