DataElement -> PeriodType association

You are right that not having a reverse association makes selecting
all the "unassigned" DataElements a bit tricky. Perhaps the most
robust solution is to have a default unassigned DataSet - with a NULL
periodType. All new DataElements start life in this DataSet. This
makes finding the available dataElements pretty trivial and we can
also rigorously enforce that a dataElement MUST be a member of a
dataSet which is a good thing.

This will be a complexity trade off between data model / database schema and
application logic. Getting DataElements for a PeriodType will definitely
involve a more complex and slower query or additional application logic.

A bit slower, but really not much. And no more complex application
logic above the API:

Collection<DataElement> PeriodType::getDataElements()

As for accidental complexity / future implications the DataElement ->
PeriodType association has been in use in DHIS 1.4 for several years and
proved to be working. What do we know about this approach?

I haven't seen the DHIS1.4 model so I can't really say. But I think
we all agree that having DataElement::getPeriodType() is a good thing.
And this seems to be borne out from the 1.4 experience.

You might be right on the performance of dataMart etc. It will really
come down to how often you have to call getPeriodType() which
shouldn't be too much - the real data crunching meat is in the
datavalues not the dataelements I think. If indeed it proves very
costly then it might justify hacking the schema, but I think we
should try implementing at the API level first. Reimplementing
DataElement::getPeriodType() can be done if necessary.

I guess Hibernate will cache the DataSet lookup anyway. But remember we
cannot implement getPeriodType in the DataElement object if there is no
association DataElement -> DataSet.

On second pass this is actually quite trivial to implement because we
have the go-between DataSetMembers(?). Finding the DataSets
associated with a DataElement should be as straightforward as finding
the DataElements associated with a DataSet.

Yes, but "datasetmembers" is a mapping table in the database, we still need
an association in the object model.

The "implicit" approach can be done, but it involves a DataElement ->
DataSet association, and more complex queries/programming model. (If the
DataElement -> DataSet association is omitted; a call to the service layer
to get the PeriodType for a DataElement.)

Yes agreed. We currently have:

<set name="dataElements" table="datasetmembers">
  <key column="datasetid"/>
  <many-to-many class="org.hisp.dhis.dataelement.DataElement"
      column="dataelementid" foreign-key="fk_dataset_dataelementid"/>
</set>

in the DataSet.hbm.xml

We would need a mirror association in the DataElement.hbm.xml in order
to get DataSets smoothly, but still using the same mapping table.
At worst this will involve making a new index.
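
A minimal sketch of what the mirror association could look like on the object side, in plain Java without Hibernate. The class shapes and the addDataElement/removeDataElement helpers are assumptions made for illustration, not the actual DHIS 2 classes; the only point is that both ends of the many-to-many link are kept in sync so that DataSets can be reached from a DataElement.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for the real model classes.
class DataSet {
    final String name;
    final Set<DataElement> dataElements = new HashSet<>();

    DataSet(String name) {
        this.name = name;
    }

    // Updates both sides of the association in one place.
    void addDataElement(DataElement element) {
        dataElements.add(element);
        element.dataSets.add(this);
    }

    void removeDataElement(DataElement element) {
        dataElements.remove(element);
        element.dataSets.remove(this);
    }
}

class DataElement {
    final String name;
    // Mirror side; in the mapping this would reuse the "datasetmembers" table.
    final Set<DataSet> dataSets = new HashSet<>();

    DataElement(String name) {
        this.name = name;
    }
}
```

With this in place, finding the DataSets of a DataElement is just a read of element.dataSets, which is what a getPeriodType() on DataElement would need.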

The "explicit" approach also involves an association (DataElement ->
PeriodType). I am not sure if one could say it involves redundancy, as we
don't have a model enforcement of one-or-more DataSet memberships for a
DataElement.

Yes this is true. But what do you think of the idea to enforce
DataSet membership having a default DataSet for all the delinquents?
I'm not sure if it can be enforced by the schema, but at least by the
application.

I opt for the latter approach. Of course I might be wrong:)

I don't know if it's about right or wrong. There are pros and cons of
both approaches. What you gain on the swings you lose on the
roundabouts :)

In the explicit case the application will have to enforce that DataSet
members all have the same periodType.

In the implicit case the application will have to enforce that
DataElements can only be members of multiple groups if these share the
same PeriodType.

The net result as far as the Data API is concerned can and must be the
same. Perhaps we should define exactly what extra methods we want in
the API first. We have already identified a few. Then decide whether
a database change is necessitated by these.
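
As an illustration of the application-level check in the implicit case, here is a minimal sketch in plain Java. The enum values, the class shapes and the choice of IllegalArgumentException are all assumptions for the example, not existing DHIS 2 code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; not the actual DHIS 2 classes.
enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

class DataElement {
    final String name;
    final Set<DataSet> dataSets = new HashSet<>();

    DataElement(String name) {
        this.name = name;
    }
}

class DataSet {
    final String name;
    final PeriodType periodType;
    final Set<DataElement> dataElements = new HashSet<>();

    DataSet(String name, PeriodType periodType) {
        this.name = name;
        this.periodType = periodType;
    }

    // Implicit case: refuse membership if the element already belongs to a
    // DataSet with a different PeriodType.
    void addDataElement(DataElement element) {
        for (DataSet other : element.dataSets) {
            if (other.periodType != this.periodType) {
                throw new IllegalArgumentException(
                        element.name + " is already in " + other.name
                        + " with period type " + other.periodType);
            }
        }
        dataElements.add(element);
        element.dataSets.add(this);
    }
}
```

The explicit case would need the mirror-image check: compare the PeriodType stored on the DataElement with the PeriodType of the DataSet before adding it.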

Bob

···

2009/3/20 Lars Helge Øverland <larshelge@gmail.com>:

Yes this is true. But what do you think of the idea to enforce
DataSet membership having a default DataSet for all the delinquents?
I'm not sure if it can be enforced by the schema, but at least by the
application.

OK but what does this give us in terms of PeriodType-determining if this default DataSet has a null PeriodType?

The net result as far as the Data API is concerned can and must be the
same. Perhaps we should define exactly what extra methods we want in
the API first. We have already identified a few. Then decide whether
a database change is necessitated by these.

Yes. We need at least a service method:

Collection<DataElement> getDataElementsByPeriodType( PeriodType )

and getter on the DataElement object:

PeriodType getPeriodType()

I guess we could make a branch, start coding and see how it works out.
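
A minimal sketch of how these two methods could behave if the period type is derived implicitly from DataSet membership. Everything here is an assumption for illustration: the DataElementService class, its in-memory list and the simplified model classes are hypothetical, and the getter assumes the application already guarantees that all DataSets of a DataElement share one PeriodType.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; not the actual DHIS 2 classes.
enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

class DataSet {
    final PeriodType periodType;
    final Set<DataElement> dataElements = new LinkedHashSet<>();

    DataSet(PeriodType periodType) {
        this.periodType = periodType;
    }

    void addDataElement(DataElement element) {
        dataElements.add(element);
        element.dataSets.add(this);
    }
}

class DataElement {
    final String name;
    final Set<DataSet> dataSets = new LinkedHashSet<>();

    DataElement(String name) {
        this.name = name;
    }

    // Implicit derivation: null while the element is unassigned, otherwise
    // the period type of (any of) its data sets.
    PeriodType getPeriodType() {
        return dataSets.isEmpty() ? null : dataSets.iterator().next().periodType;
    }
}

class DataElementService {
    private final List<DataElement> allDataElements = new ArrayList<>();

    void addDataElement(DataElement element) {
        allDataElements.add(element);
    }

    // Passing null returns the unassigned data elements.
    Collection<DataElement> getDataElementsByPeriodType(PeriodType periodType) {
        List<DataElement> result = new ArrayList<>();
        for (DataElement element : allDataElements) {
            if (element.getPeriodType() == periodType) {
                result.add(element);
            }
        }
        return result;
    }
}
```

A real implementation would presumably run a query instead of looping in memory; the sketch only pins down the behaviour the callers above the API would see.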

Another issue would arise in the (exotic) situation where someone assigns a DataElement to a DataSet, enters data for it, then removes it from the DataElement. The data is there, but how do we deal with it with regard to the required functionality mentioned (trend analysis, datamart)?

Yes this is true. But what do you think of the idea to enforce
DataSet membership having a default DataSet for all the delinquents?
I'm not sure if it can be enforced by the schema, but at least by the
application.

OK but what does this give us in terms of PeriodType-determining if this
default DataSet has a null PeriodType?

Nothing really. The only effect would be you have an index on the
unassigned DataElements for what it's worth. Mainly it would be useful
for determining easily the available DataElements which can be added
to a DataSet. Maybe it's a nonsense idea - I was just trying to think
of ways to make editing DataSets reasonably straightforward.

Yes. We need at least service method:

Collection<DataElement> getDataElementsByPeriodType( PeriodType )

and getter on the DataElement object:

PeriodType getPeriodType()

I guess we could make a branch, start coding and see how it works out.

Sure. So long as we are adding methods we won't be breaking anything
in terms of backward compatibility. Just enforcing application level
constraints. Then we can really encourage (enforce?) upper layers to
strictly interact with the data via the API. Even if this might
occasionally mean making some lightweight API methods which bypass the
ORM.

Another issue would arise in the (exotic) situation where someone assigns a
DataElement to a DataSet, enters data for it, then removes it from the
DataElement. The data is there, but how do we deal with it with regard to the
required functionality mentioned (trend analysis, datamart)?

Yes this gets a bit weird (I presume you mean removes it from the
DataSet). I'm guessing you haven't lost the data because the
dataValues each have a PeriodID which in turn is linked to a
PeriodType. I suppose that (in such an exotic headspace) DataElements
can in fact change their PeriodTypes over time, though I imagine it's
not a great idea.

The effect would be the same in the explicit relationship case, if
someone assigns a DataElement to a DataSet, enters data for it, then
changes the PeriodType of the DataElement ...

Cheers
Bob

···

2009/3/20 Lars Helge Øverland <larshelge@gmail.com>:

Hi,

After Kim Anh’s email about the use of the same data elements with different period types I dug up this old discussion from March 2009.

What is the status on this work, or did we not conclude this?

Ola

···

2009/3/20 Bob Jolliffe bobjolliffe@gmail.com


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Data elements derive their period type from the data sets they are members of.

···

On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad olatitle@gmail.com wrote:


Re-looking at that old discussion reminds me:

The period type of a datavalue can always be deduced from the period.
So it is actually possible to import datavalues for dataelements which
are not members of a data set and not lose the period type
information.

periodType = dv.getPeriod().getPeriodType();
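
Spelled out as a minimal sketch, with hypothetical, stripped-down Period and DataValue classes (only the getPeriod() and getPeriodType() accessors come from the line above; the fields and constructors are assumptions):

```java
// Hypothetical, simplified model; not the actual DHIS 2 classes.
enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

class Period {
    private final PeriodType periodType;
    private final String name; // e.g. "2010-01", illustrative only

    Period(PeriodType periodType, String name) {
        this.periodType = periodType;
        this.name = name;
    }

    PeriodType getPeriodType() {
        return periodType;
    }

    String getName() {
        return name;
    }
}

class DataValue {
    private final String dataElementName;
    private final Period period;
    private final String value;

    DataValue(String dataElementName, Period period, String value) {
        this.dataElementName = dataElementName;
        this.period = period;
        this.value = value;
    }

    Period getPeriod() {
        return period;
    }
}

class PeriodTypeLookupDemo {
    public static void main(String[] args) {
        // The data element need not belong to any data set for this to work.
        DataValue dv = new DataValue("BCG doses", new Period(PeriodType.MONTHLY, "2010-01"), "42");
        PeriodType periodType = dv.getPeriod().getPeriodType();
        System.out.println(periodType); // prints MONTHLY
    }
}
```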

···

2010/5/20 Lars Helge Øverland <larshelge@gmail.com>:


And when they are members of two datasets with different period types they have multiple period types right?

The key thing to look out for in data entry and data import is to avoid overlaps in data values that will cause duplication when aggregating data periods.

E.g. if the SAME ORGUNIT registers values for the same data element for two different period types that have overlapping periods, such as Jan-10 and Q1-10, then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all be incorrect, since the value for Jan-10 is counted twice.
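
To make the double counting concrete, here is a small sketch with made-up numbers. RawValue and NaiveAggregator are hypothetical names, and the aggregation rule (sum every raw value whose period lies inside the target range) is a deliberately naive assumption, not a description of how the datamart actually aggregates.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical raw value: a number reported for one period (dates inclusive).
class RawValue {
    final LocalDate start;
    final LocalDate end;
    final int value;

    RawValue(LocalDate start, LocalDate end, int value) {
        this.start = start;
        this.end = end;
        this.value = value;
    }
}

class NaiveAggregator {
    // Adds up every raw value whose period falls inside [from, to],
    // without looking at the period type it was reported with.
    static int sum(List<RawValue> values, LocalDate from, LocalDate to) {
        int total = 0;
        for (RawValue v : values) {
            if (!v.start.isBefore(from) && !v.end.isAfter(to)) {
                total += v.value;
            }
        }
        return total;
    }
}

class DoubleCountingDemo {
    public static void main(String[] args) {
        // Same orgunit, same data element, two overlapping periods.
        RawValue jan10 = new RawValue(LocalDate.of(2010, 1, 1), LocalDate.of(2010, 1, 31), 10);
        RawValue q1of10 = new RawValue(LocalDate.of(2010, 1, 1), LocalDate.of(2010, 3, 31), 30);

        int q1Total = NaiveAggregator.sum(Arrays.asList(jan10, q1of10),
                LocalDate.of(2010, 1, 1), LocalDate.of(2010, 3, 31));
        System.out.println(q1Total); // prints 40, although the quarterly figure is 30
    }
}
```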

One way to enforce this constraint is to monitor which datasets an orgunit is assigned to, and not allow an orgunit to be assigned to two datasets that have the same data element AND different period types. As far as I am aware, we are not checking for this today. During data import it could be checked at the data element level by looking up the period type the way Bob has shown, but that sounds like a lot of lookups and time-consuming validation, doesn't it?
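
A minimal sketch of such an assignment check, assuming a simplified model in which a data set just lists its data element names. The OrgUnit.assign method and the use of IllegalStateException are hypothetical; nothing like this is being checked today according to the paragraph above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; not the actual DHIS 2 classes.
enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

class DataSet {
    final String name;
    final PeriodType periodType;
    final Set<String> dataElementNames;

    DataSet(String name, PeriodType periodType, String... dataElementNames) {
        this.name = name;
        this.periodType = periodType;
        this.dataElementNames = new HashSet<>(Arrays.asList(dataElementNames));
    }
}

class OrgUnit {
    final String name;
    final Set<DataSet> dataSets = new HashSet<>();

    OrgUnit(String name) {
        this.name = name;
    }

    // Rejects a data set that shares a data element with an already assigned
    // data set of a different period type.
    void assign(DataSet candidate) {
        for (DataSet existing : dataSets) {
            boolean sharesElement =
                    !Collections.disjoint(existing.dataElementNames, candidate.dataElementNames);
            if (sharesElement && existing.periodType != candidate.periodType) {
                throw new IllegalStateException(
                        name + " already reports shared data elements via " + existing.name);
            }
        }
        dataSets.add(candidate);
    }
}
```

Because the check is per orgunit, two different orgunits can still report the same data element with different period types, which is the flexibility the provinces need.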

A relatively normal use case that we probably have to find a way to support, and which I think they are struggling with in Vietnam, is that different provinces can use different period types for the same data elements (even for complete data sets). For example, suppose the national data flow policy says to report immunisation data every quarter, so that becomes the minimum requirement for all provinces. Some of the provinces then decide that all their facilities have to collect this data monthly anyway, and at the province level they simply send the quarterly aggregates to the national level (in the paper-based or Excel world). At the same time, other provinces just collect quarterly data at the facility level, as in the minimum national requirement. At the national level there is a need to consolidate all this data, even data at the facility level, so ideally a national DHIS database should be able to store both monthly and quarterly raw data values for the same data elements, but for different orgunits. The national information users can then easily generate quarterly reports on immunisation for all provinces, while some provinces can do monthly data analysis if they choose to collect data at that frequency.

We support the above scenario by allowing the same data elements to be assigned to different data sets with different period types, but we don’t control for misuse of this flexibility which can lead to duplication and inconsistent aggregated data values as pointed out above.

Ola

···

2010/5/20 Lars Helge Øverland larshelge@gmail.com


Data elements derive their period type from the data sets they are members
of.

Restated (what I just sent Lars only by mistake): a datavalue derives
its period type from the data set of
which its data element is a member :)

And when they are members of two datasets with different period types they
have multiple period types right?

It's important to remain aware that it is values ultimately which have
periods (and hence period types).

And when you look at a value you can derive its period type in one of
two ways - via dataset or via period. Potentially these could
disagree. The one which derives from its period should be considered
authoritative, i.e. if the period is 2009-Jan then regardless of what
the dataset might say this really must be monthly. Of course we hope
these always agree. Incidentally the lookup from
dataelement-to-dataset-to-period looks like a greater complexity than
the lookup from period->periodType.

The key thing to look out for in data entry and data import is to avoid
overlaps in data values that will cause duplication when aggregating data
periods.
E.g. if the SAME ORGUNIT registers values for the same data element for two
different period types that have overlapping periods, such as Jan-10 and Q1-10,
then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all be
incorrect, since the value for Jan-10 is counted twice.

OK. That's a good concrete constraint to have.

One way to enforce this constraint is to monitor which datasets an orgunit
is assigned to, and not allow orgunits to be assigned to two datasets that
have the same data element AND different period types.

Agreed, though this constraint should probably be imposed on forms
rather than datasets.

As far as I am aware,
we are not checking for this today. During data import it could be checked
on data element level by looking up the period type the way Bob has shown,
but that sounds like a lot of look ups and time consuming validation, or?

On data import we don't really validate at all, beyond whatever
constraints the db imposes. For efficiency we simply pop the values in
with multiple insert statements. So this validation would have to
happen as a stage before the actual import or would have to be
constrained within the db. In fact it can't be validated easily
before the import as it is dependent on existing values within the db.

A relatively normal use case that we probably have to find a way to support,
and I think they are struggling with in Vietnam, is that different provinces
can use different period types for the same data elements (even for complete
data sets). E.g. if the national data flow policy says to report on
immunisation data every quarter, so that becomes the minimum requirement for
all provinces. Then some of the provinces decide that all their facilities
have to collect this data monthly anyway, and then at the province level
they simply send the quarterly aggregates to national level (in the
paper-based or Excel world). At the same time other provinces just collect
quarterly data at the facility level as in the minimum national requirement.
At the national level there is a need to consolidate all this data, even
data by the facility level, so ideally a national DHIS database should be
able to store both monthly and quarterly raw data values for the same data
elements, but for different orgunits. The national information users can
then easily generate quarterly reports on immunisation for all provinces,
while in some provinces they can do monthly data analysis if they want to
collect data using that frequency.

We support the above scenario by allowing the same data elements to be
assigned to different data sets with different period types, but we don't
control for misuse of this flexibility which can lead to duplication and
inconsistent aggregated data values as pointed out above.

Thinking further ... I really think the problem arises because we
have a dataset concept which represents a form and is also used to
constrain periodtypes on dataelements. Thinking of the use case you
have just described, it should be the case that one can have a paper
form which national level expect to collect quarterly, and the same
form be used at a lower level to collect data monthly. If we wanted
to mirror that use case electronically we would have to divorce the
form from the periodtype - ie a form would collect datavalues of a
certain period, but the same form could be used in different orgunits
for collecting data at a different frequency.

So (leaving dataset aside for the moment) if we can't assign a
periodtype to a form and we can't assign to a dataelement and it's too
inefficient to validate on a one by one datavalue basis what is a girl
to do?

I suspect the correct answer is to refactor datavalue and create a
datavalueset type - note: a set of datavalues rather than a set of
dataelements. Designing out loud, a datavalueset would have the
following fields/attributes:

1. a formid - the collection instrument used - roughly corresponds to
current dataset
2. an orgunitid - where the datavalues come from
3. a periodid - the period of all the datavalues
plus a couple of other useful attributes I can think of

Datavalue now becomes slightly simpler (which is always a good thing).
It only has:
value, dataelementid, categorycombooption, datavaluesetid

We can relatively efficiently validate that a datavalueset object is not
persisted which has the same formid, orgunitid and an overlapping
period.
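
A minimal sketch of that validation, under stated assumptions: periods are modelled as inclusive date ranges, the store is an in-memory list, and the DataValueSetStore class with its save method is hypothetical. Only the formid, orgunitid and period fields come from the design above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical period as an inclusive date range.
class Period {
    final LocalDate start;
    final LocalDate end;

    Period(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    // Two inclusive ranges overlap when each starts no later than the other ends.
    boolean overlaps(Period other) {
        return !start.isAfter(other.end) && !other.start.isAfter(end);
    }
}

class DataValueSet {
    final int formId;
    final int orgUnitId;
    final Period period;

    DataValueSet(int formId, int orgUnitId, Period period) {
        this.formId = formId;
        this.orgUnitId = orgUnitId;
        this.period = period;
    }
}

class DataValueSetStore {
    private final List<DataValueSet> saved = new ArrayList<>();

    // Refuses a set whose form and orgunit match an existing set with an
    // overlapping period (an identical period counts as overlapping).
    void save(DataValueSet candidate) {
        for (DataValueSet existing : saved) {
            if (existing.formId == candidate.formId
                    && existing.orgUnitId == candidate.orgUnitId
                    && existing.period.overlaps(candidate.period)) {
                throw new IllegalStateException("overlapping datavalueset already stored");
            }
        }
        saved.add(candidate);
    }

    int size() {
        return saved.size();
    }
}
```

The check runs once per datavalueset rather than once per datavalue, which is where the efficiency gain over value-by-value validation would come from. A real store would need a way to update an existing set for the same period instead of rejecting it.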

There is no longer any ambiguity about periodtype of a datavalue.

stored_by, timestamp, comment might go either way. Probably they need
to stay on datavalue. I notice comment is rarely used but it's really
useful to have a comment on datavalueset for import purposes.

'nuff designing out loud. Got to go.

Regards
Bob

···

2010/5/20 Ola Hodne Titlestad <olatitle@gmail.com>:

2010/5/20 Lars Helge Øverland <larshelge@gmail.com>

Ola
---------

On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad <olatitle@gmail.com> >> wrote:

Hi,

After Kim Anh's email about the use of the same data elements with
different period types I dug up this old discussion from March 2009.

What is the status on this work, or did we not conclude this?

Ola
----------

2009/3/20 Bob Jolliffe <bobjolliffe@gmail.com>

2009/3/20 Lars Helge Øverland <larshelge@gmail.com>:
>
>>
>> Yes this is true. But what do you think of the idea to enforce
>> DataSet membership having a default DataSet for all the delinquents?
>> I'm not sure if it can be enforced by the schema, but at least by the
>> application.
>
> OK but what does this give us in terms of PeriodType-determining if
> this
> default DataSet has a null PeriodType?

Nothing really. The only effect would be you have an index on the
unassigned DataElements for what it's worth. Mainly it would be useful
for determining easily the available DataElements which can be added
to a DataSet. Maybe it's a nonsense idea - I was just trying to think
of ways to make editing DataSets reasonably straightforward.

>
>>
>> I don't know if it's about right or wrong. There are pros and cons of
>> both approaches. What you gain on the swings you lose on the
>> roundabouts :)
>>
>> In the explicit case the application will have to enforce that DataSet
>> members all have the same periodType.
>>
>> In the implicit case the application will have to enforce that
>> DataElements can only be members of multiple groups if these share the
>> same PeriodType.
>>
>> The net result as far as the Data API is concerned can and must be the
>> same. Perhaps we should define exactly what extra methods we want in
>> the API first. We have already identified a few. Then decide whether
>> a database change is necessitated by these.
>
> Yes. We need at least a service method:
>
> Collection<DataElement> getDataElementsByPeriodType( PeriodType )
>
> and a getter on the DataElement object:
>
> PeriodType getPeriodType()
>
>
> I guess we could make a branch, start coding and see how it works out.

Sure. So long as we are adding methods we won't be breaking anything
in terms of backward compatibility. Just enforcing application level
constraints. Then we can really encourage (enforce?) upper layers to
strictly interact with the data via the API. Even if this might
occasionally mean making some lightweight API methods which bypass the
ORM.
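A rough sketch of how the two methods named above could look in the implicit case, where a DataElement takes its PeriodType from the DataSets it belongs to. Only `getDataElementsByPeriodType` and `getPeriodType` come from the thread; everything else (the enum, the in-memory `DataElementService`, the choice to throw on disagreement) is an assumption for illustration, not the DHIS implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for the real model classes.
enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

final class DataSet {
    final String name;
    final PeriodType periodType;

    DataSet(String name, PeriodType periodType) {
        this.name = name;
        this.periodType = periodType;
    }
}

final class DataElement {
    final String name;
    final Set<DataSet> dataSets = new LinkedHashSet<>();

    DataElement(String name) {
        this.name = name;
    }

    // Implicit case: derive the period type from data set membership.
    // Returns null for an unassigned element; fails if the data sets disagree.
    PeriodType getPeriodType() {
        PeriodType found = null;
        for (DataSet ds : dataSets) {
            if (found != null && found != ds.periodType) {
                throw new IllegalStateException(
                    name + " belongs to data sets with different period types");
            }
            found = ds.periodType;
        }
        return found;
    }
}

// In-memory stand-in for the service layer; a real one would query the database.
final class DataElementService {
    private final List<DataElement> all = new ArrayList<>();

    void add(DataElement element) {
        all.add(element);
    }

    Collection<DataElement> getDataElementsByPeriodType(PeriodType type) {
        List<DataElement> result = new ArrayList<>();
        for (DataElement element : all) {
            if (element.getPeriodType() == type) {
                result.add(element);
            }
        }
        return result;
    }
}
```

Because both methods are additions, callers above the API would not need to know whether the association is stored explicitly or derived as here.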

>
> Another issue would arise in the (exotic) situation where someone assigns a
> DataElement to a DataSet, enters data for it, then removes it from the
> DataElement. The data is there, but how do we deal with it in regard to the
> mentioned required functionality (trend analysis, datamart)?
>

Yes this gets a bit weird (I presume you mean removes it from the
DataSet). I'm guessing you haven't lost the data because the
dataValues each have a PeriodID which in turn is linked to a
PeriodType. I suppose that (in such an exotic headspace) DataElements
can in fact change their PeriodTypes over time, though I imagine it's
not a great idea.

The effect would be the same in the explicit relationship case, if
someone assigns a DataElement to a DataSet, enters data for it, then
changes the PeriodType of the DataElement ...

Cheers
Bob

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
> 2010/5/20 Ola Hodne Titlestad <olatitle@gmail.com>:
>>
>> 2010/5/20 Lars Helge Øverland <larshelge@gmail.com>
>>>
>>> Data elements derive their period type from the data sets they are
>>> members of.
>
> Restated (what I just sent Lars only by mistake): a datavalue derives
> its period type from the data set of which its data element is a
> member :)
>
>> And when they are members of two datasets with different period types
>> they have multiple period types, right?
>
> It's important to remain aware that it is values ultimately which have
> periods (and hence period types).
>
> And when you look at a value you can derive its period type in one of
> two ways - via dataset or via period. Potentially these could
> disagree. The one which derives from its period should be considered
> authoritative, i.e. if the period is 2009-Jan then regardless of what
> the dataset might say this really must be monthly. Of course we hope
> these always agree. Incidentally the lookup from
> dataelement-to-dataset-to-period looks like a greater complexity than
> the lookup from period->periodType.
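The two routes can be put side by side in a small Java sketch. The types are hypothetical simplifications, and the `enteredVia` field in particular is an assumption: in the model under discussion a datavalue reaches a dataset only through its dataelement, which is the longer lookup mentioned above.

```java
// Hypothetical, simplified types; the real DHIS classes carry far more state.
enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

final class Period {
    final String name;           // e.g. "2009-Jan"
    final PeriodType periodType; // every period knows its own type

    Period(String name, PeriodType periodType) {
        this.name = name;
        this.periodType = periodType;
    }
}

final class DataSet {
    final PeriodType periodType;

    DataSet(PeriodType periodType) {
        this.periodType = periodType;
    }
}

final class DataValue {
    final Period period;
    final DataSet enteredVia; // assumed shortcut to the data set; may be null

    DataValue(Period period, DataSet enteredVia) {
        this.period = period;
        this.enteredVia = enteredVia;
    }

    // Authoritative route: a single hop from the value's own period.
    PeriodType getPeriodType() {
        return period.periodType;
    }

    // Reports whether the data set route agrees with the period route.
    boolean agreesWithDataSet() {
        return enteredVia == null || enteredVia.periodType == period.periodType;
    }
}
```

A value for 2009-Jan reports MONTHLY from `getPeriodType()` even if it was somehow captured through a quarterly data set; `agreesWithDataSet()` only flags the mismatch.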

>> The key thing to look out for in data entry and data import is to avoid
>> overlaps in data values that will cause duplication when aggregating data
>> periods.
>> E.g. if the SAME ORGUNIT registers values for the same data element for two
>> different period types that have overlapping periods, e.g. Jan-10 and Q1-10,
>> then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all show
>> an incorrect value since the value for Jan-10 is counted twice.
>
> OK. That's a good concrete constraint to have.
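To make the double counting concrete, here is a small Java sketch of a naive aggregation that sums every raw value whose period falls inside the requested range, whatever its period type. The classes and the figures (10 for Jan-10, 30 for Q1-10) are invented for illustration and do not describe how the datamart actually aggregates.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical raw value for one orgunit and one data element.
final class RawValue {
    final LocalDate start; // first day of the period, inclusive
    final LocalDate end;   // last day of the period, inclusive
    final int value;

    RawValue(LocalDate start, LocalDate end, int value) {
        this.start = start;
        this.end = end;
        this.value = value;
    }
}

final class NaiveAggregator {
    // Sums every raw value whose period lies entirely inside [from, to],
    // without asking whether a monthly value is already part of a quarterly one.
    static int sum(List<RawValue> values, LocalDate from, LocalDate to) {
        int total = 0;
        for (RawValue v : values) {
            if (!v.start.isBefore(from) && !v.end.isAfter(to)) {
                total += v.value;
            }
        }
        return total;
    }
}
```

With a monthly value of 10 for Jan-10 and a quarterly value of 30 for Q1-10 stored for the same orgunit, and assuming the quarterly figure already includes January, the totals for Q1-10, Jan-June 2010 and 2010 all come out as 40 rather than 30.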

>> One way to enforce this constraint is to monitor which datasets an orgunit
>> is assigned to, and not allow orgunits to be assigned to two datasets that
>> have the same data element AND different period types.
>
> Agreed, though this constraint should probably be imposed on forms
> rather than datasets.
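The assignment rule proposed above could be checked roughly as follows. This is a sketch only: `DataSetDef` and `AssignmentRule` are hypothetical names, data elements and period types are represented as plain strings, and nothing here reflects an existing DHIS check.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;

// Hypothetical, minimal description of a data set (or form).
final class DataSetDef {
    final String name;
    final String periodType;        // e.g. "Monthly", "Quarterly"
    final Set<String> dataElements; // data element names

    DataSetDef(String name, String periodType, Set<String> dataElements) {
        this.name = name;
        this.periodType = periodType;
        this.dataElements = dataElements;
    }
}

final class AssignmentRule {
    // Two data sets conflict when they share at least one data element
    // AND have different period types.
    static boolean conflict(DataSetDef a, DataSetDef b) {
        return !a.periodType.equals(b.periodType)
            && !Collections.disjoint(a.dataElements, b.dataElements);
    }

    // An orgunit may take the candidate only if it conflicts with none
    // of the data sets the orgunit is already assigned to.
    static boolean canAssign(Collection<DataSetDef> alreadyAssigned, DataSetDef candidate) {
        for (DataSetDef assigned : alreadyAssigned) {
            if (conflict(assigned, candidate)) {
                return false;
            }
        }
        return true;
    }
}
```

The same check would apply unchanged if the constraint were moved from datasets to forms, as suggested in the reply.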

>> As far as I am aware,
>> we are not checking for this today. During data import it could be checked
>> on data element level by looking up the period type the way Bob has shown,
>> but that sounds like a lot of lookups and time-consuming validation, or?
>
> On data import we don't really validate at all, beyond whatever
> constraints the db imposes. For efficiency we simply pop the values in
> with multiple insert statements. So this validation would have to
> happen as a stage before the actual import or would have to be
> constrained within the db. In fact it can't be validated easily
> before the import as it is dependent on existing values within the db.

>> A relatively normal use case that we probably have to find a way to support,
>> and I think they are struggling with in Vietnam, is that different provinces
>> can use different period types for the same data elements (even for complete
>> data sets). E.g. the national data flow policy says to report on
>> immunisation data every quarter, so that becomes the minimum requirement for
>> all provinces. Then some of the provinces decide that all their facilities
>> have to collect this data monthly anyway, and then at the province level
>> they simply send the quarterly aggregates to national level (in the
>> paper-based or Excel world). At the same time other provinces just collect
>> quarterly data at the facility level as in the minimum national requirement.
>> At the national level there is a need to consolidate all this data, even
>> data at the facility level, so ideally a national DHIS database should be
>> able to store both monthly and quarterly raw data values for the same data
>> elements, but for different orgunits. The national information users can
>> then easily generate quarterly reports on immunisation for all provinces,
>> while in some provinces they can do monthly data analysis if they want to
>> collect data using that frequency.
>>
>> We support the above scenario by allowing the same data elements to be
>> assigned to different data sets with different period types, but we don't
>> control for misuse of this flexibility which can lead to duplication and
>> inconsistent aggregated data values as pointed out above.

> Thinking further ... I really think the problem arises because we
> have a dataset concept which represents a form and is also used to
> constrain periodtypes on dataelements. Thinking of the use case you
> have just described, it should be the case that one can have a paper
> form which national level expects to collect quarterly, and the same
> form be used at a lower level to collect data monthly. If we wanted
> to mirror that use case electronically we would have to divorce the
> form from the periodtype - i.e. a form would collect datavalues of a
> certain period, but the same form could be used in different orgunits
> for collecting data at a different frequency.
>
> So (leaving dataset aside for the moment) if we can't assign a
> periodtype to a form and we can't assign to a dataelement and it's too
> inefficient to validate on a one by one datavalue basis what is a girl
> to do?
>
> I suspect the correct answer is to refactor datavalue and create a
> datavalueset type - note: a set of datavalues rather than a set of
> dataelements. Designing out loud, a datavalueset would have the
> following fields/attributes:
>
> 1. a formid - the collection instrument used - roughly corresponds to
> current dataset
> 2. an orgunitid - where the datavalues come from
> 3. a periodid - the period of all the datavalues
> couple of other useful attributes I can think of
>
> Datavalue now becomes slightly simpler (which is always a good thing).
> It only has:
> value, dataelementid, categorycombooption, datasetid

Afterthought:
At the risk of adding complexity to what is otherwise a
simplification, my life could become even simpler if datavalueset also
had a categorycombo attribute, which would imply that a dataset was
linked to a formsectionid rather than a formid.

So a form has sections. Sections have dataelements. And sections
have a datavalueset as a model - which implies a uniform categorycombo
within the section.

There isn't really a need for dataelements to have a categorycombo.
And in lots of ways it's good that they don't. Then I am reducing
complexity rather than adding to it :)

Consider one orgunit has collected malaria deaths disaggregated by
age. Another has collected values for the same dataelement, but
not disaggregated by age. The datavalues will come from a
datavalueset so will have a categorycombo. It is possible to
aggregate or compare these datavalues, from different datavaluesets,
but using the lowest common denominator of categorycombo, i.e. in both
cases you have access to malaria deaths - in the one case you have to
"roll-up" the categorycombo which does of course assume that the sum
of category options make a sensible whole, but Ola has mentioned this
one many times.
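A tiny Java sketch of the roll-up, assuming (as the paragraph above does) that the category options sum to a sensible whole. The class name, the option labels and the figures are made up for illustration.

```java
import java.util.Map;

final class CategoryRollUp {
    // Collapses values disaggregated by category option into a single total.
    // Only meaningful when the options partition the whole (e.g. age bands).
    static int rollUp(Map<String, Integer> valuesByOption) {
        int total = 0;
        for (int value : valuesByOption.values()) {
            total += value;
        }
        return total;
    }
}
```

An orgunit reporting malaria deaths as 3 for under-fives and 4 for everyone else rolls up to 7, which can then be compared or added to another orgunit's single undisaggregated value for the same dataelement.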

Regards
Bob


Some really interesting ideas you are bringing up here Bob. I like the kind of flexibility and yet structure this would bring to the data model.

One quick question though:
How would this fit with the use of data elements and categorycombooptions in metadata expressions like indicators and validation rules that are (and should be) completely independent from data collection structures? E.g. which categories and options should be available for a given data element when setting up an indicator formula? All?

Ola


Agree that this is really interesting and important - and I don't want
to complicate things further, but from the perspective of my
department, there is also a need (mostly pronounced at higher levels
like national, but not necessarily) to accommodate estimates and
adjustments in values and indicators. This is linked to completeness -
when you know data is missing, you still want to have a reasonable
figure for reports. As an example: DHIS may not be used in hospitals,
where all cesarean deliveries are performed. Thus, a province or
ministry relying only on data from DHIS will report 0 for this
particular dataelement, which is obviously wrong. I guess adjusted
figures are technically a bit like targets, in terms of how they
relate to dataelements and datavalues?

Or does this topic rather belong in its own thread/blueprint?

Knut

···

On Sat, May 22, 2010 at 8:51 PM, Ola Hodne Titlestad <olatitle@gmail.com> wrote:

On 20 May 2010 18:39, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
> 2010/5/20 Ola Hodne Titlestad <olatitle@gmail.com>:
>>
>> 2010/5/20 Lars Helge Øverland <larshelge@gmail.com>
>>>
>>> Data elements derive their period type from the data sets they are
>>> members
>>> of.
>
> Restated (what I just sent Lars only by mistake): a datavalue derives
> its period type from the data set of
> which its data element is a member :slight_smile:
>
>>
>> And when they are members of two datasets with different period types
>> they
>> have multiple period types right?
>
> It's important to remain aware that it is values ultimately which have
> periods (and hence period types).
>
> And when you look at a value you can derive its period type in one of
> two ways - via dataset or via period. Potentially these could
> disagree, The one which derives from its period should be considered
> authoritative ie. if the period is 2009-Jan then regardless of what
> the dataset might say this really must be monthly. Of course we hope
> these always agree. Incidentally the lookup from
> datelement-to-dataset-to-period looks like a greater complexity than
> the lookup from period->periodType.
>
>>
>> The key thing to look out for in data entry and data import is to avoid
>> overlaps in data values that will cause duplication when aggregating
>> data
>> periods.
>> E.g. if the SAME ORGUNIT registers values for the same data element for
>> two
>> different period types that have overlapping periods, e.g. Jan-10 and
>> Q1-10.
>> Then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all
>> show
>> an incorrect value since the value for Jan-10 is counted twice.
>
> OK. Thats a good concrete constraint to have.
>
>>
>> One way to enforce this constraint is to monitor which datasets an
>> orgunit
>> is assigned to, and not allow orgunits to be assigned to two datasets
>> that
>> have the same data element AND different period types.
>
> Agreed, Though this constraint should probably be imposed on forms
> rather than datasets.
>
>>As far as I am aware,
>> we are not checking for this today. During data import it could be
>> checked
>> on data element level by looking up the period type the way Bob has
>> shown,
>> but that sounds like a lot of look ups and time consuming validation,
>> or?
>
> On data import we don't really validate at all, beyond whatever
> constraints the db imposes. For efficiency we simply pop the values in
> with multiple insert statement. So this validation would have to
> happen as a stage before the actual import or would have to be
> constrained within the db. In fact it can't be validated easily
> before the import as it is dependent on existing values within the db.
>
>>
>> A relatively normal use case that we probably have to find a way to
>> support,
>> and I think they are struggling with in Vietnam, is that different
>> provinces
>> can use different period types for the same data elements (even for
>> complete
>> data sets). E.g. if the national data flow policy says to report on
>> immunisation data every quarter, so that becomes the minimum
>> requirement for
>> all provinces. Then some of the provinces decide that all their
>> facilities
>> have to collect this data monthly anyway, and then at the province
>> level
>> they simply send the quarterly aggregates to national level (in the
>> paper-based or Excel world). At the same time other provinces just
>> collect
>> quarterly data at the facility level as in the minimum national
>> requirement.
>> At the national level there is a need to consolidate all this data,
>> even
>> data by the facility level, so ideally a national DHIS database should
>> be
>> able to store both monthly and quarterly raw data values for the same
>> data
>> elements, but for different orgunits. The national information users
>> can
>> then easily generate quarterly reports on immunisation for all
>> provinces,
>> while in some provinces they can do monthly data analysis if they want
>> to
>> collect data using that frequency.
>>
>> We support the above scenario by allowing the same data elements to be
>> assigned to different data sets with different period types, but we
>> don't
>> control for misuse of this flexibility which can lead to duplication
>> and
>> inconsistent aggregated data values as pointed out above.
>
> Thinking further ... I really think the problem arises because we we
> have a dataset concept which represents a form and is also used to
> constrain periodtypes on dataelements. Thinking of the use case you
> have just described, it should be the case that one can have a paper
> form which national level expect to collect quarterly, and the same
> form be used at a lower level to collect data monthly. If we wanted
> to mirror that use case electronically we would have to divorce the
> form from the periodtype - ie a form would collect datavalues of a
> certain period, but the same form could be used in different orgunits
> for collecting data at a different frequency..
>
> So (leaving dataset aside for the moment) if we can't assign a
> periodtype to a form and we can't assign to a dataelement and its too
> inefficient to validate on a one by one datavalue basis what is a girl
> to do?
>
> I suspect the correct answer is to refactor datavalue and create a
> datavalueset type - note: a set of datavalues rather than a set of
> dataelements. Designing out loud, a datavalueset would have the
> following fields/attributes:
>
> 1. a formid - the collection instrument used - roughly corresponds to
> current dataset
> 2. an orgunitid - where the datavalues come from
> 3. a periodid - the period of all the datavalues
> couple of other useful attributes I can think of
>
> Datavalue now becomes slightly simpler (which is always a good thing).
> It only has:
> value, dataelementid, categorycombooption, datasetid

Afterthought:
At the risk of adding complexity to what is otherwise a
simplification, my life could become even simpler if datavalueset also
had a categorycombo attribute, which would imply that a dataset was
linked to a formsectionid rather than a formid.

So a form has sections. Sections have dataelements. And sections
have a datavalueset as a model - which implies a uniform categorycombo
within the section.

There isn't really a need for dataelements to have a categorycombo.
And in lots of ways it's good that they don't. Then I am reducing
complexity rather than adding to it :slight_smile:

Consider one orgunit that has collected malaria deaths disaggregated by
age. Another has collected values for the same dataelement, but
not disaggregated by age. The datavalues will come from a
datavalueset so will have a categorycombo. It is possible to
aggregate or compare these datavalues from different datavaluesets,
but only using the lowest common denominator of categorycombo, i.e. in
both cases you have access to malaria deaths. In the one case you have
to "roll-up" the categorycombo, which does of course assume that the sum
of category options makes a sensible whole, but Ola has mentioned this
one many times.
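As a rough illustration of the roll-up, the sketch below sums the values of one dataelement over all of its category options, so that a disaggregated set and a non-disaggregated set end up comparable. The class name `RollUp` and the option labels used with it are made up, and it assumes the category options do add up to a sensible whole.

```java
import java.util.Map;

// Hypothetical helper, not part of DHIS2.
final class RollUp {
    /** Sums the values of one dataelement over all of its category options. */
    static int total(Map<String, Integer> valuesByCategoryOption) {
        int sum = 0;
        for (int v : valuesByCategoryOption.values()) {
            sum += v;
        }
        return sum;
    }
}
```

An orgunit that reported by age would pass one entry per age group; an orgunit that reported a single undisaggregated value would pass a one-entry map, and both come out as plain totals.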

Some really interesting ideas you are bringing up here Bob. I like the kind
of flexibility and yet structure this would bring to the data model.

>>
>>>
>>> Data elements derive their period type from the data sets they are
>>> members of.
>
> Restated (what I just sent Lars only by mistake): a datavalue derives
> its period type from the data set of
> which its data element is a member :slight_smile:
>
>>
>> And when they are members of two datasets with different period types
>> they have multiple period types, right?
>
> It's important to remain aware that it is values ultimately which have
> periods (and hence period types).
>
> And when you look at a value you can derive its period type in one of
> two ways - via dataset or via period. Potentially these could
> disagree. The one which derives from its period should be considered
> authoritative, i.e. if the period is 2009-Jan then regardless of what
> the dataset might say this really must be monthly. Of course we hope
> these always agree. Incidentally the lookup from
> dataelement-to-dataset-to-period looks like a greater complexity than
> the lookup from period->periodType.
>
>>
>> The key thing to look out for in data entry and data import is to avoid
>> overlaps in data values that will cause duplication when aggregating
>> data periods.
>> E.g. if the SAME ORGUNIT registers values for the same data element for
>> two different period types that have overlapping periods, e.g. Jan-10
>> and Q1-10, then the aggregate values for Q1-10, Jan-June 2010, and 2010
>> will all show an incorrect value since the value for Jan-10 is counted
>> twice.
>
> OK. That's a good concrete constraint to have.
>
>>
>> One way to enforce this constraint is to monitor which datasets an
>> orgunit is assigned to, and not allow orgunits to be assigned to two
>> datasets that have the same data element AND different period types.
>
> Agreed, though this constraint should probably be imposed on forms
> rather than datasets.
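A minimal sketch of the assignment check discussed above, written against datasets for simplicity (the same test would apply if the constraint were moved to forms). `DataSetDef` and `AssignmentCheck` are hypothetical names, and period types are plain strings here purely for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a dataset: a period type plus its member data elements.
final class DataSetDef {
    final String name;
    final String periodType;
    final Set<String> dataElements;

    DataSetDef(String name, String periodType, Set<String> dataElements) {
        this.name = name;
        this.periodType = periodType;
        this.dataElements = dataElements;
    }
}

final class AssignmentCheck {
    /**
     * True if assigning the candidate to an orgunit that already has the given
     * datasets would let one data element be collected under two period types.
     */
    static boolean conflicts(List<DataSetDef> alreadyAssigned, DataSetDef candidate) {
        for (DataSetDef existing : alreadyAssigned) {
            boolean differentPeriodType = !existing.periodType.equals(candidate.periodType);
            boolean sharesDataElement =
                    !Collections.disjoint(existing.dataElements, candidate.dataElements);
            if (differentPeriodType && sharesDataElement) {
                return true;
            }
        }
        return false;
    }
}
```

This only guards assignment time; values that are already in the database would still need a separate check.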
>
>> As far as I am aware, we are not checking for this today. During data
>> import it could be checked on data element level by looking up the
>> period type the way Bob has shown, but that sounds like a lot of
>> lookups and time-consuming validation, or?
>
> On data import we don't really validate at all, beyond whatever
> constraints the db imposes. For efficiency we simply pop the values in
> with multiple insert statements. So this validation would have to
> happen as a stage before the actual import or would have to be
> constrained within the db. In fact it can't be validated easily
> before the import as it is dependent on existing values within the db.
>
>>
>> A relatively normal use case that we probably have to find a way to
>> support, and which I think they are struggling with in Vietnam, is that
>> different provinces can use different period types for the same data
>> elements (even for complete data sets). E.g. the national data flow
>> policy says to report on immunisation data every quarter, so that
>> becomes the minimum requirement for all provinces. Then some of the
>> provinces decide that all their facilities have to collect this data
>> monthly anyway, and at the province level they simply send the
>> quarterly aggregates to national level (in the paper-based or Excel
>> world). At the same time other provinces just collect quarterly data at
>> the facility level, as in the minimum national requirement.
>> At the national level there is a need to consolidate all this data,
>> even data at the facility level, so ideally a national DHIS database
>> should be able to store both monthly and quarterly raw data values for
>> the same data elements, but for different orgunits. The national
>> information users can then easily generate quarterly reports on
>> immunisation for all provinces, while in some provinces they can do
>> monthly data analysis if they want to collect data using that
>> frequency.
>>
>> We support the above scenario by allowing the same data elements to be
>> assigned to different data sets with different period types, but we
>> don't control for misuse of this flexibility, which can lead to
>> duplication and inconsistent aggregated data values as pointed out
>> above.
>

One quick question though:
How would this fit with the use of data elements and categorycombooptions in
metadata expressions like indicators and validation rules that are (and
should be) completely independent from data collection structures? E.g.
which categories and options should be available for a given data element
when setting up an indicator formula? All?

I think it's a question of the "lowest common denominator" of the
datavalues that you have. Indicators are calculated from datavalues
even though we express the calculation in terms of dataelements.

Ivalue = f(de1, de2, de3, ...) / g(de4, de5, ...)

Looking just at the numerator - if the set of datavalues you have
corresponding to de1, de2 and de3 share the same categorycombo (and
note that datavalues do have a categorycombo from which their
categoryoptioncombo is derived), then you can also produce a
similarly disaggregated indicator value.

If they use different categorycombos (some have age+sex, some have
hiv_age+sex, and some have just sex), but each of these has at least
the sex category, then you could produce an indicator value
disaggregated by sex.

If the categorycombos are a jumble of apples and pears then you can
produce just the rolled up calculation.
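The three cases above boil down to a set intersection over the categories of the categorycombos involved. A small sketch, with `CommonCategories` as a made-up helper and categories represented simply by their names:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of DHIS2.
final class CommonCategories {
    /**
     * Returns the categories present in every categorycombo. An empty result
     * means only the fully rolled-up value can be produced.
     */
    static Set<String> sharedBy(List<Set<String>> categoryCombos) {
        if (categoryCombos.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> shared = new LinkedHashSet<>(categoryCombos.get(0));
        for (Set<String> combo : categoryCombos) {
            shared.retainAll(combo); // keep only what this combo also has
        }
        return shared;
    }
}
```

With age+sex, hiv_age+sex and sex as inputs, only sex survives the intersection; with nothing in common the result is empty and the indicator can only be reported as a total.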

What is the implication? At design time, when you are coding the
expression, you probably should not include the categoryoptioncombo at
all. The indicator is just expressed in terms of dataelements (I
guess traditional DHIS14 style). But when you are generating, for
example, the reporttable, the first pass analyzes the data you have
selected and suggests - would you like the indicator data
disaggregated by sex? Or age+sex? Or no disaggregation? So what you
can report on is determined by the data you've got. I think that's a
sound principle.

And I think all of this is completely independent of data collection structures.

Of course in practice you will have designed and deployed your
collection instruments such that all your datavalues for a given
dataelement will have the same categorycombo. But if you want to
compare data over the past five years, and the ministry decided only
in year 2 that they wanted to disaggregate by sex and in year 4
decided to introduce a third sex category, then you could still
calculate an indicator from all of those datavalues - but by rolling
up the sex category.

I think what we do currently - specifying the categorycombo in the
indicator expression - is more rigid and more fragile.

In summary, what we have with categorycombos etc is really quite
brilliant. We don't have ragged data. Our datavalues are stored
compactly and uniformly. All this is great. I think a mistake we may
have made is attaching categorycombo to the dataelement. The
relationship between a categorycombo and a dataelement can and should
be a transient thing. I believe the categorycombo should be a
characteristic of the way we collect the particular datavalues, i.e. a
characteristic of a particular form. There is a long conversation
before where it emerged that part of the original design rationale of
the categorycombo was indeed related to form layout. At the time this
upset me a bit, because I too had bought into the rigid edifice we had
created. But in retrospect I think this thinking was absolutely on
the right track. Using the categorycombo to specify the
disaggregation layout of a particular form's elements makes very good
sense. What was also inspired was having the categorycombo as a named
persisted object in its own right which could be used across different
dataelements.

Cheers
Bob

···

On 22 May 2010 19:51, Ola Hodne Titlestad <olatitle@gmail.com> wrote:

On 20 May 2010 18:39, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
> 2010/5/20 Ola Hodne Titlestad <olatitle@gmail.com>:
>> 2010/5/20 Lars Helge Øverland <larshelge@gmail.com>

>
> We can relatively efficiently validate that a dataset object is not
> persisted which has the same formid, orgunitid and an overlapping
> period.
>
> There is no longer any ambiguity about periodtype of a datavalue.
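The overlap validation mentioned above could look something like the sketch below, reading "dataset object" as the proposed datavalueset (an assumption on my part). `StoredSet` and `OverlapCheck` are hypothetical names, and periods are represented by inclusive start and end dates rather than period ids.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical stand-in for a persisted datavalueset; dates are inclusive.
final class StoredSet {
    final int formId;
    final int orgUnitId;
    final LocalDate start;
    final LocalDate end;

    StoredSet(int formId, int orgUnitId, LocalDate start, LocalDate end) {
        this.formId = formId;
        this.orgUnitId = orgUnitId;
        this.start = start;
        this.end = end;
    }
}

final class OverlapCheck {
    /** Same form, same orgunit, and the two date ranges share at least one day. */
    static boolean overlaps(StoredSet a, StoredSet b) {
        return a.formId == b.formId
                && a.orgUnitId == b.orgUnitId
                && !a.start.isAfter(b.end)
                && !b.start.isAfter(a.end);
    }

    /** A candidate may be persisted only if it overlaps none of the existing sets. */
    static boolean canPersist(List<StoredSet> existing, StoredSet candidate) {
        for (StoredSet stored : existing) {
            if (overlaps(stored, candidate)) {
                return false;
            }
        }
        return true;
    }
}
```

With a Jan-10 set already stored, a Q1-10 set for the same form and orgunit would be rejected, while Feb-10 or the same quarter for another orgunit would pass. In practice the lookup would be a query on formid and orgunitid rather than a scan over a list.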
>
> stored_by, timestamp, comment might go either way. Probably they need
> to stay on datavalue. I notice comment is rarely used but it's really
> useful to have a comment on datavalueset for import purposes.
>
> 'nuff designing out loud. Got to go.
>
> Regards
> Bob
>
>>
>>
>> Ola
>> ---------
>>
>>>
>>> On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad <olatitle@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> After Kim Anh's email about the use of the same data elements with
>>>> different period types I dug up this old discussion from March 2009.
>>>>
>>>> What is the status on this work, or did we not conclude this?
>>>>
>>>> Ola
>>>> ----------
>>>>
>>>> 2009/3/20 Bob Jolliffe <bobjolliffe@gmail.com>
>>>>>
>>>>> 2009/3/20 Lars Helge Øverland <larshelge@gmail.com>:
>>>>> >
>>>>> >>
>>>>> >> Yes this is true. But what do you think of the idea to enforce
>>>>> >> DataSet membership having a default DataSet for all the
>>>>> >> delinquents?
>>>>> >> I'm not sure if it can be enforced by the schema, but at least by
>>>>> >> the
>>>>> >> application.
>>>>> >
>>>>> > OK but what does this give us in terms of PeriodType-determining
>>>>> > if this default DataSet has a null PeriodType?
>>>>>
>>>>> Nothing really. The only effect would be you have an index on the
>>>>> unassigned DataElements, for what it's worth. Mainly it would be
>>>>> useful for determining easily the available DataElements which can
>>>>> be added to a DataSet. Maybe it's a nonsense idea - I was just trying
>>>>> to think of ways to make editing DataSets reasonably straightforward.
>>>>>
>>>>> >
>>>>> >>
>>>>> >> I don't know if it's about right or wrong. There are pros and
>>>>> >> cons of both approaches. What you gain on the swings you lose on
>>>>> >> the roundabouts :slight_smile:
>>>>> >>
>>>>> >> In the explicit case the application will have to enforce that
>>>>> >> DataSet members all have the same periodType.
>>>>> >>
>>>>> >> In the implicit case the application will have to enforce that
>>>>> >> DataElements can only be members of multiple groups if these
>>>>> >> share the same PeriodType.
>>>>> >>
>>>>> >> The net result as far as the Data API is concerned can and must
>>>>> >> be the same. Perhaps we should define exactly what extra methods
>>>>> >> we want in the API first. We have already identified a few. Then
>>>>> >> decide whether a database change is necessitated by these.
>>>>> >
>>>>> > Yes. We need at least a service method:
>>>>> >
>>>>> > Collection<DataElement> getDataElementsByPeriodType( PeriodType )
>>>>> >
>>>>> > and a getter on the DataElement object:
>>>>> >
>>>>> > PeriodType getPeriodType()
>>>>> >
>>>>> >
>>>>> > I guess we could make a branch, start coding and see how it works
>>>>> > out.
>>>>>
>>>>> Sure. So long as we are adding methods we won't be breaking
>>>>> anything in terms of backward compatibility. Just enforcing
>>>>> application level constraints. Then we can really encourage
>>>>> (enforce?) upper layers to strictly interact with the data via the
>>>>> API. Even if this might occasionally mean making some lightweight
>>>>> API methods which bypass the ORM.
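For the implicit case, the two methods named above could be sketched purely at the application level, with no schema change. This is only an illustration under assumptions: `PeriodTypeLookup` is a made-up class, data sets, data elements and period types are plain strings, and membership is held in in-memory maps rather than going through Hibernate.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical lookup that derives period types from DataSet membership.
final class PeriodTypeLookup {
    private final Map<String, String> periodTypeByDataSet;
    private final Map<String, Set<String>> membersByDataSet;

    PeriodTypeLookup(Map<String, String> periodTypeByDataSet,
                     Map<String, Set<String>> membersByDataSet) {
        this.periodTypeByDataSet = periodTypeByDataSet;
        this.membersByDataSet = membersByDataSet;
    }

    /** Period type of a data element via its data sets; null if it is unassigned. */
    String getPeriodType(String dataElement) {
        String found = null;
        for (Map.Entry<String, Set<String>> entry : membersByDataSet.entrySet()) {
            if (!entry.getValue().contains(dataElement)) {
                continue;
            }
            String type = periodTypeByDataSet.get(entry.getKey());
            if (found != null && !found.equals(type)) {
                // the application-level constraint from the implicit case
                throw new IllegalStateException(
                        dataElement + " is in data sets with different period types");
            }
            found = type;
        }
        return found;
    }

    /** All data elements that belong to a data set with the given period type. */
    Set<String> getDataElementsByPeriodType(String periodType) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : membersByDataSet.entrySet()) {
            if (periodType.equals(periodTypeByDataSet.get(entry.getKey()))) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }
}
```

Callers only see the two methods, so swapping this for an explicit DataElement -> PeriodType column later would not change anything above the API.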
>>>>>
>>>>> >
>>>>> > Another issue would arise in the (exotic) situation where someone
>>>>> > assigns a DataElement to a DataSet, enters data for it, then
>>>>> > removes it from the DataElement. The data is there, but how do we
>>>>> > deal with it in regard to the mentioned required functionality
>>>>> > (trend analysis, datamart)?
>>>>> >
>>>>>
>>>>> Yes this gets a bit weird (I presume you mean removes it from the
>>>>> DataSet). I'm guessing you haven't lost the data because the
>>>>> dataValues each have a PeriodID which in turn is linked to a
>>>>> PeriodType. I suppose that (in such an exotic headspace)
>>>>> DataElements can in fact change their PeriodTypes over time, though
>>>>> I imagine it's not a great idea.
>>>>>
>>>>> The effect would be the same in the explicit relationship case, if
>>>>> someone assigns a DataElement to a DataSet, enters data for it, then
>>>>> changes the PeriodType of the DataElement ...
>>>>>
>>>>> Cheers
>>>>> Bob
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>> Post to : dhis2-devs@lists.launchpad.net
>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>> More help : https://help.launchpad.net/ListHelp
>>>>
>>>
>>
>>
>


> I think it's a question of the "lowest common denominator" of the
> datavalues that you have. Indicators are calculated from datavalues
> even though we express the calculation in terms of dataelements.
>
> Ivalue = f(de1, de2, de3, ...) / g(de4, de5, ...)
>
> Looking just at the numerator - if the set of datavalues you have
> corresponding to de1, de2 and de3 share the same categorycombo (and
> note that datavalues do have a categorycombo from which their
> categoryoptioncombo is derived), then you can also produce a
> similarly disaggregated indicator value.
>
> If they use different categorycombos (some have age+sex, some have
> hiv_age+sex, and some have just sex), but each of these has at least
> the sex category, then you could produce an indicator value
> disaggregated by sex.
>
> If the categorycombos are a jumble of apples and pears then you can
> produce just the rolled up calculation.

I like this idea.

> What is the implication? At design time, when you are coding the
> expression, you probably should not include the categoryoptioncombo at
> all. The indicator is just expressed in terms of dataelements (I
> guess traditional DHIS14 style). But when you are generating, for
> example, the reporttable, the first pass analyzes the data you have
> selected and suggests - would you like the indicator data
> disaggregated by sex? Or age+sex? Or no disaggregation? So what you
> can report on is determined by the data you've got. I think that's a
> sound principle.

I can see a few challenges with this principle. In typical implementations of DHIS you would design forms and canned/fixed reports at the same time, before rolling out the installations. If it is impossible to design reports before you have any data values, I can see a problem with this approach. But I guess you would know the potential datavaluesets from the form information, and could therefore allow some disaggregated reports to be prepared even before you have any data values?

Another issue I would like to bring up is performance. In the past we have struggled with, and spent a lot of time on, improving the performance of the datamart, i.e. the aggregation of data values. To me it sounds more complicated to have a floating set of disaggregations that needs to be looked up in a potentially huge store of datavalues compared to working with a fixed set. Any thoughts on data mart service performance with this proposed design compared to the existing one?

> And I think all of this is completely independent of data collection structures.
>
> Of course in practice you will have designed and deployed your
> collection instruments such that all your datavalues for a given
> dataelement will have the same categorycombo. But if you want to
> compare data over the past five years, and the ministry decided only
> in year two that they wanted to disaggregate by sex and in year 4
> decided to introduce a third sex category, then you could still
> calculate an indicator from all of those datavalues - but by rolling
> up sex category.
>
> I think what we do currently - specifying the categorycombo in the
> indicator expression - is more rigid and more fragile.

Agree, and I think most indicator analysis will be on the data element level anyway (without any disaggregations), so the current design is too complicated and cumbersome to work with.

Ola

···

On 23 May 2010 03:32, Bob Jolliffe bobjolliffe@gmail.com wrote:

On 22 May 2010 19:51, Ola Hodne Titlestad olatitle@gmail.com wrote:

On 20 May 2010 18:39, Bob Jolliffe bobjolliffe@gmail.com wrote:

On 20 May 2010 15:56, Bob Jolliffe bobjolliffe@gmail.com wrote:

2010/5/20 Ola Hodne Titlestad olatitle@gmail.com:

2010/5/20 Lars Helge Øverland larshelge@gmail.com


In summary, what we have with categorycombos etc is really quite

brilliant. We don’t have ragged data. Our datavalues are stored

compactly and uniformly. All this is great. I think a mistake we may

have made is attaching categorycombo to the dataelement. The

relationship between a categorycombo and a dataelement can and should

be a transient thing. I believe the categorycombo should be a

characteristic of the way we collect the particular datavalues ie. a

characteristic of a particular form. There is a long conversation

before where it emerged that part of the original design rationale of

the categorycombo was indeed related to form layout. At the time this

upset me a bit, because I too had bought into the rigid edifice we had

created. But in retrospect I think this thinking was absolutely on

the right track. Using the categorycombo to specify the

disaggregation layout of a particular form elements makes very good

sense. What was also inspired was having the categorycombo as a named

persisted object in its own right which could be used across different

dataelements.

Cheers

Bob

Ola


Regards

Bob

We can relatively efficiently validate that a dataset object is not

persisted which has the same formid, orgunitid and an overlapping

period.

There is no longer any ambiguity about periodtype of a datavalue.

stored_by, timestamp, comment might go either way. Probably they need

to stay on datavalue. I notice comment is rarely used but its really

useful to have a comment on datavalueset for import purposes.

'nuff designing out loud. Got to go.

Regards

Bob

Ola


On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad > > >> >>> olatitle@gmail.com > > >> >>> wrote:

Hi,

After Kim Anh’s email about the use of the same data elements with

different period types I dug up this old discussion from March 2009.

What is the status on this work, or did we not conclude this?

Ola


2009/3/20 Bob Jolliffe bobjolliffe@gmail.com

2009/3/20 Lars Helge Øverland larshelge@gmail.com:

Yes this is true. But what do you think of the idea to enforce

DataSet membership having a default DataSet for all the

delinquents?

I’m not sure if it can be enforced by the schema, but at least by

the

application.

OK but what does this give us in terms of PeriodType-determining

if

this

default DataSet has a null PeriodType?

Nothing really. The only effect would be you have an index on the

unassigned DataElements for what its worth. Mainly it would be

useful

for determining easily the available DataElements which can be added

to a DataSet. Maybe its a nonsense idea - I was just trying to

think

of ways to make editing DataSets reasonably straightforward.

I don’t know if its about right or wrong. There are pros and

cons of

both approaches. What you gain on the swings you lose on the

roundabouts :slight_smile:

In the explicit case the application will have to enforce that

DataSet

members all have the same periodType.

In the implicit case the application will have to enforce that

DataElements can only be members of multiple groups if these

share

the

same PeriodType.

The net result as far as the Data API is concerned can and must

be

the

same. Perhaps we should define exactly what extra methods we

want in

the API first. We have already identified a few. Then decide

whether

a database change is necessitated by these.

Yes. We need at least service method:

Collection getDataElementsByPeriodType( PeriodType )

and getter on the DataElement object:

PeriodType getPeriodType()

I guess we could make a branch, start coding and see how it works

out.

Sure. So long as we are adding methods we won’t be breaking

anything

in terms of backward compatibility. Just enforcing application

level

constraints. Then we can really encourage (enforce?) upper layers

to

strictly interact with the data via the API. Even if this might

occasionally mean making some lightweight API methods which bypass

the

ORM.

Another issue would arise in the (exotic) situation where someone

assigns a

DataElement to a DataSet, enter data for it, then removes it from

the

DataElement. The data is there, but how do we deal with it in

regard

to the

mentioned required functionaly (trend analysis, datamart) ?

Yes this gets a bit weird (I presume you mean removes it from the
DataSet). I’m guessing you haven’t lost the data because the
dataValues each have a PeriodID which in turn is linked to a
PeriodType. I suppose that (in such an exotic headspace) DataElements
can in fact change their PeriodTypes over time, though I imagine it’s
not a great idea.

The effect would be the same in the explicit relationship case, if
someone assigns a DataElement to a DataSet, enters data for it, then
changes the PeriodType of the DataElement …

Cheers

Bob


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

We definitely need something that is manageable, both in terms of
understanding and performance. But looking at the GHO and thinking
about National Health Observatories (for which I think DHIS2 is quite
suited), people definitely want breakdowns by at least the "standard"
dimensions of age and sex.

Knut

PS: I think it can be a good practice in these long threads to snip
out some parts of the emails that are no longer needed for where the
discussion has gone (such as everything below your signature),
otherwise it becomes hard to reply and read, even in good clients like
Gmail.


On Sun, May 23, 2010 at 9:36 AM, Ola Hodne Titlestad <olatitle@gmail.com> wrote:

On 23 May 2010 03:32, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

On 22 May 2010 19:51, Ola Hodne Titlestad <olatitle@gmail.com> wrote:
> On 20 May 2010 18:39, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
>>
>> On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
>> > 2010/5/20 Ola Hodne Titlestad <olatitle@gmail.com>:
>> >>
>> >> 2010/5/20 Lars Helge Øverland <larshelge@gmail.com>
>> >>>
>> >>> Data elements derive their period type from the data sets they are
>> >>> members
>> >>> of.
>> >
>> > Restated (what I just sent Lars only by mistake): a datavalue
>> > derives
>> > its period type from the data set of
>> > which its data element is a member :slight_smile:
>> >
>> >>
>> >> And when they are members of two datasets with different period
>> >> types
>> >> they
>> >> have multiple period types right?
>> >
>> > It's important to remain aware that it is values ultimately which
>> > have
>> > periods (and hence period types).
>> >
>> > And when you look at a value you can derive its period type in one of
>> > two ways - via dataset or via period. Potentially these could
>> > disagree. The one which derives from its period should be considered
>> > authoritative, i.e. if the period is 2009-Jan then regardless of what
>> > the dataset might say this really must be monthly. Of course we hope
>> > these always agree. Incidentally the lookup from
>> > dataelement-to-dataset-to-period looks like a greater complexity than
>> > the lookup from period->periodType.
>> >
>> >>
>> >> The key thing to look out for in data entry and data import is to
>> >> avoid
>> >> overlaps in data values that will cause duplication when aggregating
>> >> data
>> >> periods.
>> >> E.g. if the SAME ORGUNIT registers values for the same data element
>> >> for
>> >> two
>> >> different period types that have overlapping periods, e.g. Jan-10
>> >> and
>> >> Q1-10.
>> >> Then the aggregate values for Q1-10, Jan-June 2010, and 2010 will
>> >> all
>> >> show
>> >> an incorrect value since the value for Jan-10 is counted twice.
>> >
>> > OK. That's a good concrete constraint to have.
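
To make the double counting concrete, here is a small sketch with invented numbers: 20 for Jan-10 and 60 for Q1-10, where the quarterly figure is assumed to already include January. The class and field names are illustrative assumptions, and the aggregation is deliberately naive: it sums every stored value whose period falls inside the requested range.

```java
import java.util.List;

final class OverlapExample {
    // One stored raw value: a period label, the months of 2010 it covers,
    // and the reported amount. The shape is invented for this example.
    static final class Raw {
        final String period;
        final int firstMonth;
        final int lastMonth;
        final int amount;

        Raw(String period, int firstMonth, int lastMonth, int amount) {
            this.period = period;
            this.firstMonth = firstMonth;
            this.lastMonth = lastMonth;
            this.amount = amount;
        }
    }

    // Naive aggregation: adds up every value whose period lies in the range,
    // without checking whether the periods overlap each other.
    static int aggregate(List<Raw> values, int fromMonth, int toMonth) {
        int total = 0;
        for (Raw value : values) {
            if (value.firstMonth >= fromMonth && value.lastMonth <= toMonth) {
                total += value.amount;
            }
        }
        return total;
    }
}
```

With the same orgunit holding both `new Raw("Jan-10", 1, 1, 20)` and `new Raw("Q1-10", 1, 3, 60)`, the totals for Q1, Jan-June and the full year all come out as 80 rather than 60, because January is counted twice.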
>> >
>> >>
>> >> One way to enforce this constraint is to monitor which datasets an
>> >> orgunit
>> >> is assigned to, and not allow orgunits to be assigned to two
>> >> datasets
>> >> that
>> >> have the same data element AND different period types.
>> >
>> > Agreed, though this constraint should probably be imposed on forms
>> > rather than datasets.
>> >
>> >> As far as I am aware,
>> >> we are not checking for this today. During data import it could be
>> >> checked
>> >> on data element level by looking up the period type the way Bob has
>> >> shown,
>> >> but that sounds like a lot of look ups and time consuming
>> >> validation,
>> >> or?
>> >
>> > On data import we don't really validate at all, beyond whatever
>> > constraints the db imposes. For efficiency we simply pop the values
>> > in
>> > with multiple insert statements. So this validation would have to
>> > happen as a stage before the actual import or would have to be
>> > constrained within the db. In fact it can't be validated easily
>> > before the import as it is dependent on existing values within the
>> > db.
>> >
>> >>
>> >> A relatively normal use case that we probably have to find a way to
>> >> support,
>> >> and I think they are struggling with in Vietnam, is that different
>> >> provinces
>> >> can use different period types for the same data elements (even for
>> >> complete
>> >> data sets). E.g. if the national data flow policy says to report on
>> >> immunisation data every quarter, so that becomes the minimum
>> >> requirement for
>> >> all provinces. Then some of the provinces decide that all their
>> >> facilities
>> >> have to collect this data monthly anyway, and then at the province
>> >> level
>> >> they simply send the quarterly aggregates to national level (in the
>> >> paper-based or Excel world). At the same time other provinces just
>> >> collect
>> >> quarterly data at the facility level as in the minimum national
>> >> requirement.
>> >> At the national level there is a need to consolidate all this data,
>> >> even
>> >> data by the facility level, so ideally a national DHIS database
>> >> should
>> >> be
>> >> able to store both monthly and quarterly raw data values for the
>> >> same
>> >> data
>> >> elements, but for different orgunits. The national information users
>> >> can
>> >> then easily generate quarterly reports on immunisation for all
>> >> provinces,
>> >> while in some provinces they can do monthly data analysis if they
>> >> want
>> >> to
>> >> collect data using that frequency.
>> >>
>> >> We support the above scenario by allowing the same data elements to
>> >> be
>> >> assigned to different data sets with different period types, but we
>> >> don't
>> >> control for misuse of this flexibility which can lead to duplication
>> >> and
>> >> inconsistent aggregated data values as pointed out above.
>> >
>> > Thinking further ... I really think the problem arises because we
>> > have a dataset concept which represents a form and is also used to
>> > constrain periodtypes on dataelements. Thinking of the use case you
>> > have just described, it should be the case that one can have a paper
>> > form which national level expect to collect quarterly, and the same
>> > form be used at a lower level to collect data monthly. If we wanted
>> > to mirror that use case electronically we would have to divorce the
>> > form from the periodtype - ie a form would collect datavalues of a
>> > certain period, but the same form could be used in different orgunits
>> > for collecting data at a different frequency.
>> >
>> > So (leaving dataset aside for the moment) if we can't assign a
>> > periodtype to a form and we can't assign to a dataelement and it's too
>> > inefficient to validate on a one by one datavalue basis what is a
>> > girl
>> > to do?
>> >
>> > I suspect the correct answer is to refactor datavalue and create a
>> > datavalueset type - note: a set of datavalues rather than a set of
>> > dataelements. Designing out loud, a datavalueset would have the
>> > following fields/attributes:
>> >
>> > 1. a formid - the collection instrument used - roughly corresponds
>> > to
>> > current dataset
>> > 2. an orgunitid - where the datavalues come from
>> > 3. a periodid - the period of all the datavalues
>> > couple of other useful attributes I can think of
>> >
>> > Datavalue now becomes slightly simpler (which is always a good
>> > thing).
>> > It only has:
>> > value, dataelementid, categorycombooption, datasetid
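
The datavalueset idea can be sketched roughly as below. This is an illustration under assumptions, not a proposed schema: the three fields follow the numbered list above, the types (and holding the period as an inclusive date range) are invented, and the in-memory store stands in for whatever persistence layer would really be used. The check shown is one that grouping values this way makes cheap: refusing a second set for the same form and orgunit when the periods overlap.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: field names follow the list above, types are assumptions.
final class DataValueSet {
    final int formId;       // the collection instrument used
    final int orgUnitId;    // where the datavalues come from
    final LocalDate start;  // the period, held here as an inclusive date range
    final LocalDate end;

    DataValueSet(int formId, int orgUnitId, LocalDate start, LocalDate end) {
        this.formId = formId;
        this.orgUnitId = orgUnitId;
        this.start = start;
        this.end = end;
    }

    // True when both sets use the same form and orgunit and their periods overlap.
    boolean clashesWith(DataValueSet other) {
        boolean overlaps = !start.isAfter(other.end) && !other.start.isAfter(end);
        return formId == other.formId && orgUnitId == other.orgUnitId && overlaps;
    }
}

// Hypothetical in-memory store standing in for the database.
final class DataValueSetStore {
    private final List<DataValueSet> saved = new ArrayList<>();

    // Returns false, without saving, if the candidate clashes with a stored set.
    boolean save(DataValueSet candidate) {
        for (DataValueSet existing : saved) {
            if (existing.clashesWith(candidate)) {
                return false;
            }
        }
        saved.add(candidate);
        return true;
    }
}
```

A monthly set for January 2010 followed by a quarterly set for Q1 2010 from the same orgunit and form would be rejected here, while the same quarterly set from a different orgunit would be accepted.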
>>
>> Afterthought:
>> At the risk of adding complexity to what is otherwise a
>> simplification, my life could become even simpler if datavalueset also
>> had a categorycombo attribute, which would imply that a dataset was
>> linked to a formsectionid rather than a formid.
>>
>> So a form has sections. sections have dataelements. And sections
>> have a datavalueset as a model - which implies a uniform categorycombo
>> within the section.
>>
>> There isn't really a need for dataelements to have a categorycombo.
>> And in lots of ways it's good that they don't. Then I am reducing
>> complexity rather than adding to it :slight_smile:
>>
>> Consider one orgunit has collected malaria deaths disaggregated by
>> age. Another has collected values for the same dataelement, but
>> not disaggregated by age. The datavalues will come from a
>> datavalueset so will have a categorycombo. It is possible to
>> aggregate or compare these datavalues, from different datavaluesets,
>> but using the lowest common denominator of categorycombo ie. in both
>> cases you have access to malaria deaths - in the one case you have to
>> "roll-up" the categorycombo which does of course assume that the sum
>> of category options make a sensible whole, but Ola has mentioned this
>> one many times.
>>
>
> Some really interesting ideas you are bringing up here Bob. I like the
> kind
> of flexibility and yet structure this would bring to the data model.
>
> One quick question though:
> How would this fit with the use of data elements and
> categorycombooptions in
> metadata expressions like indicators and validation rules that are (and
> should be) completely independent from data collection structures? E.g.
> which categories and options should be available for a given data
> element
> when setting up an indicator formula? All?

I think it's a question of the "lowest common denominator" of the
datavalues that you have. Indicators are calculated from datavalues
even though we express the calculation in terms of dataelements.

Ivalue = f(de1,de2,de3...)/g(de4, de5 ..)

Looking just at the numerator - if the set of datavalues you have
corresponding to de1, de2 and de3 share the same categorycombo (and
note that datavalues do have a categorycombo from which their
categoryoptioncombo is derived), then you can also produce a
similarly disaggregated indicator value.

If they use different categorycombos (some have age+sex, some have
hiv_age+sex, and some have just sex), but each of these have at least
the sex category, then you could produce an indicator value
disaggregated by sex.

If the categorycombos are a jumble of apples and pears then you can
produce just the rolled up calculation.
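
One way to read the "lowest common denominator" is as a set intersection over the categories of the categorycombos involved. The sketch below assumes, purely for illustration, that a categorycombo can be modelled as a set of category names; the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CategoryCombos {
    // Each categorycombo is modelled as a set of category names (an assumption).
    // The result is the set of categories every combo has in common; an empty
    // result means only the fully rolled-up value can be produced.
    static Set<String> commonCategories(List<Set<String>> combos) {
        if (combos.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> common = new LinkedHashSet<>(combos.get(0));
        for (Set<String> combo : combos) {
            common.retainAll(combo);
        }
        return common;
    }
}
```

For the three combos mentioned above (age+sex, hiv_age+sex, sex) the intersection is just sex; for a jumble such as age on one side and sex on the other it is empty.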

I like this idea.

What is the implication? At design time, when you are coding the
expression, you probably should not include the categoryoptioncombo at
all. The indicator is just expressed in terms of dataelements (I
guess traditional DHIS14 style). But when you are generating for
example, the reporttable, the first pass analyzes the data you have
selected and suggests - would you like the indicator data
disaggregated by sex? Or age+sex? Or no disaggregation. So what you
can report on is determined by the data you've got. I think that's a
sound principle.

I can see a few challenges with this principle. In typical implementations
of DHIS you would design forms and canned/fixed reports at the same time
before rolling out the installations. If it is impossible to design reports
before you have any data values I can see a problem with this approach. But
I guess you would know, from the forms information the potential
datavaluesets and therefore could allow some disaggregated reports to be
prepared even before you have any data values?

Another issue I would like to bring up is performance. In the past we have
struggled with and spent a lot of time on improving the performance of the
datamart, the aggregation of data values. To me it sounds more complicated
to have a floating set of disaggregations that needs to be looked up in a
potentially huge storage of datavalues compared to working with a fixed set.
Any thoughts on data mart service performance with this proposed design
compared to the existing one?

And I think all of this is completely independent of data collection
structures.

Of course in practice you will have designed and deployed your
collection instruments such that all your datavalues for a given
dataelement will have the same categorycombo. But if you want to
compare data over the past five years, and the ministry decided only
in year two that they wanted to disaggregate by sex and in year 4
decided to introduce a third sex category, then you could still
calculate an indicator from all of those datavalues - but by rolling
up sex category.
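
A small sketch of that roll-up, with invented numbers and a hypothetical value shape: year 1 has no sex disaggregation, years 2 and 3 have two options, and year 4 adds a third. Summing over the sex options per year gives a series that can be compared across all of them.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RollUp {
    // A datavalue reduced to what the roll-up needs; the shape is an assumption.
    static final class Value {
        final int year;
        final String sexOption;
        final int amount;

        Value(int year, String sexOption, int amount) {
            this.year = year;
            this.sexOption = sexOption;
            this.amount = amount;
        }
    }

    // Ignores the sex option entirely and totals the amounts per year.
    static Map<Integer, Integer> totalPerYear(List<Value> values) {
        Map<Integer, Integer> totals = new TreeMap<>();
        for (Value value : values) {
            totals.merge(value.year, value.amount, Integer::sum);
        }
        return totals;
    }
}
```

The roll-up only makes sense under the assumption already noted in the thread: that the category options add up to a sensible whole.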

I think what we do currently - specifying the categorycombo in the
indicator expression - is more rigid and more fragile.

Agree, and I think most indicators analysis will be on the data element
level anyway (without any disaggregations), so the current design is too
complicated and cumbersome to work with.

I’m not sure I see the “principled” difference here. Changing the workflow to require categoryoptioncombo specification at report generation time sounds interesting, but, as Ola says, you would not want to require human interaction for every report generation. So then we end up with more a case of workflow changes and storing the same information in a different way? This could give more automated support for doing reporting that today is hard to manage manually, but would also give a more complex data model and workflow to keep track of.

Still interesting, though :slight_smile:

Jo


On 23 May 2010 at 09:36, Ola Hodne Titlestad wrote:

What is the implication? At design time, when you are coding the
expression, you probably should not include the categoryoptioncombo at
all. The indicator is just expressed in terms of dataelements (I
guess traditional DHIS14 style). But when you are generating for
example, the reporttable, the first pass analyzes the data you have
selected and suggests - would you like the indicator data
disaggregated by sex? Or age+sex? Or no disaggregation. So what you
can report on is determined by the data you’ve got. I think that’s a
sound principle.

I can see a few challenges with this principle. In typical implementations of DHIS you would design forms and canned/fixed reports at the same time before rolling out the installations. If it is impossible to design reports before you have any data values I can see a problem with this approach. But I guess you would know, from the forms information the potential datavaluesets and therefore could allow some disaggregated reports to be prepared even before you have any data values?


Yes I hesitated a bit before I suggested that. You are of course
right. You might generally want to express an indicator to be
reported in terms of a particular dataelement and categoryoptioncombo.
And this would only produce results if you collect data using that
dataelement and that categorycombo. Which would be the case if you
have forms which do that.

Another issue I would like to bring up is performance. In the past we have
struggled with and spent a lot of time on improving the performance of the
datamart, the aggregation of data values. To me it sounds more complicated
to have a floating set of disaggregations that needs to be looked up in a
potentially huge storage of datavalues compared to working with a fixed set.
Any thoughts on data mart service performance with this proposed design
compared to the existing one?

I am not really sure. Would have to think more and look closer at the
datamart. I suspect it wouldn't actually make much difference.
Either way you are going to be aggregating datavalues with a
particular dataelement value and a particular categoryoptioncombo. I
can't see how the fact that a particular categorycombo is hard-linked
to a dataelement at a particular moment in time actually should make
any difference in the calculation. Of course I could be wrong. What
could make a difference is that instead of selecting datavalues from
the gazillions there might be some performance benefit of being able
to first easily select the datavaluesets to be aggregated - assuming
that datavalueset contains the important time and space dimensions of
period and orgunit. As I say, I'm not really sure. At the end of
the day you are still going to end up with the same bundle of
datavalues to crunch. Having these datavalues grouped by period and
orgunit by virtue of being members of datavaluesets may or may not
help, but as I mentioned above, I suspect it won't actually make a
difference.

Lars has pointed out to me at least once that the categorycombo is
actually not hard-linked to the dataelement anyway. You can change the
categorycombo on a dataelement at any time in the maintenance module.
And what this would mean is that "from now on we're going to collect
datavalues for that dataelement using the new categorycombo". Which
only feeds my curiosity into believing that the categorycombo isn't
really an innate characteristic of the dataelement. It's just a way of
saying how we are *currently* collecting datavalues for it. A form
object (or maybe form section) strikes me as a better place to
maintain that information. For one thing, when you create a new form
you would still have your old ones in the system. Having knowledge of
the forms in the system can give your report designer clues as to what
sort of indicators it can produce. Including across 10 years of data
where the categorycombos have gone through some changes.

I'm sorry I haven't really thought this through and I do really have
other things to think about so I'm not likely to in the immediate
future... the train of thought really started when I started thinking
about having a form or data collection object - like we would have if
we implemented xforms or something similar. Having a form and a
datavalueset (instead of a dhis dataset) creates new possibilities of
where to attach things like the categorycombo and periodtype. But
that's not the primary motivation for having such a beast.

Regards
Bob


On 23 May 2010 08:36, Ola Hodne Titlestad <olatitle@gmail.com> wrote:

On 23 May 2010 03:32, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

On 22 May 2010 19:51, Ola Hodne Titlestad <olatitle@gmail.com> wrote:
> On 20 May 2010 18:39, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
>> On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
>> > 2010/5/20 Ola Hodne Titlestad <olatitle@gmail.com>:
>> >> 2010/5/20 Lars Helge Øverland <larshelge@gmail.com>

And I think all of this is completely independent of data collection
structures.

Of course in practice you will have designed and deployed your
collection instruments such that all your datavalues for a given
dataelement will have the same categorycombo. But if you want to
compare data over the past five years, and the ministry decided only
in year two that they wanted to disaggregate by sex and in year 4
decided to introduce a third sex category, then you could still
calculate an indicator from all of those datavalues - but by rolling
up sex category.

I think what we do currently - specifying the categorycombo in the
indicator expression - is more rigid and more fragile.

Agree, and I think most indicators analysis will be on the data element
level anyway (without any disaggregations), so the current design is too
complicated and cumbersome to work with.

Ola
----------

In summary, what we have with categorycombos etc is really quite
brilliant. We don't have ragged data. Our datavalues are stored
compactly and uniformly. All this is great. I think a mistake we may
have made is attaching categorycombo to the dataelement. The
relationship between a categorycombo and a dataelement can and should
be a transient thing. I believe the categorycombo should be a
characteristic of the way we collect the particular datavalues ie. a
characteristic of a particular form. There is a long conversation
before where it emerged that part of the original design rationale of
the categorycombo was indeed related to form layout. At the time this
upset me a bit, because I too had bought into the rigid edifice we had
created. But in retrospect I think this thinking was absolutely on
the right track. Using the categorycombo to specify the
disaggregation layout of a particular form's elements makes very good
sense. What was also inspired was having the categorycombo as a named
persisted object in its own right which could be used across different
dataelements.

Cheers
Bob

>
> Ola
> --------
>
>
>
>
>>
>> Regards
>> Bob
>>
>> >
>> > We can relatively efficiently validate that a dataset object is not
>> > persisted which has the same formid, orgunitid and an overlapping
>> > period.
>> >
>> > There is no longer any ambiguity about periodtype of a datavalue.
>> >
>> > stored_by, timestamp, comment might go either way. Probably they
>> > need
>> > to stay on datavalue. I notice comment is rarely used but it's really
>> > useful to have a comment on datavalueset for import purposes.
>> >
>> > 'nuff designing out loud. Got to go.
>> >
>> > Regards
>> > Bob
>> >
>> >>
>> >>
>> >> Ola
>> >> ---------
>> >>
>> >>>
>> >>> On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad <olatitle@gmail.com> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> After Kim Anh's email about the use of the same data elements with
>> >>>> different period types I dug up this old discussion from March
>> >>>> 2009.
>> >>>>
>> >>>> What is the status on this work, or did we not conclude this?
>> >>>>
>> >>>> Ola
>> >>>> ----------
>> >>>>
>> >>>> 2009/3/20 Bob Jolliffe <bobjolliffe@gmail.com>
>> >>>>>
>> >>>>> 2009/3/20 Lars Helge Øverland <larshelge@gmail.com>:
>> >>>>> >
>> >>>>> >>
>> >>>>> >> Yes this is true. But what do you think of the idea to
>> >>>>> >> enforce
>> >>>>> >> DataSet membership having a default DataSet for all the
>> >>>>> >> delinquents?
>> >>>>> >> I'm not sure if it can be enforced by the schema, but at least
>> >>>>> >> by
>> >>>>> >> the
>> >>>>> >> application.
>> >>>>> >
>> >>>>> > OK but what does this give us in terms of
>> >>>>> > PeriodType-determining
>> >>>>> > if
>> >>>>> > this
>> >>>>> > default DataSet has a null PeriodType?
>> >>>>>
>> >>>>> Nothing really. The only effect would be you have an index on the
>> >>>>> unassigned DataElements, for what it's worth. Mainly it would be
>> >>>>> useful for determining easily the available DataElements which can
>> >>>>> be added to a DataSet. Maybe it's a nonsense idea - I was just
>> >>>>> trying to think of ways to make editing DataSets reasonably
>> >>>>> straightforward.
>> >>>>>
>> >>>>> >
>> >>>>> >>
>> >>>>> >> I don't know if it's about right or wrong. There are pros and
>> >>>>> >> cons of both approaches. What you gain on the swings you lose
>> >>>>> >> on the roundabouts :)
>> >>>>> >>
>> >>>>> >> In the explicit case the application will have to enforce that
>> >>>>> >> DataSet members all have the same periodType.
>> >>>>> >>
>> >>>>> >> In the implicit case the application will have to enforce that
>> >>>>> >> DataElements can only be members of multiple groups if these
>> >>>>> >> share the same PeriodType.
>> >>>>> >>
>> >>>>> >> The net result as far as the Data API is concerned can and must
>> >>>>> >> be the same. Perhaps we should define exactly what extra methods
>> >>>>> >> we want in the API first. We have already identified a few. Then
>> >>>>> >> decide whether a database change is necessitated by these.
>> >>>>> >
>> >>>>> > Yes. We need at least a service method:
>> >>>>> >
>> >>>>> > Collection<DataElement> getDataElementsByPeriodType( PeriodType )
>> >>>>> >
>> >>>>> > and a getter on the DataElement object:
>> >>>>> >
>> >>>>> > PeriodType getPeriodType()
>> >>>>> >
>> >>>>> >
>> >>>>> > I guess we could make a branch, start coding and see how it works
>> >>>>> > out.
>> >>>>>
>> >>>>> Sure. So long as we are adding methods we won't be breaking
>> >>>>> anything in terms of backward compatibility. Just enforcing
>> >>>>> application level constraints. Then we can really encourage
>> >>>>> (enforce?) upper layers to strictly interact with the data via the
>> >>>>> API. Even if this might occasionally mean making some lightweight
>> >>>>> API methods which bypass the ORM.
>> >>>>>
>> >>>>> >
>> >>>>> > Another issue would arise in the (exotic) situation where someone
>> >>>>> > assigns a DataElement to a DataSet, enters data for it, then
>> >>>>> > removes it from the DataElement. The data is there, but how do we
>> >>>>> > deal with it in regard to the mentioned required functionality
>> >>>>> > (trend analysis, datamart)?
>> >>>>> >
>> >>>>>
>> >>>>> Yes this gets a bit weird (I presume you mean removes it from the
>> >>>>> DataSet). I'm guessing you haven't lost the data because the
>> >>>>> dataValues each have a PeriodID which in turn is linked to a
>> >>>>> PeriodType. I suppose that (in such an exotic headspace)
>> >>>>> DataElements can in fact change their PeriodTypes over time, though
>> >>>>> I imagine it's not a great idea.
>> >>>>>
>> >>>>> The effect would be the same in the explicit relationship case, if
>> >>>>> someone assigns a DataElement to a DataSet, enters data for it, then
>> >>>>> changes the PeriodType of the DataElement ...
>> >>>>>
>> >>>>> Cheers
>> >>>>> Bob
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>
>
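
The "implicit" approach from the quoted 2009 discussion, where a DataElement's PeriodType is derived from its DataSet membership, can be sketched roughly as below. This is only an illustration under stated assumptions: the classes are simplified stand-ins rather than the real DHIS 2 model, the constraint check in addMember and the static DataElementService helper are hypothetical, and the default "unassigned" DataSet with a null PeriodType is deliberately left out.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified classes; not the actual DHIS 2 model or service API.
enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

final class DataElement {
    final String name;
    final Set<DataSet> dataSets = new HashSet<>();

    DataElement(String name) {
        this.name = name;
    }

    // Derived from membership; null means "not assigned to any DataSet".
    PeriodType getPeriodType() {
        return dataSets.isEmpty() ? null : dataSets.iterator().next().periodType;
    }
}

final class DataSet {
    final String name;
    final PeriodType periodType;
    final Set<DataElement> members = new HashSet<>();

    DataSet(String name, PeriodType periodType) {
        this.name = name;
        this.periodType = Objects.requireNonNull(periodType);
    }

    // Application-level constraint for the implicit case: an element may
    // join several DataSets only if they all share one PeriodType.
    void addMember(DataElement element) {
        PeriodType existing = element.getPeriodType();
        if (existing != null && existing != periodType) {
            throw new IllegalArgumentException(
                element.name + " is already " + existing + ", cannot join " + periodType);
        }
        members.add(element);
        element.dataSets.add(this);
    }
}

final class DataElementService {
    // Collects the members of every DataSet with the requested PeriodType.
    static Collection<DataElement> getDataElementsByPeriodType(
            Collection<DataSet> allDataSets, PeriodType type) {
        Set<DataElement> result = new HashSet<>();
        for (DataSet dataSet : allDataSets) {
            if (dataSet.periodType == type) {
                result.addAll(dataSet.members);
            }
        }
        return result;
    }
}
```

Because the constraint is enforced when a member is added, getPeriodType() can safely read the PeriodType of any one DataSet the element belongs to. What happens to existing data values when an element is later removed from a DataSet is not addressed by this sketch.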

What is the implication? At design time, when you are coding the
expression, you probably should not include the categoryoptioncombo at
all. The indicator is just expressed in terms of dataelements (I
guess traditional DHIS14 style). But when you are generating for
example, the reporttable, the first pass analyzes the data you have
selected and suggests - would you like the indicator data
disaggregated by sex? Or age+sex? Or no disaggregation. So what you
can report on is determined by the data you've got. I think that's a
sound principle.

I can see a few challenges with this principle. In typical implementations
of DHIS you would design forms and canned/fixed reports at the same time
before rolling out the installations. If it is impossible to design reports
before you have any data values I can see a problem with this approach. But
I guess you would know, from the forms information, the potential
datavaluesets and therefore could allow some disaggregated reports to be
prepared even before you have any data values?
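
Both views (offering disaggregations based on the data you have, or based on what the forms could produce) can be sketched with the same small Java function. The input is, for each dataelement in the indicator expression, the set of category names its values are broken down by; that set could come from observed datavalues or from the form definitions. The class name DisaggregationAdvisor and the rule used here (offer only categories shared by every dataelement in the expression) are assumptions made for illustration, not something settled in this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the "shared categories" rule is an assumption.
final class DisaggregationAdvisor {

    // Returns every disaggregation that could be offered for an indicator:
    // the empty set (no disaggregation) plus each combination of the
    // categories common to all dataelements in the expression.
    static List<Set<String>> availableDisaggregations(
            Collection<Set<String>> categoriesPerElement) {
        TreeSet<String> common = null;
        for (Set<String> categories : categoriesPerElement) {
            if (common == null) {
                common = new TreeSet<>(categories);
            } else {
                common.retainAll(categories);
            }
        }
        List<String> shared =
            common == null ? new ArrayList<>() : new ArrayList<>(common);

        // Enumerate all subsets of the shared categories with a bit mask.
        List<Set<String>> options = new ArrayList<>();
        for (int mask = 0; mask < (1 << shared.size()); mask++) {
            Set<String> option = new TreeSet<>();
            for (int i = 0; i < shared.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    option.add(shared.get(i));
                }
            }
            options.add(option);
        }
        return options;
    }
}
```

For an indicator whose dataelements are both collected by age and sex, this would offer no disaggregation, age, sex, and age+sex. If one of them is collected by sex only, the age options drop out. A canned report could store one chosen option so that no human interaction is needed at generation time.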

I'm not sure I see the "principled" difference here. Changing the workflow
to require categoryoptioncombo specification at report generation time
sounds interesting, but, as Ola says, you would not want to require human
interaction for every report generation.

Agreed.

So then we end up with more of a case
of workflow changes and storing the same information in a different way?
This could give more automated support for doing reporting that today is
hard to manage manually, but would also give a more complex data model and
workflow to keep track of.

Yes you might be right. I am trying to resolve one complex model with
another and factor the differences. In the process exposing some
alternatives to the way we do things. And I think some gaps - which
largely have to do with keeping track of data and metadata changes.
Will try and sift some of this once I'm done with mapping our
datamodel to the sdmx one.

Regards
Bob


On 23 May 2010 11:23, Jo Størset <storset@gmail.com> wrote:

Den 23. mai 2010 kl. 09.36 skrev Ola Hodne Titlestad:

Still interesting, though :)
Jo