Improving min/max functionality

Hi,
I’ve added a new blueprint here:
https://blueprints.launchpad.net/dhis2/+spec/improve-minmax-value-functionality

-which is about improving the min/max validation functionality. The current solution is very basic and not sufficient in many ways. Here are my thoughts on how to improve this. We can use this list for discussion and then update the blueprint when we settle on something concrete.

This is what I wrote in the blueprint:

A few improvements are needed to the min/max value functionality:

  1. Generation of min/max values should be available from the data administration module

Currently you need to generate min/max ranges for each orgunit/dataset combination one by one in the data entry module. Sometimes you want to generate ranges for all orgunits and datasets at once, and data entry is not the place for that. In Data Administration we can add a new menu heading called “Min/Max validation”, and there we can allow min/max generation for any combination of orgunit/dataset, with an easy way to select all combinations. It may also be a good idea to include “from” and “to” fields to indicate which periods to use as the basis for the generation, e.g. from 2008-01-01 to 2008-12-31 would mean that all 12 months of 2008 are used if the dataset has a monthly period type, or the 4 quarters of 2008 if the dataset is quarterly, etc.
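As a rough illustration of the “from”/“to” idea, here is a minimal Python sketch that turns a date range into the calendar periods lying fully inside it. The function name `periods_between` and the `MONTHS_PER_PERIOD` table are made up for this example and are not DHIS 2 code; weekly periods are left out because they do not align with months.

```python
from datetime import date, timedelta

# Hypothetical mapping from period type to its length in months.
MONTHS_PER_PERIOD = {"monthly": 1, "quarterly": 3, "yearly": 12}


def periods_between(start: date, end: date, period_type: str) -> list[tuple[date, date]]:
    """Return (first_day, last_day) for each period lying fully inside [start, end]."""
    step = MONTHS_PER_PERIOD[period_type]
    # Count months from year 0 so that periods align to calendar boundaries.
    index = start.year * 12 + (start.month - 1)
    index -= index % step  # snap back to the start of the enclosing period
    periods = []
    while True:
        first = date(index // 12, index % 12 + 1, 1)
        nxt = index + step
        # Last day of this period is the day before the next period starts.
        last = date(nxt // 12, nxt % 12 + 1, 1) - timedelta(days=1)
        if last > end:
            break
        if first >= start:
            periods.append((first, last))
        index = nxt
    return periods


monthly = periods_between(date(2008, 1, 1), date(2008, 12, 31), "monthly")
quarterly = periods_between(date(2008, 1, 1), date(2008, 12, 31), "quarterly")
```

With the 2008 example from the text, `monthly` holds 12 periods and `quarterly` holds 4.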

  2. User-defined parameters that control how the generation is done. Currently the range values are set to 10% lower than the lowest value and 10% higher than the highest value, which is a very crude method. It does not account for outliers that might already be in the system.

Any suggestions for a better statistical method for this? And on how to make it user-defined?
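To make the outlier problem concrete, here is a small Python sketch of the rule as described above (10% below the lowest value, 10% above the highest). It is only a reading of that description, not the actual DHIS 2 code; the function name `ten_percent_range` and the sample values are invented for illustration.

```python
def ten_percent_range(values, margin=0.10):
    """Range from `margin` below the lowest value to `margin` above the highest."""
    lowest, highest = min(values), max(values)
    return lowest * (1 - margin), highest * (1 + margin)


# Five plausible monthly values plus one typo (400 instead of 40).
history = [40, 42, 45, 38, 41, 400]
low, high = ten_percent_range(history)
```

Because the maximum is taken straight from the data, the single typo pushes the upper bound to roughly 440, so a repeat of the same mistake would pass validation.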

  3. I assume we would like to keep the generate min/max option in data entry, which can be useful for users who deal with only a limited number of orgunits and know that a new round of generation would correct the min/max ranges. But this generation should then be controlled by a setting, especially how many periods to use. So we could add another property in Data Administration->min/max validation that defines how many periods to use as the basis for the generation, for monthly, weekly, yearly etc. period types. Do we need one property per period type? Currently this property is hard-coded to 6 in the source code.
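One possible shape for such a setting, sketched in Python: a per-period-type override with a fallback to a single default. Only the default of 6 comes from the text above; the names `PERIODS_BY_TYPE` and `periods_to_use` and the override values are purely hypothetical.

```python
# The value the text says is currently hard-coded.
DEFAULT_PERIODS = 6

# Hypothetical per-period-type overrides; the numbers are examples only.
PERIODS_BY_TYPE = {"weekly": 12, "yearly": 3}


def periods_to_use(period_type, overrides=PERIODS_BY_TYPE):
    """Number of past periods to base min/max generation on."""
    return overrides.get(period_type, DEFAULT_PERIODS)
```

This way a single property is enough for most installations, and a per-type value is only needed where the default is a poor fit.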

  4. Default min/max range per data element

Normally a min/max range is linked to an orgunit/dataelement combination, but sometimes, e.g. when there is very little data or very poor data quality in the system, it is useful to have a default range that can be used for all orgunits as a first level of validation to avoid typos and crazy outliers. These default values need to be set somewhere, and maybe data set management is the best-suited place for this; at least that is where it is located in DHIS 1.4. Here we need some functionality to set these ranges quickly, even as quick as setting the same range for all data elements in a dataset, plus the possibility to adjust individual data elements in the data (set) element list.

In Data entry the procedure will be to first check whether a min/max range exists for the orgunit/data element (the best option) and if not then load the default range for the data element (the next best option), and if nothing is set then leave it blank (the worst option).
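The lookup order described above can be sketched in Python as follows. The function `resolve_range`, the two dictionaries, and the identifiers in the usage lines are hypothetical stand-ins for whatever storage is actually used.

```python
from typing import Optional

Range = tuple[int, int]


def resolve_range(org_unit: str, data_element: str,
                  specific: dict, defaults: dict) -> Optional[Range]:
    """Pick the range to show in data entry, most specific first."""
    # 1. Best option: a range set for this orgunit/data element combination.
    if (org_unit, data_element) in specific:
        return specific[(org_unit, data_element)]
    # 2. Next best: the default range for the data element.
    # 3. Worst option: nothing is set, so return None and leave it blank.
    return defaults.get(data_element)


specific = {("facility_a", "malaria_cases"): (5, 60)}
defaults = {"malaria_cases": (0, 500)}

resolve_range("facility_a", "malaria_cases", specific, defaults)  # specific range
resolve_range("facility_b", "malaria_cases", specific, defaults)  # default range
resolve_range("facility_b", "anc_visits", specific, defaults)     # None, left blank
```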

best regards,
Ola Hodne Titlestad
HISP
University of Oslo

Ola Hodne Titlestad wrote:

Hi,

I’ve added a new blueprint here:

https://blueprints.launchpad.net/dhis2/+spec/improve-minmax-value-functionality

-which is about improving the min/max validation functionality. The
current solution is very basic and not sufficient in many ways. Here
are my thoughts on how to improve this. We can use this list for
discussion and then update the blueprint when we settle on something
concrete.

This is what I wrote in the blueprint:

A few improvements are needed to the min/max value functionality:

  1. Generation of min/max values should be available from the data
    administration module

Currently you need to generate min/max ranges for each orgunit/dataset
combination one by one in the data entry module. Sometimes you want to
generate ranges for all orgunits and datasets at once and then data
entry is not the place for this. In Data Administration we can add a
new menu heading called “Min/Max validation” and in there we can allow
min/max generation for any combination of orgunit/dataset, and easily
allow all combinations to be selected. Maybe also a good idea to
include a “from” and “to” field to indicate which periods to use as the
basis for the generation, e.g. from 2008-01-01 to 2008-12-31 would
indicate that all 12 months of 2008 will be used if the dataset has
monthly period type, or the 4 quarters of 2008 will be used if
quarterly dataset etc.

Not sure which is the best way to do this, but one way could be to have
“Data quality” as an item under maintenance, where you can set ranges,
and also define and keep track of validation rules. Then, the “data
quality” page currently under services could be split, so that you take
the definition-side of it to maintenance, and the report-side of it as
a subitem in the reports menu.

  2. User defined parameters that control how the generation is
    done.
    Currently the range values are set to 10% lower than the lowest value
    and 10% higher than the highest value, which is a very crude method.
    This does not take care of outliers that might already be in the system.

Any suggestions for a better statistical method for this? And on how to
make it user defined?

Use some factor of the standard deviation. That will take care of spread.
+/- 10% will not work for malaria, for instance, as it fluctuates
naturally over the year due to the rainy season. I don’t have my copy
of the infamous “Statistical concepts and methods” by Bhattacharyya and
Johnson here, arguably the most boring book in the world, but this would do
for an explanation: http://en.wikipedia.org/wiki/Standard_deviation
Then, as I think it is in DHIS 1.4, you can set the factor to calculate
from, for instance 1.5, making the min and max the mean - 1.5 x st.dev
and the mean + 1.5 x st.dev, respectively.

On the default range per data element: I concur. In both Sierra Leone and
Botswana, setting ranges for individual facilities, for all data elements,
has just created a lot of extra work for the districts, which are not
really aware of how the process works. So this has so far been skipped in
Sierra Leone. As we want some kind of warning (colour coding and/or
pop-ups), this can create a great deal of frustration until the ranges are
correctly set. There are also some wild typos, where it looks like people
have fallen asleep on the keyboard, which we want to avoid. It would then
make sense to be able to set some global range default.

Johan
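The standard-deviation rule suggested above (mean - factor x st.dev, mean + factor x st.dev) can be sketched in Python like this. The function name `stddev_range` is invented for the example, and using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption; the thread does not say which DHIS 1.4 uses.

```python
from statistics import mean, stdev


def stddev_range(values, factor=1.5):
    """Min/max as mean -/+ factor x standard deviation.

    Needs at least two values; stdev() is the sample standard deviation.
    """
    centre = mean(values)
    spread = stdev(values)
    return centre - factor * spread, centre + factor * spread


# With values 10, 12, 14 the mean is 12 and the sample st.dev is 2,
# so a factor of 1.5 gives a range of 9 to 15.
low, high = stddev_range([10, 12, 14], factor=1.5)
```

The factor is the natural user-defined parameter here: a larger factor gives wider ranges and fewer warnings. The lower bound can come out negative for data with a large spread, so clamping it at zero for count data may be worth considering.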

  3. I assume we would like to keep the generate min/max option in
    data entry which can be useful for users that do not deal with all, but
    just a limited number of orgunits and know that a new round of
    generation would correct the min/max ranges. But this generation should
    then be configured in a setting, especially how many periods to use. So
    we could add another property in Data Administration->min/max
    validation that defines how many periods to use as basis for the
    generation, for monthly, weekly, yearly etc. period types. Do we need
    one property per period type? Currently this property is hard-coded to
    6 in the source code.

  4. Default min/max range per data element

Normally a min/max range is linked to an orgunit/dataelement
combination, but sometimes, e.g. when there is very little data or very
poor data quality in the system it is useful to have a default range
that can be used for all orgunits as a first level of validation to
avoid typos and crazy outliers. These default values need to be set
somewhere, and maybe data set management is the best suited place for
this, at least that is where it is located in DHIS 1.4. Here we need
some functionality to quickly set these ranges, even as quick as
setting the same range for all data elements in a dataset, and then
also the possibility to adjust individual data elements in the data
(set) element list.

In Data entry the procedure will be to first check whether a min/max
range exists for the orgunit/data element (the best option) and if not
then load the default range for the data element (the next best
option), and if nothing is set then leave it blank (the worst option).
