How do we treat ready-made indicators?

In some cases, we would like to import indicator values that have already been calculated/estimated through various external procedures. The only option here is really to import these as DataElements and DataValues. However, this seems to break our logical/philosophical model and the clear distinctions between “raw counts” and “calculated formulas”.

Still, I do think we need to handle them, and to be able to display them in reports and maps (currently not available for data elements). Such indicator values are often based on statistical analyses of survey data etc., and can even be for the same kinds of indicators that we calculate inside DHIS.

How do people feel they should be treated? Should we create a blueprint for this?

Knut

This problem statement is just a consequence of my repeated efforts to
highlight the discrepancy between how we handle data
elements/indicators and how they are handled by statisticians (e.g. the
World Bank, the UN and all their "statistical yearbooks" and other
outlets) and by those with a background in medical records (because a
single aggregate from an EPR will be a calculation over even smaller
"atomic" units, e.g. sex and age).

As an example, in Malawi we were formulating plans and strategies with
a University of Washington group called ITECH (Bill Lober etc.), which
works for CDC/PEPFAR worldwide, i.e. an important partner, and which
belongs to the EPR (or, as they would say, EMR) crowd mentioned above,
regarding all health statistics of interest as indicators. As a
consequence, the plan for DHIS in Malawi is that of an "indicator
repository". For the reasons above, I have no problem with that.

A data repository is by definition (if we use the definition of a data
warehouse) a repository of aggregates from production systems, deemed
important or essential by user groups. In other words, a data
repository is by definition based on the "clear distinctions between
raw counts and calculated formulas", containing only the latter, which
are based on the "raw counts" from the "production systems". The
different "starting points", i.e. what counts as a "production system"
and at what level of aggregation a "production system" stops, are what
cause the confusion between DHIS and others. The data elements in DHIS
are the atomic data units in DHIS, but not in the world of data! All
data elements are based on more basic records and/or counts (that is
the definition of aggregation). As seen from e.g. OpenMRS, the data
elements are therefore already indicators (because they start at a
more granular level). Hence the term "indicator repository" for DHIS
in Malawi.
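To make the point about "atomic" units concrete, here is a minimal Python sketch, assuming a hypothetical patient-level record type (it is not the OpenMRS or DHIS data model). Counting records per facility produces what DHIS would store as a single data element value, and in doing so it drops the sex and age detail the EPR still holds:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical patient-level record, as an EPR might hold it."""
    facility: str
    sex: str
    age: int
    diagnosis: str


def aggregate_cases(records, diagnosis):
    """Count records with the given diagnosis per facility.

    The result is what DHIS would hold as a data element value: atomic
    from DHIS's point of view, but already an aggregate of patient records.
    Sex and age are dropped in the process.
    """
    return Counter(r.facility for r in records if r.diagnosis == diagnosis)


records = [
    PatientRecord("Facility A", "F", 24, "measles"),
    PatientRecord("Facility A", "M", 3, "measles"),
    PatientRecord("Facility B", "F", 41, "cholera"),
    PatientRecord("Facility B", "M", 7, "measles"),
]

measles_counts = aggregate_cases(records, "measles")
print(measles_counts)
```

With these four records, the measles count is 2 for Facility A and 1 for Facility B; seen from the EPR side, each of those numbers is already a calculated result.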

The different types of systems (DHIS, OpenMRS, World Bank statistics,
the UNDP annual development report, etc.) use different points of
departure (the atomic units) for their calculated formulas and
aggregations, but at the other end (output, statistics, reports, what
to display in a GIS) they are all, in principle, at the same level,
and these data are more or less universally called indicators.

This is no stranger than the fact that everything we put in the data
mart is to be understood as indicators: everything we want to use as
"end results", for display and reporting rather than for further
aggregation in that format, we put in the data mart (or in conceptually
similar tables).

As argued before, the quick fix for Knut is to define everything he
wants to use in the GIS as indicators, using the formula "divided by
one".
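A minimal sketch of the "divided by one" idea, assuming a simplified indicator model of the form factor * numerator / denominator. The `Indicator` class and `evaluate` function are hypothetical, not DHIS 2 classes. With the denominator set to the constant 1, the indicator simply returns the data element value, so it can be used wherever indicators are accepted:

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Indicator:
    """Simplified, hypothetical indicator: factor * numerator / denominator.

    A denominator of None stands for the constant 1 ("divided by one").
    """
    name: str
    numerator: str              # data element used as numerator
    denominator: Optional[str]  # data element used as denominator, or None
    factor: float = 1.0


def evaluate(indicator: Indicator, values: Mapping[str, float]) -> Optional[float]:
    """Evaluate an indicator against data element values; None if undefined."""
    num = values[indicator.numerator]
    den = 1.0 if indicator.denominator is None else values[indicator.denominator]
    if den == 0:
        return None  # avoid division by zero
    return indicator.factor * num / den


values = {"Measles cases": 12, "Population under 5": 4000}

measles_dummy = Indicator("Measles cases (indicator)", "Measles cases", None)
measles_rate = Indicator(
    "Measles cases per 1000 under 5", "Measles cases", "Population under 5", factor=1000
)

print(evaluate(measles_dummy, values))  # 12.0
print(evaluate(measles_rate, values))   # 3.0
```

The dummy indicator and a "real" rate go through the same evaluation path, which is the appeal of the quick fix: nothing downstream (reports, GIS) needs to know that the denominator is just 1.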

Remember, many of the data elements (raw primary counts) in DHIS are
to be defined as indicators in their own right, e.g. cases of measles,
cholera, yellow fever, maternal deaths, swine flu etc., which may be
more elegantly displayed in the GIS as dots than as crude numbers. All
things are relative: the level of granularity (as in any part of the
formula) varies.

jorn

On Sat, Nov 14, 2009 at 11:59 PM, Knut Staring <knutst@gmail.com> wrote:

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Thanks, Jørn.

From a practical standpoint, we need a user interface where it is quick and easy to create “dummy indicators” (indicators with 1 as denominator) for many data elements at the same time. This should hopefully be easy.
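The bulk-creation step could look something like the following Python sketch. The `DummyIndicator` record, its fields and the name suffix are hypothetical choices for illustration, not the DHIS 2 indicator model or user interface:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DummyIndicator:
    """Hypothetical indicator definition: a data element divided by one."""
    name: str
    numerator: str  # name of the data element
    denominator: str = "1"


def make_dummy_indicators(
    data_elements: Iterable[str], suffix: str = " (indicator)"
) -> List[DummyIndicator]:
    """Create one dummy indicator per distinct data element, keeping input order."""
    seen = set()
    result = []
    for element in data_elements:
        if element in seen:
            continue  # skip duplicates in the selection
        seen.add(element)
        result.append(DummyIndicator(name=element + suffix, numerator=element))
    return result


selected = ["Maternal deaths", "Measles cases", "Cholera cases", "Measles cases"]
for indicator in make_dummy_indicators(selected):
    print(indicator.name, "=", indicator.numerator, "/", indicator.denominator)
```

Selecting four entries here yields three indicators, since the repeated "Measles cases" is skipped; a user interface would only need to supply the list of selected data elements.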

With regard to “semi-permanent data”, a similar question comes up for GIS: either we enable display of “raw data elements” such as “Has services for ART”, “Number of beds” etc., or we make it very straightforward to generate corresponding indicators for the Service Availability dataset.

https://bugs.launchpad.net/dhis2/+bug/483057

Knut

On Sun, Nov 15, 2009 at 12:25 PM, Jørn Braa <jornbraa@gmail.com> wrote:

Cheers,
Knut Staring

To delve fully into the topic of how we should handle indicators (both the ones we calculate from data values and estimates from elsewhere), I include below a list, developed by CDC, of desirable features for indicator systems. As you will see, DHIS 2 currently fits many (most?) of the criteria, but there are some glaring gaps, including (A) the ability to associate indicators with funding sources or budget lines; (B) target values for indicators; and (C) categories (e.g. sex, age) as a fully implemented feature (rather than repeating the indicator for each category value), similar to how we handle geography.
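To illustrate point (C), here is a minimal Python sketch, assuming values are stored with the category option as just another key alongside org unit and period. The data and the `total_over` helper are hypothetical. One indicator can then be totalled over sex in the same way as over geography, instead of defining one indicator per category value:

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

# Hypothetical values for a single "Measles cases" indicator, keyed by
# (org unit, period, category option) instead of having separate
# "Measles cases, female" and "Measles cases, male" indicators.
values: Dict[Tuple[str, str, str], int] = {
    ("District A", "2009Q3", "Female"): 120,
    ("District A", "2009Q3", "Male"): 95,
    ("District B", "2009Q3", "Female"): 60,
    ("District B", "2009Q3", "Male"): 40,
}


def total_over(values, keep: Sequence[int]) -> Dict[tuple, int]:
    """Sum values, keeping only the key positions listed in `keep`.

    keep=(0, 1) sums out the category (totals per org unit and period);
    keep=(1, 2) sums out geography (totals per period and category option).
    """
    totals: Dict[tuple, int] = defaultdict(int)
    for key, value in values.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)


print(total_over(values, keep=(0, 1)))
print(total_over(values, keep=(1, 2)))
```

Summing out the category gives 215 for District A and 100 for District B; summing out geography gives 180 female and 135 male, all from the same stored values.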

I have created blueprints for some of the missing features:

  1. Ability to accept manually entered data

  2. Ability to define custom forms for manual data entry

  3. Ability to accept data exported from an electronic data system or perform an extraction therefrom

  4. Support for the following types of indicators: count, ratio, %, population rate, $, national currency, yes/no/unknown

  5. Hierarchical indicator definition via inheritance or import at (minimally) international, national and local level

  6. Indicator frequency: day, week, month, quarter, semester, annual (and aggregates thereof)

  7. Ability to aggregate indicators by donor, government agency, partner, sub-partner, funding source, activity type, program area

  8. Ability to disaggregate indicators by geography (nested), age group (nested), sex, direct/indirect

  9. Ability to define additional aggregation/disaggregation categories

  10. Multi-lingual

  11. Role-based security with dimensions of job function, geography and organizational affiliation

  12. Versioning of data (knowledge of latest date an element of an aggregate was updated), release status (preliminary, validated, released, corrected)

  13. Tools for defining/generating reports, graphs (including longitudinal), maps, pivot tables and combinations thereof, and permitting their being saved as files

  14. Suppression of small counts in output, “smudging” of geography to prevent identification of individuals

  15. Data interchange (especially via IXF3)

  16. Target, revised target and actual data (provides budget capability)

  17. Audit trail

  18. Flexibility to respond to changing reporting requirements of donors, countries and others

  19. Extensibility to other diseases and conditions

  20. Extensibility to changing scenarios

  21. Hierarchical control or import of code sets and data tables

  22. Hierarchical control or import of forms and reports

  23. Ability to maintain facility-level data for multiple facilities on a single machine

  24. Simple maintenance and management at the facility level

  25. Highly responsive data entry, moderately responsive report generation

  26. System monitoring reports (timeliness, completeness)

  27. Ability to store and forward text or attachments as identified items (narrative reporting)

  28. Ability to add adjustments to indicator values and their justifications at each hierarchical level and to specify their allocation among disaggregation categories

  29. Database neutrality

  30. Reliable, ongoing customer support, including training, software and documentation updates, etc.
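As an illustration of item 14 (suppression of small counts), a minimal Python sketch. The threshold of 5 and the choice to leave zeros visible are assumptions, not part of the CDC list, and the sketch does not handle complementary suppression (totals that would let a reader recompute a hidden cell):

```python
from typing import Dict, Optional


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = 5
) -> Dict[str, Optional[int]]:
    """Hide small non-zero counts before output.

    Counts with 0 < count < threshold become None; zeros and counts at or
    above the threshold are returned unchanged.
    """
    return {
        key: None if 0 < count < threshold else count
        for key, count in counts.items()
    }


report = {"Facility A": 12, "Facility B": 3, "Facility C": 0, "Facility D": 5}
print(suppress_small_counts(report))
# {'Facility A': 12, 'Facility B': None, 'Facility C': 0, 'Facility D': 5}
```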

On Sun, Nov 15, 2009 at 1:46 PM, Knut Staring <knutst@gmail.com> wrote:

Cheers,
Knut Staring