validation of data values and data element value types

Hi there,

just wanted to share a tip. We often get inquiries about captured data not showing up in pivots/analytics. A typical reason for this situation is data values not being compatible with the value of type of their data elements.

This typically happens because data has been injected directly in the database through SQL, or stems from the old days where DHIS 2 did not strictly validate the data type during import (it does now).

As an example: A data element is set to value type “Number”. Then textual data values (e.g. “High”, “Partial”, “N/A”) are imported directly into the database.

Currently the analytics engine will filter out these values. As a result, they will not be part of pivots and charts. It is a good idea to clean this up, either by converting the data values, changing the data element value type or simply deleting the data values.

This SQL integrity check will count the number of non-numeric data values which are linked to data elements with numerical value types:

select count(*)

from datavalue dv

where value !~* ‘^(-?[0-9]+)(.[0-9]+)?$’

and value is not null

and dv.dataelementid in (

select dataelementid

from dataelement de

where de.valuetype in (‘INTEGER’, ‘INTEGER_POSITIVE’, ‘INTEGER_NEGATIVE’, ‘INTEGER_ZERO_OR_POSITIVE’, ‘NUMBER’, ‘UNIT_INTERVAL’, ‘PERCENTAGE’) );

This SQL integrity check will reveal the actual invalid values (limited to 1000 might have to be adjusted/run several times):

select de.name as de_name, de.uid as de_uid, de.dataelementid as de_id,

pe.startdate, pe.enddate, pe.periodid as pe_id,

ou.name as ou_name, ou.uid as ou_uid, ou.organisationunitid as ou_id,

dv.categoryoptioncomboid, dv.attributeoptioncomboid, dv.lastupdated, dv.created, de.valuetype as de_valuetype, dv.value

from datavalue dv

inner join dataelement de on dv.dataelementid=de.dataelementid

inner join period pe on dv.periodid=pe.periodid

inner join organisationunit ou on dv.sourceid=ou.organisationunitid

where value !~* ‘^(-?[0-9]+)(.[0-9]+)?$’

and value is not null

and dv.dataelementid in (

select dataelementid

from dataelement de

where de.valuetype in (‘INTEGER’, ‘INTEGER_POSITIVE’, ‘INTEGER_NEGATIVE’, ‘INTEGER_ZERO_OR_POSITIVE’, ‘NUMBER’, ‘UNIT_INTERVAL’, ‘PERCENTAGE’) )

limit 1000;

The DHIS 2 team is working on performance improvements which require that this is in order, so it’s a good idea to check this for your systems and potentially clean it up.

best regards,

Lars

···

Lars Helge Øverland

Lead developer, DHIS 2

University of Oslo

Skype: larshelgeoverland

lars@dhis2.org

http://www.dhis2.org