Serious problem with duplicate records in Individual records/tracker

Hi all,

We have an big issue with several of Rwanda’s event reporting systems that have been set up using Tracker/Individual records. For some reason, users are able to create duplicate records in the system through normal data entry. It seems to happen when they enter a value for an attribute and then a couple of seconds or milliseconds later they change that value. The system then stores both records with last updated several milliseconds apart.

This is a big issue because analytics fails to run when it finds these duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from

(

SELECT

program.name AS programname,

program.description,

programinstance.dateofincident,

programinstance.trackedentityinstanceid,

programinstance.status,

programstageinstance.programstageinstanceid,

programstageinstance.programinstanceid,

programstageinstance.programstageid,

trackedentitydatavalue.value,

dataelement.name AS dataelementname

FROM

public.programstageinstance,

public.programinstance,

public.program,

public.trackedentitydatavalue,

public.dataelement

WHERE

programstageinstance.programinstanceid = programinstance.programinstanceid AND

program.programid = programinstance.programid AND

trackedentitydatavalue.programstageinstanceid = programstageinstance.programstageinstanceid AND

dataelement.dataelementid = trackedentitydatavalue.dataelementid

) as foo

group by programstageinstanceid, dataelementname

having count(*) >1

order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs

777921;“TB Sample type_morning Sputum”;2

776886;“TB Date of collection_sample”;2

773132;“TB History of contact with TPB+ case”;2

Here is an example of the duplicates we see when we select the duplicated records from the trackedentitydatavalue table.

prograstageinstanceid, dataelementid,value,lastupdated,providedelsewhere,user

773132;35105;“true”;“2015-10-02 09:32:48.334”;FALSE;“abibarimana”

773132;35105;“true”;“2015-10-02 09:32:48.508”;FALSE;“abibarimana”

776886;31879;“2015-07-17”;“2015-10-07 16:25:26.341”;FALSE;“mugeni”

776886;31879;“2015-07-14”;“2015-10-07 16:25:26.546”;FALSE;“mugeni”

777921;35106;“true”;“2015-10-08 09:55:36.133”;FALSE;“emufasha”

777921;35106;“true”;“2015-10-08 09:55:36.278”;FALSE;“emufasha”

In order to clean up the trackedentitydatavalue table, I need to write these records to a temporary table, delete them from trackedentitydatavalue, then decide which of the duplicate pairs to delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table so that it will not accept duplicate values for programstageinstanceid and dataelementid? I also think there must be an error in the java code that should be updating the first saved record with the one entered by the user later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails we cannot run any event reports to get data out of the system.

···

Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

Hi Randy,
There is actually a constraint on the trackedentitydatavalue table and you cannot create two entries with the same programstageinstance and dataelementid. I notice that you are grouping on dataelement name, is it possible that you could have two dataelements with the same name in your dataelement table? I.e different dataelementids but the same name?

Regards

Ant

This message and any attachments are subject to a disclaimer published at http://www.hisp.org/policies.html#comms_disclaimer. Please read the disclaimer before opening any attachment or taking any other action in terms of this electronic transmission. If you cannot access the disclaimer, kindly send an email to disclaimer@hisp.org and a copy will be provided to you. By replying to this e-mail or opening any attachment you agree to be bound by the provisions of the disclaimer.

image

···

On 9 October 2015 at 10:22, Wilson, Randy rwilson@msh.org wrote:

Hi all,

We have an big issue with several of Rwanda’s event reporting systems that have been set up using Tracker/Individual records. For some reason, users are able to create duplicate records in the system through normal data entry. It seems to happen when they enter a value for an attribute and then a couple of seconds or milliseconds later they change that value. The system then stores both records with last updated several milliseconds apart.

This is a big issue because analytics fails to run when it finds these duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from

(

SELECT

program.name AS programname,

program.description,

programinstance.dateofincident,

programinstance.trackedentityinstanceid,

programinstance.status,

programstageinstance.programstageinstanceid,

programstageinstance.programinstanceid,

programstageinstance.programstageid,

trackedentitydatavalue.value,

dataelement.name AS dataelementname

FROM

public.programstageinstance,

public.programinstance,

public.program,

public.trackedentitydatavalue,

public.dataelement

WHERE

programstageinstance.programinstanceid = programinstance.programinstanceid AND

program.programid = programinstance.programid AND

trackedentitydatavalue.programstageinstanceid = programstageinstance.programstageinstanceid AND

dataelement.dataelementid = trackedentitydatavalue.dataelementid

) as foo

group by programstageinstanceid, dataelementname

having count(*) >1

order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs

777921;“TB Sample type_morning Sputum”;2

776886;“TB Date of collection_sample”;2

773132;“TB History of contact with TPB+ case”;2

Here is an example of the duplicates we see when we select the duplicated records from the trackedentitydatavalue table.

prograstageinstanceid, dataelementid,value,lastupdated,providedelsewhere,user

773132;35105;“true”;“2015-10-02 09:32:48.334”;FALSE;“abibarimana”

773132;35105;“true”;“2015-10-02 09:32:48.508”;FALSE;“abibarimana”

776886;31879;“2015-07-17”;“2015-10-07 16:25:26.341”;FALSE;“mugeni”

776886;31879;“2015-07-14”;“2015-10-07 16:25:26.546”;FALSE;“mugeni”

777921;35106;“true”;“2015-10-08 09:55:36.133”;FALSE;“emufasha”

777921;35106;“true”;“2015-10-08 09:55:36.278”;FALSE;“emufasha”

In order to clean up the trackedentitydatavalue table, I need to write these records to a temporary table, delete them from trackedentitydatavalue, then decide which of the duplicate pairs to delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table so that it will not accept duplicate values for programstageinstanceid and dataelementid? I also think there must be an error in the java code that should be updating the first saved record with the one entered by the user later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails we cannot run any event reports to get data out of the system.


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Ant Snyman

Cell: 0824910449

Landline: 028 2713242

Health Information Systems Program - SA

I will check but when i remove the duplicates analytics runs. Also usually we get many more duplicates 100 or more each time I check.

Randy

image

···

On 9 October 2015 at 10:22, Wilson, Randy rwilson@msh.org wrote:

Hi all,

We have an big issue with several of Rwanda’s event reporting systems that have been set up using Tracker/Individual records. For some reason, users are able to create duplicate records in the system through normal data entry. It seems to happen when they enter a value for an attribute and then a couple of seconds or milliseconds later they change that value. The system then stores both records with last updated several milliseconds apart.

This is a big issue because analytics fails to run when it finds these duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from

(

SELECT

program.name AS programname,

program.description,

programinstance.dateofincident,

programinstance.trackedentityinstanceid,

programinstance.status,

programstageinstance.programstageinstanceid,

programstageinstance.programinstanceid,

programstageinstance.programstageid,

trackedentitydatavalue.value,

dataelement.name AS dataelementname

FROM

public.programstageinstance,

public.programinstance,

public.program,

public.trackedentitydatavalue,

public.dataelement

WHERE

programstageinstance.programinstanceid = programinstance.programinstanceid AND

program.programid = programinstance.programid AND

trackedentitydatavalue.programstageinstanceid = programstageinstance.programstageinstanceid AND

dataelement.dataelementid = trackedentitydatavalue.dataelementid

) as foo

group by programstageinstanceid, dataelementname

having count(*) >1

order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs

777921;“TB Sample type_morning Sputum”;2

776886;“TB Date of collection_sample”;2

773132;“TB History of contact with TPB+ case”;2

Here is an example of the duplicates we see when we select the duplicated records from the trackedentitydatavalue table.

prograstageinstanceid, dataelementid,value,lastupdated,providedelsewhere,user

773132;35105;“true”;“2015-10-02 09:32:48.334”;FALSE;“abibarimana”

773132;35105;“true”;“2015-10-02 09:32:48.508”;FALSE;“abibarimana”

776886;31879;“2015-07-17”;“2015-10-07 16:25:26.341”;FALSE;“mugeni”

776886;31879;“2015-07-14”;“2015-10-07 16:25:26.546”;FALSE;“mugeni”

777921;35106;“true”;“2015-10-08 09:55:36.133”;FALSE;“emufasha”

777921;35106;“true”;“2015-10-08 09:55:36.278”;FALSE;“emufasha”

In order to clean up the trackedentitydatavalue table, I need to write these records to a temporary table, delete them from trackedentitydatavalue, then decide which of the duplicate pairs to delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table so that it will not accept duplicate values for programstageinstanceid and dataelementid? I also think there must be an error in the java code that should be updating the first saved record with the one entered by the user later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails we cannot run any event reports to get data out of the system.


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Ant Snyman

Cell: 0824910449

Landline: 028 2713242

Health Information Systems Program - SA

If you experience reasonably high latency (slow response times) it seems quite likely you might well get duplicates. Probably related to

https://bugs.launchpad.net/dhis2/+bug/1498505

image

···

On 9 October 2015 at 15:40, Wilson, Randy rwilson@msh.org wrote:

I will check but when i remove the duplicates analytics runs. Also usually we get many more duplicates 100 or more each time I check.

Randy

On Oct 9, 2015 3:48 PM, “Ant Snyman” ant@hisp.org wrote:

Hi Randy,
There is actually a constraint on the trackedentitydatavalue table and you cannot create two entries with the same programstageinstance and dataelementid. I notice that you are grouping on dataelement name, is it possible that you could have two dataelements with the same name in your dataelement table? I.e different dataelementids but the same name?

Regards

Ant

This message and any attachments are subject to a disclaimer published at http://www.hisp.org/policies.html#comms_disclaimer. Please read the disclaimer before opening any attachment or taking any other action in terms of this electronic transmission. If you cannot access the disclaimer, kindly send an email to disclaimer@hisp.org and a copy will be provided to you. By replying to this e-mail or opening any attachment you agree to be bound by the provisions of the disclaimer.

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On 9 October 2015 at 10:22, Wilson, Randy rwilson@msh.org wrote:

Hi all,

We have an big issue with several of Rwanda’s event reporting systems that have been set up using Tracker/Individual records. For some reason, users are able to create duplicate records in the system through normal data entry. It seems to happen when they enter a value for an attribute and then a couple of seconds or milliseconds later they change that value. The system then stores both records with last updated several milliseconds apart.

This is a big issue because analytics fails to run when it finds these duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from

(

SELECT

program.name AS programname,

program.description,

programinstance.dateofincident,

programinstance.trackedentityinstanceid,

programinstance.status,

programstageinstance.programstageinstanceid,

programstageinstance.programinstanceid,

programstageinstance.programstageid,

trackedentitydatavalue.value,

dataelement.name AS dataelementname

FROM

public.programstageinstance,

public.programinstance,

public.program,

public.trackedentitydatavalue,

public.dataelement

WHERE

programstageinstance.programinstanceid = programinstance.programinstanceid AND

program.programid = programinstance.programid AND

trackedentitydatavalue.programstageinstanceid = programstageinstance.programstageinstanceid AND

dataelement.dataelementid = trackedentitydatavalue.dataelementid

) as foo

group by programstageinstanceid, dataelementname

having count(*) >1

order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs

777921;“TB Sample type_morning Sputum”;2

776886;“TB Date of collection_sample”;2

773132;“TB History of contact with TPB+ case”;2

Here is an example of the duplicates we see when we select the duplicated records from the trackedentitydatavalue table.

prograstageinstanceid, dataelementid,value,lastupdated,providedelsewhere,user

773132;35105;“true”;“2015-10-02 09:32:48.334”;FALSE;“abibarimana”

773132;35105;“true”;“2015-10-02 09:32:48.508”;FALSE;“abibarimana”

776886;31879;“2015-07-17”;“2015-10-07 16:25:26.341”;FALSE;“mugeni”

776886;31879;“2015-07-14”;“2015-10-07 16:25:26.546”;FALSE;“mugeni”

777921;35106;“true”;“2015-10-08 09:55:36.133”;FALSE;“emufasha”

777921;35106;“true”;“2015-10-08 09:55:36.278”;FALSE;“emufasha”

In order to clean up the trackedentitydatavalue table, I need to write these records to a temporary table, delete them from trackedentitydatavalue, then decide which of the duplicate pairs to delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table so that it will not accept duplicate values for programstageinstanceid and dataelementid? I also think there must be an error in the java code that should be updating the first saved record with the one entered by the user later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails we cannot run any event reports to get data out of the system.


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Ant Snyman

Cell: 0824910449

Landline: 028 2713242

Health Information Systems Program - SA

Seems like what we also experienced a lot for tLiberia, I think.

Knut

image

···

On Fri, Oct 9, 2015 at 11:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If you experience reasonably high latency (slow response times) it seems quite likely you might well get duplicates. Probably related to

https://bugs.launchpad.net/dhis2/+bug/1498505


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On 9 October 2015 at 15:40, Wilson, Randy rwilson@msh.org wrote:

I will check but when i remove the duplicates analytics runs. Also usually we get many more duplicates 100 or more each time I check.

Randy

On Oct 9, 2015 3:48 PM, “Ant Snyman” ant@hisp.org wrote:

Hi Randy,
There is actually a constraint on the trackedentitydatavalue table and you cannot create two entries with the same programstageinstance and dataelementid. I notice that you are grouping on dataelement name, is it possible that you could have two dataelements with the same name in your dataelement table? I.e different dataelementids but the same name?

Regards

Ant

This message and any attachments are subject to a disclaimer published at http://www.hisp.org/policies.html#comms_disclaimer. Please read the disclaimer before opening any attachment or taking any other action in terms of this electronic transmission. If you cannot access the disclaimer, kindly send an email to disclaimer@hisp.org and a copy will be provided to you. By replying to this e-mail or opening any attachment you agree to be bound by the provisions of the disclaimer.

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On 9 October 2015 at 10:22, Wilson, Randy rwilson@msh.org wrote:

Hi all,

We have an big issue with several of Rwanda’s event reporting systems that have been set up using Tracker/Individual records. For some reason, users are able to create duplicate records in the system through normal data entry. It seems to happen when they enter a value for an attribute and then a couple of seconds or milliseconds later they change that value. The system then stores both records with last updated several milliseconds apart.

This is a big issue because analytics fails to run when it finds these duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from

(

SELECT

program.name AS programname,

program.description,

programinstance.dateofincident,

programinstance.trackedentityinstanceid,

programinstance.status,

programstageinstance.programstageinstanceid,

programstageinstance.programinstanceid,

programstageinstance.programstageid,

trackedentitydatavalue.value,

dataelement.name AS dataelementname

FROM

public.programstageinstance,

public.programinstance,

public.program,

public.trackedentitydatavalue,

public.dataelement

WHERE

programstageinstance.programinstanceid = programinstance.programinstanceid AND

program.programid = programinstance.programid AND

trackedentitydatavalue.programstageinstanceid = programstageinstance.programstageinstanceid AND

dataelement.dataelementid = trackedentitydatavalue.dataelementid

) as foo

group by programstageinstanceid, dataelementname

having count(*) >1

order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs

777921;“TB Sample type_morning Sputum”;2

776886;“TB Date of collection_sample”;2

773132;“TB History of contact with TPB+ case”;2

Here is an example of the duplicates we see when we select the duplicated records from the trackedentitydatavalue table.

prograstageinstanceid, dataelementid,value,lastupdated,providedelsewhere,user

773132;35105;“true”;“2015-10-02 09:32:48.334”;FALSE;“abibarimana”

773132;35105;“true”;“2015-10-02 09:32:48.508”;FALSE;“abibarimana”

776886;31879;“2015-07-17”;“2015-10-07 16:25:26.341”;FALSE;“mugeni”

776886;31879;“2015-07-14”;“2015-10-07 16:25:26.546”;FALSE;“mugeni”

777921;35106;“true”;“2015-10-08 09:55:36.133”;FALSE;“emufasha”

777921;35106;“true”;“2015-10-08 09:55:36.278”;FALSE;“emufasha”

In order to clean up the trackedentitydatavalue table, I need to write these records to a temporary table, delete them from trackedentitydatavalue, then decide which of the duplicate pairs to delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table so that it will not accept duplicate values for programstageinstanceid and dataelementid? I also think there must be an error in the java code that should be updating the first saved record with the one entered by the user later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails we cannot run any event reports to get data out of the system.


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Ant Snyman

Cell: 0824910449

Landline: 028 2713242

Health Information Systems Program - SA

Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

Hi all,

It seems to me from the pgadmin create table script below that the table trackedentitydatavalue table does not have a unique primary key constraint (programstageinstanceid+dataelementid). I theory adding that constraint would make it impossible to have duplicates.

here is the extra constraint needed:

CONSTRAINT pk_programstageinstanceid_dataelementid PRIMARY KEY (programstageinstanceid, dataelementid)

I hesitate to do it on our production instance because I wonder if the Java code should be checked to make sure the system will recover gracefully if a user tries to save the same record twice within a few seconds due to the latency issues discussed below.

Is that a change that can be tried by the developers and then moved into trunk?

current structure of table:

image

···

On Fri, Oct 9, 2015 at 6:35 PM, Knut Staring knutst@gmail.com wrote:

Seems like what we also experienced a lot for tLiberia, I think.

Knut

On Fri, Oct 9, 2015 at 11:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If you experience reasonably high latency (slow response times) it seems quite likely you might well get duplicates. Probably related to

https://bugs.launchpad.net/dhis2/+bug/1498505


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

On 9 October 2015 at 15:40, Wilson, Randy rwilson@msh.org wrote:

I will check but when i remove the duplicates analytics runs. Also usually we get many more duplicates 100 or more each time I check.

Randy

On Oct 9, 2015 3:48 PM, “Ant Snyman” ant@hisp.org wrote:

Hi Randy,
There is actually a constraint on the trackedentitydatavalue table and you cannot create two entries with the same programstageinstance and dataelementid. I notice that you are grouping on dataelement name, is it possible that you could have two dataelements with the same name in your dataelement table? I.e different dataelementids but the same name?

Regards

Ant

This message and any attachments are subject to a disclaimer published at http://www.hisp.org/policies.html#comms_disclaimer. Please read the disclaimer before opening any attachment or taking any other action in terms of this electronic transmission. If you cannot access the disclaimer, kindly send an email to disclaimer@hisp.org and a copy will be provided to you. By replying to this e-mail or opening any attachment you agree to be bound by the provisions of the disclaimer.

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On 9 October 2015 at 10:22, Wilson, Randy rwilson@msh.org wrote:

Hi all,

We have an big issue with several of Rwanda’s event reporting systems that have been set up using Tracker/Individual records. For some reason, users are able to create duplicate records in the system through normal data entry. It seems to happen when they enter a value for an attribute and then a couple of seconds or milliseconds later they change that value. The system then stores both records with last updated several milliseconds apart.

This is a big issue because analytics fails to run when it finds these duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from

(

SELECT

program.name AS programname,

program.description,

programinstance.dateofincident,

programinstance.trackedentityinstanceid,

programinstance.status,

programstageinstance.programstageinstanceid,

programstageinstance.programinstanceid,

programstageinstance.programstageid,

trackedentitydatavalue.value,

dataelement.name AS dataelementname

FROM

public.programstageinstance,

public.programinstance,

public.program,

public.trackedentitydatavalue,

public.dataelement

WHERE

programstageinstance.programinstanceid = programinstance.programinstanceid AND

program.programid = programinstance.programid AND

trackedentitydatavalue.programstageinstanceid = programstageinstance.programstageinstanceid AND

dataelement.dataelementid = trackedentitydatavalue.dataelementid

) as foo

group by programstageinstanceid, dataelementname

having count(*) >1

order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs

777921;“TB Sample type_morning Sputum”;2

776886;“TB Date of collection_sample”;2

773132;“TB History of contact with TPB+ case”;2

Here is an example of the duplicates we see when we select the duplicated records from the trackedentitydatavalue table.

prograstageinstanceid, dataelementid,value,lastupdated,providedelsewhere,user

773132;35105;“true”;“2015-10-02 09:32:48.334”;FALSE;“abibarimana”

773132;35105;“true”;“2015-10-02 09:32:48.508”;FALSE;“abibarimana”

776886;31879;“2015-07-17”;“2015-10-07 16:25:26.341”;FALSE;“mugeni”

776886;31879;“2015-07-14”;“2015-10-07 16:25:26.546”;FALSE;“mugeni”

777921;35106;“true”;“2015-10-08 09:55:36.133”;FALSE;“emufasha”

777921;35106;“true”;“2015-10-08 09:55:36.278”;FALSE;“emufasha”

In order to clean up the trackedentitydatavalue table, I need to write these records to a temporary table, delete them from trackedentitydatavalue, then decide which of the duplicate pairs to delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table so that it will not accept duplicate values for programstageinstanceid and dataelementid? I also think there must be an error in the java code that should be updating the first saved record with the one entered by the user later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails we cannot run any event reports to get data out of the system.


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Ant Snyman

Cell: 0824910449

Landline: 028 2713242

Health Information Systems Program - SA

Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

Seems like a good idea to me - this is certainly a serious problem. Hope it can be part of 2.21.

Knut

image

···

On Thu, Oct 15, 2015 at 5:33 AM, Wilson, Randy rwilson@msh.org wrote:

Hi all,

It seems to me from the pgadmin create table script below that the table trackedentitydatavalue table does not have a unique primary key constraint (programstageinstanceid+dataelementid). I theory adding that constraint would make it impossible to have duplicates.

here is the extra constraint needed:

CONSTRAINT pk_programstageinstanceid_dataelementid PRIMARY KEY (programstageinstanceid, dataelementid)

I hesitate to do it on our production instance because I wonder if the Java code should be checked to make sure the system will recover gracefully if a user tries to save the same record twice within a few seconds due to the latency issues discussed below.

Is that a change that can be tried by the developers and then moved into trunk?

current structure of table:


CREATE TABLE trackedentitydatavalue

(

programstageinstanceid integer NOT NULL,

dataelementid integer NOT NULL,

value character varying(255),

“timestamp” timestamp without time zone,

providedelsewhere boolean,

storedby character varying(31),

CONSTRAINT fk_entityinstancedatavalue_dataelementid FOREIGN KEY (dataelementid)

REFERENCES dataelement (dataelementid) MATCH SIMPLE

ON UPDATE NO ACTION ON DELETE NO ACTION,

CONSTRAINT fk_entityinstancedatavalue_programstageinstanceid FOREIGN KEY (programstageinstanceid)

REFERENCES programstageinstance (programstageinstanceid) MATCH SIMPLE

ON UPDATE NO ACTION ON DELETE NO ACTION

)

WITH (

OIDS=FALSE

);

ALTER TABLE trackedentitydatavalue

OWNER TO individualrecords;

Thanks,

Randy

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.

On Fri, Oct 9, 2015 at 6:35 PM, Knut Staring knutst@gmail.com wrote:

Seems like what we also experienced a lot for tLiberia, I think.

Knut


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

On Fri, Oct 9, 2015 at 11:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If you experience reasonably high latency (slow response times) it seems quite likely you might well get duplicates. Probably related to

https://bugs.launchpad.net/dhis2/+bug/1498505


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

On 9 October 2015 at 15:40, Wilson, Randy rwilson@msh.org wrote:

I will check but when i remove the duplicates analytics runs. Also usually we get many more duplicates 100 or more each time I check.

Randy

On Oct 9, 2015 3:48 PM, “Ant Snyman” ant@hisp.org wrote:

Hi Randy,
There is actually a constraint on the trackedentitydatavalue table and you cannot create two entries with the same programstageinstance and dataelementid. I notice that you are grouping on dataelement name, is it possible that you could have two dataelements with the same name in your dataelement table? I.e different dataelementids but the same name?

Regards

Ant

This message and any attachments are subject to a disclaimer published at http://www.hisp.org/policies.html#comms_disclaimer. Please read the disclaimer before opening any attachment or taking any other action in terms of this electronic transmission. If you cannot access the disclaimer, kindly send an email to disclaimer@hisp.org and a copy will be provided to you. By replying to this e-mail or opening any attachment you agree to be bound by the provisions of the disclaimer.

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On 9 October 2015 at 10:22, Wilson, Randy rwilson@msh.org wrote:

Hi all,

We have an big issue with several of Rwanda’s event reporting systems that have been set up using Tracker/Individual records. For some reason, users are able to create duplicate records in the system through normal data entry. It seems to happen when they enter a value for an attribute and then a couple of seconds or milliseconds later they change that value. The system then stores both records with last updated several milliseconds apart.

This is a big issue because analytics fails to run when it finds these duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from

(

SELECT

program.name AS programname,

program.description,

programinstance.dateofincident,

programinstance.trackedentityinstanceid,

programinstance.status,

programstageinstance.programstageinstanceid,

programstageinstance.programinstanceid,

programstageinstance.programstageid,

trackedentitydatavalue.value,

dataelement.name AS dataelementname

FROM

public.programstageinstance,

public.programinstance,

public.program,

public.trackedentitydatavalue,

public.dataelement

WHERE

programstageinstance.programinstanceid = programinstance.programinstanceid AND

program.programid = programinstance.programid AND

trackedentitydatavalue.programstageinstanceid = programstageinstance.programstageinstanceid AND

dataelement.dataelementid = trackedentitydatavalue.dataelementid

) as foo

group by programstageinstanceid, dataelementname

having count(*) >1

order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs

777921;“TB Sample type_morning Sputum”;2

776886;“TB Date of collection_sample”;2

773132;“TB History of contact with TPB+ case”;2

Here is an example of the duplicates we see when we select the duplicated records from the trackedentitydatavalue table.

prograstageinstanceid, dataelementid,value,lastupdated,providedelsewhere,user

773132;35105;“true”;“2015-10-02 09:32:48.334”;FALSE;“abibarimana”

773132;35105;“true”;“2015-10-02 09:32:48.508”;FALSE;“abibarimana”

776886;31879;“2015-07-17”;“2015-10-07 16:25:26.341”;FALSE;“mugeni”

776886;31879;“2015-07-14”;“2015-10-07 16:25:26.546”;FALSE;“mugeni”

777921;35106;“true”;“2015-10-08 09:55:36.133”;FALSE;“emufasha”

777921;35106;“true”;“2015-10-08 09:55:36.278”;FALSE;“emufasha”

In order to clean up the trackedentitydatavalue table, I need to write these records to a temporary table, delete them from trackedentitydatavalue, then decide which of the duplicate pairs to delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table so that it will not accept duplicate values for programstageinstanceid and dataelementid? I also think there must be an error in the java code that should be updating the first saved record with the one entered by the user later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails we cannot run any event reports to get data out of the system.


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Ant Snyman

Cell: 0824910449

Landline: 028 2713242

Health Information Systems Program - SA

Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

Randy I think you should go ahead.

It is likely that you may get some ungraceful errors in the browser if the exception doesn’t get caught, but I suspect this is better than dhis “gracefully” saving the duplicate.

image

···

On 15 October 2015 at 11:33, Wilson, Randy rwilson@msh.org wrote:

Hi all,

It seems to me from the pgadmin create table script below that the table trackedentitydatavalue table does not have a unique primary key constraint (programstageinstanceid+dataelementid). I theory adding that constraint would make it impossible to have duplicates.

here is the extra constraint needed:

CONSTRAINT pk_programstageinstanceid_dataelementid PRIMARY KEY (programstageinstanceid, dataelementid)

I hesitate to do it on our production instance because I wonder if the Java code should be checked to make sure the system will recover gracefully if a user tries to save the same record twice within a few seconds due to the latency issues discussed below.

Is that a change that can be tried by the developers and then moved into trunk?

current structure of table:


CREATE TABLE trackedentitydatavalue

(

programstageinstanceid integer NOT NULL,

dataelementid integer NOT NULL,

value character varying(255),

“timestamp” timestamp without time zone,

providedelsewhere boolean,

storedby character varying(31),

CONSTRAINT fk_entityinstancedatavalue_dataelementid FOREIGN KEY (dataelementid)

REFERENCES dataelement (dataelementid) MATCH SIMPLE

ON UPDATE NO ACTION ON DELETE NO ACTION,

CONSTRAINT fk_entityinstancedatavalue_programstageinstanceid FOREIGN KEY (programstageinstanceid)

REFERENCES programstageinstance (programstageinstanceid) MATCH SIMPLE

ON UPDATE NO ACTION ON DELETE NO ACTION

)

WITH (

OIDS=FALSE

);

ALTER TABLE trackedentitydatavalue

OWNER TO individualrecords;

Thanks,

Randy

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.

On Fri, Oct 9, 2015 at 6:35 PM, Knut Staring knutst@gmail.com wrote:

Seems like what we also experienced a lot for tLiberia, I think.

Knut


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

On Fri, Oct 9, 2015 at 11:13 AM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If you experience reasonably high latency (slow response times) it seems quite likely you might well get duplicates. Probably related to

https://bugs.launchpad.net/dhis2/+bug/1498505


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

On 9 October 2015 at 15:40, Wilson, Randy rwilson@msh.org wrote:

I will check but when i remove the duplicates analytics runs. Also usually we get many more duplicates 100 or more each time I check.

Randy

On Oct 9, 2015 3:48 PM, “Ant Snyman” ant@hisp.org wrote:

Hi Randy,
There is actually a constraint on the trackedentitydatavalue table and you cannot create two entries with the same programstageinstance and dataelementid. I notice that you are grouping on dataelement name, is it possible that you could have two dataelements with the same name in your dataelement table? I.e different dataelementids but the same name?

Regards

Ant

This message and any attachments are subject to a disclaimer published at http://www.hisp.org/policies.html#comms_disclaimer. Please read the disclaimer before opening any attachment or taking any other action in terms of this electronic transmission. If you cannot access the disclaimer, kindly send an email to disclaimer@hisp.org and a copy will be provided to you. By replying to this e-mail or opening any attachment you agree to be bound by the provisions of the disclaimer.

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On 9 October 2015 at 10:22, Wilson, Randy rwilson@msh.org wrote:

Hi all,

We have an big issue with several of Rwanda’s event reporting systems that have been set up using Tracker/Individual records. For some reason, users are able to create duplicate records in the system through normal data entry. It seems to happen when they enter a value for an attribute and then a couple of seconds or milliseconds later they change that value. The system then stores both records with last updated several milliseconds apart.

This is a big issue because analytics fails to run when it finds these duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from

(

SELECT

program.name AS programname,

program.description,

programinstance.dateofincident,

programinstance.trackedentityinstanceid,

programinstance.status,

programstageinstance.programstageinstanceid,

programstageinstance.programinstanceid,

programstageinstance.programstageid,

trackedentitydatavalue.value,

dataelement.name AS dataelementname

FROM

public.programstageinstance,

public.programinstance,

public.program,

public.trackedentitydatavalue,

public.dataelement

WHERE

programstageinstance.programinstanceid = programinstance.programinstanceid AND

program.programid = programinstance.programid AND

trackedentitydatavalue.programstageinstanceid = programstageinstance.programstageinstanceid AND

dataelement.dataelementid = trackedentitydatavalue.dataelementid

) as foo

group by programstageinstanceid, dataelementname

having count(*) >1

order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs

777921;“TB Sample type_morning Sputum”;2

776886;“TB Date of collection_sample”;2

773132;“TB History of contact with TPB+ case”;2

Here is an example of the duplicates we see when we select the duplicated records from the trackedentitydatavalue table.

prograstageinstanceid, dataelementid,value,lastupdated,providedelsewhere,user

773132;35105;“true”;“2015-10-02 09:32:48.334”;FALSE;“abibarimana”

773132;35105;“true”;“2015-10-02 09:32:48.508”;FALSE;“abibarimana”

776886;31879;“2015-07-17”;“2015-10-07 16:25:26.341”;FALSE;“mugeni”

776886;31879;“2015-07-14”;“2015-10-07 16:25:26.546”;FALSE;“mugeni”

777921;35106;“true”;“2015-10-08 09:55:36.133”;FALSE;“emufasha”

777921;35106;“true”;“2015-10-08 09:55:36.278”;FALSE;“emufasha”

In order to clean up the trackedentitydatavalue table, I need to write these records to a temporary table, delete them from trackedentitydatavalue, then decide which of the duplicate pairs to delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table so that it will not accept duplicate values for programstageinstanceid and dataelementid? I also think there must be an error in the java code that should be updating the first saved record with the one entered by the user later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails we cannot run any event reports to get data out of the system.


Randy Wilson
*Team Leader: *Knowledge Management, Data Use and Research

Rwanda Health System Strengthening Activity

Management Sciences for Health

Rwanda-Kigali

Direct: +250 788308835

E-mail: rwilson@msh.org

Skype: wilsonrandy_us

Stronger health systems. Greater health impact.

www.msh.org

This message and its attachments are confidential and solely for the intended recipients. If received in error, please delete them and notify the sender via reply e-mail immediately.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Ant Snyman

Cell: 0824910449

Landline: 028 2713242

Health Information Systems Program - SA