Dataset Canary Question and Discussion

Hello DHIS2 List,

I’ve got an interesting problem, and I’ve already cobbled together a solution in test, but I wanted to pose it to the community before I went through with it:


Quick Background:

We’ve transitioned from one dataset form to another in the past 12 months in Vanuatu, but it wasn’t done cleanly. So there is a period of about 12 months (roughly June 2013 - June 2014) where some facilities were filling out the old form/dataset and some were filling out the new one. When we created the new form we tried our best to reuse as many data elements as possible, to minimize the number of new ones and the institutional knowledge needed to make true and compelling interpretations of the data (via charts/graphs). This, however, has created a problem: some elements will populate in each dataset ‘version’, so to the untrained eye it is difficult to know exactly which dataset was used to fill the form out.

Problem

How do I let the user know that the dataset/timeperiod combo they are viewing is not the dataset/timeperiod combo originally used during data entry?

Solutions

I came up with 2 ideas on how to solve this:

  1. Include a Dataset Canary data element to record the UID of the form filling it out, and use javascript to alert the user and fill out the element for future forms.

Pros: Workable without changing the database or DHIS2, all javascript inside the dataset, I can do it with existing tools

Cons: Susceptible to failure on DHIS2 upgrades, anyone updating the datasets/forms in the future will need to be aware of the javascript and how it works, extra dataelement, not very extensible

  2. Add a column called ‘datasetid’ to ‘datavalue’ that is a foreign key to the ‘dataset’ table’s ‘datasetid’. Every datavalue will then have a reference to the dataset that entered it.

Pros: Not as easy to break during upgrades because it is baked into DHIS2, myriad ways to alert the user and can have features like ‘click to go to correct dataset/entryform’, works automatically

Cons: Requires development work and community buy-in, no simple solution for legacy datavalues that are not currently associated with a dataset (where there is more than one possible dataset).

  3. Other ideas?

My Solution, the Dataset Canary Explained:

1. The Legacy Data

I knew which form was which only because all of our first forms were actually imported from Microsoft Access, and during the import a comment was added to each row in ‘datavalue’ denoting its origin. So after creating a new dataelement called “Dataset Canary”, I inserted an entry for each form for that new element with the UID of the dataset it came from. We had about 16,000 of these forms (OrgUnit/TimePeriod combos), so it was 16,000 new entries in ‘datavalue’.

INSERT INTO datavalue (dataelementid, periodid, sourceid, categoryoptioncomboid, value, storedby, lastupdated, comment, followup, attributeoptioncomboid)
SELECT de.dataelementid AS dataelementid,
       dv.periodid AS periodid,
       dv.sourceid AS sourceid,
       etc…

2. Hidden Span inside the dataentry form:

Dataset Canary:&nbsp;<input id="Zdx2UHSluH7-efg0iPzSOl7-val" style="width: 7em; text-align: center;" title="Zdx2UHSluH7 - Dataset Canary - efg0iPzSOl7 - (default) - string" value="[ Dataset Canary (default) ]" /></p>

3. Javascript to evaluate and alert the user and Div to appear or not appear (I stole the class ‘inputCriteria’ and adjusted the colors a bit)

$(document).on('dhis2.de.event.dataValuesLoaded', function (e) {
	var vuSelectedDataset = dhis2.de.currentDataSetId;
	var canaryValue = $("#Zdx2UHSluH7-efg0iPzSOl7-val").val();

	// If the dataset canary element does not match the current dataset, alert the user that they are viewing elements entered from another dataset.
	// An empty canary is ignored for now because the newer datasets don't have the canary element populated yet.
	if (canaryValue != vuSelectedDataset && canaryValue != "") {
		$("#VUJS_datasetcanarywarning").css("display", "block"); // show the error div
	} else {
		$("#VUJS_datasetcanarywarning").css("display", "none"); // hide the error div (if it was shown for a previous date, which doesn't reload everything it seems)
	}
});

and the div:

<div id="VUJS_datasetcanarywarning" class="selectionLabel" style="width: 500px;">This form was filled out in the Access Based VanPHIS, please use the <em>VanPhis Health Facilities (2007-2014)</em> dataset listed above to view this form.</div>

Repeat step 3 for each form
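
Since step 3 has to be repeated for each form, the comparison itself could be pulled out into one small function so that only the element IDs differ between forms. What follows is only a sketch under that assumption: shouldShowCanaryWarning is a hypothetical helper name, not something DHIS2 provides, and it additionally treats a missing or whitespace-only value as "no canary recorded yet".

```javascript
// Hypothetical helper (not a DHIS2 API): decides whether the warning div
// should be shown for the form that was just loaded.
function shouldShowCanaryWarning(canaryValue, selectedDataSetId) {
  // Treat null, undefined and whitespace-only values as "no canary recorded yet".
  var recorded = (canaryValue == null ? "" : String(canaryValue)).trim();
  if (recorded === "") {
    return false;
  }
  // Warn only when the recorded dataset differs from the one being viewed.
  return recorded !== selectedDataSetId;
}

// Possible use inside the dataValuesLoaded handler of each form:
//   $("#VUJS_datasetcanarywarning").toggle(
//     shouldShowCanaryWarning($("#Zdx2UHSluH7-efg0iPzSOl7-val").val(),
//                             dhis2.de.currentDataSetId));
```

Keeping the decision separate from the jQuery calls also means it can be checked outside the data entry page.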

4. Update the datacanary for each dataset/orgid/timeperiod combo as they are filled out.

I’ll need to put in some more javascript to fill out the Dataset Canary for each new form as well, still working on how to accomplish that. (So yup, I’m not 100% complete, but getting there).
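
One possible shape for that missing piece, as a rough sketch only: decide what should be written into the canary, and never overwrite a value that is already there, since it records the form originally used. canaryValueToStore is a hypothetical name. Writing the value into the input and getting it saved is deliberately left out, because that depends on how the data entry module persists values.

```javascript
// Hypothetical helper (not a DHIS2 API): returns the value that should be
// written into the Dataset Canary input, or null if it should be left alone.
function canaryValueToStore(currentCanaryValue, selectedDataSetId) {
  var recorded = (currentCanaryValue == null ? "" : String(currentCanaryValue)).trim();
  // Never overwrite an existing canary: it records the form originally used.
  if (recorded !== "") {
    return null;
  }
  // Without a selected dataset there is nothing meaningful to record.
  if (!selectedDataSetId) {
    return null;
  }
  return selectedDataSetId;
}
```

A null result means "do nothing", so the caller only touches the input when a new value is actually returned.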

I know it was a long email, and thanks to anyone who made it down here. If you have any feedback or other ideas I’d greatly appreciate hearing them :).

Timothy Harding
PeaceCorps Volunteer

Republic of Vanuatu

hardingt@gmail.com

+678 5955137

Hi Tim,

I guess the question I would have for you is why this matters in the data entry form?

I would think the normal way to do this in DHIS2 would be to “complete” the records for the legacy dataset when it would be applicable. It would then be possible to see through the analytics where you have data and where you do not have data. Just seems like a lot of work to display the canary, when it would be already visible with the data set completion information.

Regards,

Jason


On Oct 27, 2014 5:40 AM, “Timothy Harding” hardingt@gmail.com wrote:


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Jason,

Agreed that it is a lot of work for the few cases where erroneous data may be entered. The reasoning is that we often need to review older forms, especially when we find values outside the Min/Max. It helps the person reviewing the forms to know which one was used to enter the values in the first place. Because many of the elements, but not all, are used in both forms, the person looking has no easy way to know which form was used to ‘complete’ it. If I recall correctly, when we generated the completion data for the legacy data here, we generated it for the new forms, so the complete bit wouldn’t tell the user in this case. Additionally, it is at the bottom of the page, where they may not think to look right away.

I just noticed another con in this solution: it assumes only one dataset can be completed per org unit/timeperiod combo. That constraint works for us now, but it may create a stumbling block if we incorporate supplemental datasets in the future. I’ll need to think of a way to bake that possibility in. The first thing I can think of is one Canary per form type.
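
To make the "one Canary per form type" idea a bit more concrete, here is a rough sketch. It assumes each dataset gets its own canary element, and that the page can collect the current values into an object keyed by dataset UID. The function names and the UIDs in the usage comment are made up for illustration.

```javascript
// canaryValues: { <dataset UID>: <value of that dataset's canary element> }
// Returns the UIDs of the datasets whose canary is populated, i.e. the forms
// that were actually used for this org unit/period.
function dataSetsActuallyUsed(canaryValues) {
  return Object.keys(canaryValues).filter(function (uid) {
    var value = canaryValues[uid];
    return value != null && String(value).trim() !== "";
  });
}

// Warn only when some form was used for this org unit/period, but not the
// one currently selected. Several populated canaries (supplemental datasets)
// are fine as long as the selected dataset is among them.
function shouldWarnForDataSet(selectedDataSetId, canaryValues) {
  var used = dataSetsActuallyUsed(canaryValues);
  return used.length > 0 && used.indexOf(selectedDataSetId) === -1;
}

// Example with made-up UIDs:
//   shouldWarnForDataSet("newFormUid", { oldFormUid: "oldFormUid", newFormUid: "" })
//   is true, because only the old form was filled out for this period.
```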


Timothy Harding
PeaceCorps Volunteer

Republic of Vanuatu

hardingt@gmail.com

+678 5955137

On Mon, Oct 27, 2014 at 6:22 PM, Jason Pickering jason.p.pickering@gmail.com wrote:
