Persisting data validation results

Calle_Hedberg · 2 February 2018 11:06

Hi

Version 2.28 had a new optional feature for Data Validation called “Persist new result”. Many users do not understand/know how this is supposed to be used - partially because they are not familiar with IT techie terms like “Persist” (“Store” or “Save” would be more user-friendly), but mainly because the 2.28 release note and the documentation do not really SAY anything:

“(Optional) Select Persist new results to persist any non-persisted results found during the analysis”


Questions:

To “persist any non-persisted results” seems to indicate that certain parts of the data validation result ARE persisted even if this is not selected. What are those?

Note also that the Dev Manual, chapter 1.24, states that

"When running the scheduled validation task, any violations found will be persisted as

validation results. These results can be accessed trough the validation result api."

What is not clear: since all results are already persisted when running a scheduled task, does selecting “Persist new result” for custom validation runs simply do the same as what is automatically done for scheduled validation tasks?

How can users access the persisted results in the UI?
If stored validation results are only retrievable via the ValidationResults API end-point, you need to pass an id (uid?) to retrieve specific results. Is there any other way to determine specific ids than listing all of them and choose?

Regards

Calle

Calle Hedberg

46D Alma Road, 7700 Rosebank, SOUTH AFRICA

Tel/fax (home): +27-21-685-6472

Cell: +27-82-853-5352

Iridium SatPhone: +8816-315-19119

Email: calle.hedberg@gmail.com

Skype: calle_hedberg

Stian · 2 February 2018 18:41

Hi Calle,

I see your points. Although “persist” is a techie term, I think “store” and “save” might also cause some confusion in this context, so maybe we should add a short description to clarify. It’s hard to explain it completely, but an example would be:
“Store any violations found”

“Storing violations found during the process will allow you to generate analytical data based on the violations” (The stored violations are also available through the API, and storing them keeps track of notifications sent if the validation rule has notifications. If the violations are not stored, notifications will be sent again if checked.)

If the documentation is missing or lacking in information, that would be my fault. I’ll look into it and add/update the documentation to better explain the feature.

To “persist any non-persisted results” seems to indicate that certain parts of the data validation result ARE persisted even if this is not selected. What are those?

When persisting (storing) violations, only violations not already in the database (stored from a previous job) will be stored.

That means if you run a job for the same rule, period and orgUnits twice, and the first time there are 3 violations which you persist, then the second time (after some data entry) you get an additional 2 violations, you will see 5 violations in total, of which 3 are already persisted. The “non-persisted results” then refer to the 2 new violations.
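To make the 3-then-5 example concrete, here is a minimal sketch of the "only store what is not already stored" idea. It is not the actual DHIS2 implementation: the `Violation` type, the `persist_new_results` function and the choice of (rule, period, org unit) as the identity of a violation are assumptions made for illustration only.

```python
from typing import NamedTuple


class Violation(NamedTuple):
    # Hypothetical identity of a violation: same rule, period and org unit
    # means "the same violation" in this sketch.
    rule: str
    period: str
    org_unit: str


def persist_new_results(stored: set[Violation], found: list[Violation]) -> list[Violation]:
    """Add to `stored` only the violations that are not there yet.

    Returns the violations that were newly stored by this call.
    """
    newly_stored = []
    for violation in found:
        if violation not in stored:
            stored.add(violation)
            newly_stored.append(violation)
    return newly_stored


stored: set[Violation] = set()

# First run: 3 violations, all of them new.
first_run = [Violation("rule_a", "201801", f"ou_{i}") for i in range(3)]
new_in_first = persist_new_results(stored, first_run)

# Second run after some data entry: the same 3 plus 2 more are found.
second_run = first_run + [
    Violation("rule_a", "201801", "ou_3"),
    Violation("rule_a", "201801", "ou_4"),
]
new_in_second = persist_new_results(stored, second_run)

print(len(second_run), "found,", len(new_in_second), "newly stored")
```

In this sketch the second run reports 5 violations but only stores the 2 that were not stored before, which is what "persist any non-persisted results" refers to.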

What is not clear, since all results are already persisted when running a scheduled task - does selecting “Persist new result” for custom validation runs simply do the same as what is automatically done for scheduled validation tasks?

Some instances do not run the scheduled job, but would still like to persist the violations. These instances have huge amounts of violations, and running the scheduled job, which has some hardcoded parameters, would be too big a job. In their case, running validation for only a small subset of their data is the only option.

How can users access the persisted results in the UI?

Persisted results are named “ValidationResults” and can only be accessed through the API (/api/validationResults). Additionally, the analytics job will generate an analytics table based on the ValidationResults, and this data can be accessed through the analytics API. There is currently no UI for viewing this data.

If stored validation results are only retrievable via the ValidationResults API end-point, you need to pass an id (uid?) to retrieve specific results. Is there any other way to determine specific ids than listing all of them and choose?

In this case, you need an id (of type integer, not a traditional uid). There is currently no way to get more specific results, mainly because this object was initially only supposed to be used internally. However, since there seems to be more demand to look at this information, I could improve the endpoint to make it more usable, including adding the normal uid as well. If this sounds interesting, it would be greatly appreciated if you created a Jira issue pointing out which changes you would like to see, and we could see what we can do with that.
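For anyone who wants to try the endpoint from a script, a rough sketch follows. Only the path /api/validationResults comes from this thread. The server URL, the credentials, the use of basic authentication, the `build_request` helper and the idea that a single result lives at /api/validationResults/<integer id> are assumptions for illustration, so check them against your own instance.

```python
import base64
import urllib.request

API_PATH = "/api/validationResults"  # endpoint path as named in this thread


def build_request(base_url, username, password, result_id=None):
    """Build (but do not send) a GET request for validation results.

    `result_id` is the integer id mentioned above, not a DHIS2 uid.
    Appending it to the path is an assumption about the URL layout.
    """
    url = base_url.rstrip("/") + API_PATH
    if result_id is not None:
        url += f"/{int(result_id)}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


# Placeholder server and credentials, not a real instance.
listing = build_request("https://dhis2.example.org", "user", "secret")
single = build_request("https://dhis2.example.org/", "user", "secret", result_id=42)

print(listing.full_url)
print(single.full_url)

# To actually send a request against a real server:
#   with urllib.request.urlopen(listing) as response:
#       body = response.read().decode("utf-8")
```

The request is only built here, not sent, so the sketch runs without a server. The shape of the response is not described in this thread, so it is left unparsed.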

Hope this answers most of your questions.

–

Stian Sandvold
Software developer, DHIS2

University of Oslo

http://www.dhis2.org

Calle_Hedberg · 2 February 2018 19:30

Stian,

We will write a JIRA issue for this - but let me just point out immediately:

Normal users cannot be expected to understand/use the API, so without a UI for some kind of validation reporting, this feature is much less useful than it could be.
The idea of NOT storing repeats of the same violations might be a problem, mainly because users are often particularly interested in repeat violations (Read: which validation violations have NOT been fixed during the last 1 or 2 or 3 monthly validation runs? Bring out the tar and feathers…)

I for one would like to see a dashboard item listing

Number of violations detected during previous validation run
Number of violations gone (issue fixed since last run)
Number of last run violations remaining (not fixed)
Number of violations detected >5 times
Number of violations detected 3-5 times
Number of violations detected 1-2 times

etc
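As a rough illustration of what such a dashboard item would need, the sketch below computes the counts listed above from a per-run history. Everything in it is hypothetical: it assumes the full set of violations found in each run is kept (which is exactly what is lost if repeats are not stored), it uses plain strings as violation keys, and `summarize_runs` is an invented name, not a DHIS2 function.

```python
from collections import Counter
from itertools import chain


def summarize_runs(runs):
    """Summarize a history of validation runs.

    `runs` is a list of sets, oldest first; each set holds the keys of the
    violations found in that run. At least two runs are expected.
    """
    previous, latest = runs[-2], runs[-1]

    # How many runs each violation has been detected in, over the whole history.
    times_detected = Counter(chain.from_iterable(runs))

    return {
        "detected_in_latest_run": len(latest),
        "fixed_since_previous_run": len(previous - latest),
        "remaining_from_previous_run": len(previous & latest),
        "detected_more_than_5_times": sum(1 for n in times_detected.values() if n > 5),
        "detected_3_to_5_times": sum(1 for n in times_detected.values() if 3 <= n <= 5),
        "detected_1_to_2_times": sum(1 for n in times_detected.values() if n <= 2),
    }


# Three monthly runs with made-up violation keys.
runs = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "d", "e"},
]
summary = summarize_runs(runs)
for name, value in summary.items():
    print(name, value)
```

Note that the "times detected" buckets here count every violation ever seen, including ones that have since been fixed. Restricting them to violations still present in the latest run would be an equally reasonable reading.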

Another typical report would be a ranking list for data capturers - either highest number of violations (worst performance) or lowest number (best performance), preferably only using data value records NOT marked for follow-up.

Regards

Calle
