We have a DHIS2 instance running version 2.37 with about 3,000 Tracker records. We need to de-identify some of the data being collected before handing the instance over to an external consultant. Might anyone have any tools or tips to make this process easier?
My understanding of the data which needs to be de-identified:
- a date (data element)
- a text field containing gender (data element, text)
- a DHIS-generated sequential ID (TEI, text)
- a manually-entered ID (TEI, text) – values can be removed rather than reassigned
We’re thinking of combining Excel with SQL updates or a DHIS2 import to do what is needed, but are open to other ideas. TIA.
I think it will be extremely difficult to de-identify a whole instance. Personally identifiable data (PID) includes anything that can identify an individual patient, including visit dates and locations. (If a friend or relative knows the dates on which a patient visited a facility, they can easily query the data set to find that patient's record and then obtain confidential information such as test results or services received.)
And there are many places in a DHIS2 instance in which data can end up being retained (including soft deletes and logs). If you just remove obvious identifiers, you are effectively producing ‘pseudonymised’ data, rather than fully anonymised data, and you still need to put very tight controls in place around how this pseudonymised data is shared.
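To make the distinction concrete, here is a rough sketch of what pseudonymising the four fields from the question might look like if it were done on exported rows rather than in the live database. Everything specific in it is an assumption for illustration: the column names, the secret key, the 30-day date-shift window, and the choice to collapse the gender text into coarse codes. It uses a keyed hash (HMAC) for the sequential ID because a plain hash of a short sequential number is trivial to reverse by trying every value. Note that the output is still only pseudonymised: anyone holding the key can re-link records.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical column names; a real export will use different headers.
DATE_COL = "visit_date"
GENDER_COL = "gender"
SEQ_ID_COL = "generated_id"
MANUAL_ID_COL = "manual_id"

# Assumed mapping of free-text gender values to coarse codes.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def normalise_gender(value):
    """Collapse free-text gender into a coarse code; anything else becomes 'U'."""
    return GENDER_CODES.get(value.strip().lower(), "U")


def pseudonymise_rows(rows, key, max_shift_days=30):
    """Return de-identified copies of the rows; the input rows are not modified."""
    out = []
    for row in rows:
        new = dict(row)
        # Keyed hash of the sequential ID: stable per record, but not
        # reproducible by someone who does not hold the key.
        digest = hmac.new(key, row[SEQ_ID_COL].encode("utf-8"), hashlib.sha256).hexdigest()
        new[SEQ_ID_COL] = digest[:12]
        # Derive a per-record offset in [-max_shift_days, +max_shift_days] from
        # the digest, so every date belonging to one record moves by the same amount.
        shift = int(digest[12:20], 16) % (2 * max_shift_days + 1) - max_shift_days
        original = date.fromisoformat(row[DATE_COL])
        new[DATE_COL] = (original + timedelta(days=shift)).isoformat()
        new[GENDER_COL] = normalise_gender(row[GENDER_COL])
        # The manually entered ID is simply removed, as the question allows.
        new[MANUAL_ID_COL] = ""
        out.append(new)
    return out
```

Calling `pseudonymise_rows(rows, key)` with a list of dictionaries (for example from `csv.DictReader`) and a random secret such as `secrets.token_bytes(32)` returns the transformed copies. The key would need to be stored separately from the shared data, or destroyed if re-linking is never required.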
What is the use case for this data sharing? Could the consultant use a data export (which can then be aggregated to hide individual records)? Or could a tight confidentiality agreement be signed with the consultant to ensure that pseudonymised data is only used by a very restricted circle of staff (who have all in turn signed confidentiality agreements), and then destroyed once the project is completed?
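As an illustration of the aggregated-export option, the sketch below counts exported rows per group and suppresses small groups, since a cell with only one or two patients can still point to an individual. The column names in the usage note and the minimum cell size of 5 are assumptions, not DHIS2 settings; the right threshold depends on your own data-sharing policy.

```python
from collections import Counter


def aggregate_counts(rows, group_cols, min_cell=5):
    """Count rows per combination of group_cols, hiding counts below min_cell."""
    counts = Counter(tuple(row[col] for col in group_cols) for row in rows)
    result = []
    for group, n in sorted(counts.items()):
        entry = dict(zip(group_cols, group))
        # None marks a suppressed cell: the group exists, but it is too small
        # to publish an exact count for.
        entry["count"] = n if n >= min_cell else None
        result.append(entry)
    return result
```

For example, `aggregate_counts(rows, ["org_unit", "month"])` (hypothetical column names) would give one entry per facility and month, with the count set to `None` wherever fewer than five records fall into that group.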
This was a very helpful response, Sam. Thank you.