How to deal with historical data when adding disaggregation to existing data elements

HaydnJ · 24 February 2023 14:38

Hi all,

I’m wondering what people would do if they were asked to add disaggregation to a data element that previously never had it. What would be your way of dealing with it?

For this example, imagine we have a data element that records the total number of new cases of a particular disease. A few years later it is decided that we want to record all the different categories of the disease, so a category combination is added to this data element that allows us to record how many new cases of each category come in.

How would you recommend dealing with the historical data? Some thoughts I have:

1. Leave it as is. The aggregate system will still count these non-disaggregated values, but the user will not be able to see this data disaggregated (I believe this is how DHIS2 aggregation behaves, based on the time I accidentally loaded a bunch of data without categoryOptionCombos).

Pro: Easy
Con: Historical data seemingly disappears when looking at the disaggregated data elements

2. Create a new category option combo called unknown and update all the historical values to sit in there.

Pro: Historical data is now visible when disaggregated, even if it isn’t particularly useful.
Con: Requires a lot of shuffling of data about, and unnecessarily bloats the disaggregations.

3. Put the historical data somewhere else and forget about it.

Pro: The data is sitting in some historical dataset, not upsetting the new data coming in
Con: Less than ideal…

I am really hoping there is some out-of-the-box solution or recommended way of dealing with this. I hope the community can help.

Shurajit_Dutta · 24 February 2023 17:55

Hi there @HaydnJ

In this case, I would recommend you explore the category combo override function.

If your current data element/data is not disaggregated, this means the data will sit in the default category option combo. If you then apply the override, any NEW data will sit in the new category option combos. In this way, you will be able to view all of your historical data together when you do not select any disaggregations. If you start disaggregating your data in analysis however, you will not be able to display the historical data by these disaggregations.
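To picture the behaviour described above, here is a minimal Python sketch. It is not how DHIS2 computes anything internally; it only models the idea that totals include values stored against the default category option combo, while a disaggregated view only shows values stored against the new options. The periods, values and option names ("Type A", "Type B") are made up for illustration.

```python
from collections import defaultdict

# Hypothetical data values for a single data element (illustrative only).
DATA_VALUES = [
    {"period": "2020", "coc": "default", "value": 120},  # historical, not disaggregated
    {"period": "2021", "coc": "default", "value": 135},  # historical, not disaggregated
    {"period": "2023", "coc": "Type A", "value": 60},    # entered after the override
    {"period": "2023", "coc": "Type B", "value": 45},    # entered after the override
]


def totals_by_period(values):
    """Sum every value per period, whatever category option combo it sits in."""
    totals = defaultdict(int)
    for v in values:
        totals[v["period"]] += v["value"]
    return dict(totals)


def breakdown(values, options):
    """Sum values per (period, option), keeping only the requested options."""
    table = defaultdict(int)
    for v in values:
        if v["coc"] in options:
            table[(v["period"], v["coc"])] += v["value"]
    return dict(table)


totals = totals_by_period(DATA_VALUES)
by_type = breakdown(DATA_VALUES, {"Type A", "Type B"})
```

In this toy model `totals` covers 2020, 2021 and 2023, while `by_type` only has entries for 2023, which is the "historical data seemingly disappears" effect.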

I have an exercise that I use when training on this concept; it is attached for your reference.

Metadata Creation Exercise.pdf (57.3 KB)

If you do end up making a new data element and applying a category combo to it, then the best way to link this data would be to make an indicator summing the old + new DE. That way you can still view the data over time; but this would not be my recommended approach given the existence of the above feature. Over time this gets really messy and you will just be adding more and more DEs to your system. By overriding the category combo, you are able to make changes using your already existing DEs.
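For the alternative of summing an old and a new DE in an indicator, the numerator would be an expression that references both data elements. A small Python sketch that builds such an expression is below; the UIDs are hypothetical placeholders, and the `#{uid}` form is the data element reference syntax used in DHIS2 indicator expressions.

```python
def sum_expression(data_element_uids):
    """Build an indicator numerator that adds up the given data elements."""
    return " + ".join("#{%s}" % uid for uid in data_element_uids)


OLD_DE = "oldDeUid001"  # hypothetical UID of the original, non-disaggregated DE
NEW_DE = "newDeUid002"  # hypothetical UID of the new, disaggregated DE

numerator = sum_expression([OLD_DE, NEW_DE])
```

The denominator of such an indicator would simply be 1, since it is a plain sum rather than a ratio.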

bobj · 24 February 2023 18:09

Not disagreeing with @Shurajit_Dutta on this, but I have some thoughts about @HaydnJ's option 2 as well, motivated from a different place.

During a recent exercise trying to extract some tracker data into aggregate, I have come to appreciate the value of having an Unknown option in all/many/most categories. This particular aggregate data had the common age/sex breakdown, with the aggregate age group calculated from the TEI date of birth. The problem was that the source tracker data had quite a number of cases where the DOB hadn’t been entered, and a few where there was no sex captured. Without modifying the categories to cater for these unknowns, significant numbers were getting lost in the count.

So in principle I think it is almost always a good idea to have an unknown option within a category from the outset, to cater for importing data where the disaggregation is unknown. In your case it could also cater for the historical data.
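As a rough illustration of the point, here is a minimal Python sketch of bucketing tracker-style records into age/sex groups with an "Unknown" fallback. The age bands, field names (`dob`, `sex`) and option labels are assumptions chosen for the example, not the ones from the actual exercise.

```python
from datetime import date


def age_group(dob, on_date):
    """Return an age band, or "Unknown" when no date of birth was captured."""
    if dob is None:
        return "Unknown"
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    age = on_date.year - dob.year - ((on_date.month, on_date.day) < (dob.month, dob.day))
    if age < 5:
        return "<5"
    if age < 15:
        return "5-14"
    return "15+"


def sex_option(sex):
    """Return the sex option, or "Unknown" when it is missing or unexpected."""
    return sex if sex in ("Male", "Female") else "Unknown"


def count_cases(records, on_date):
    """Count records per (age group, sex) so that no record is dropped."""
    counts = {}
    for r in records:
        key = (age_group(r.get("dob"), on_date), sex_option(r.get("sex")))
        counts[key] = counts.get(key, 0) + 1
    return counts


records = [
    {"dob": date(2020, 1, 1), "sex": "Female"},
    {"dob": None, "sex": "Male"},            # DOB never entered
    {"dob": date(2000, 6, 1), "sex": None},  # sex never captured
]
counts = count_cases(records, date(2023, 3, 1))
```

Because every record lands in some bucket, the sum over all buckets equals the number of source records, which is the property that was being lost without the Unknown options.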

The downside, I accept, is having yet more categoryoptioncombos.

HaydnJ · 27 February 2023 15:12

Thank you @Shurajit_Dutta and @bobj for the recommendations, definitely some food for thought.

I am going to reverse-delegate this decision to the program managers as ultimately they are going to be the ones making use of the data. But yeah, as much as I don’t want to explode my data sets with even more category combinations, it is the best solution in terms of data quality. What I didn’t mention in the original post was that some of these data elements actually are disaggregated already, but are having additional categories added to the category combination, so simply stuffing all the data into the default category would result in losing some disaggregation data.

This is such a mess…

Edit:

@Shurajit_Dutta I have just read the links in your reply and actually it has got me thinking this is an ideal solution.

Can I create category combinations that will fit the historical data, and then import that data using the dataValueSets API and the new category option combo ids, rather than overriding the category combination in the dataset (which will be using the new disaggregations going forward)?

If so that will sort the problem perfectly. I can have my cake and eat it.
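A minimal sketch of what I have in mind, in Python. It only builds the JSON body for the dataValueSets API and does not send anything; the UIDs and the `coc_map` lookup from old to new category option combo ids are hypothetical placeholders.

```python
import json


def remap_to_new_cocs(old_values, coc_map):
    """Build a dataValueSets-style payload with each value moved to its new COC.

    Raises KeyError if a value's old category option combo has no mapping,
    so nothing is silently dropped or left behind.
    """
    data_values = []
    for v in old_values:
        data_values.append({
            "dataElement": v["dataElement"],
            "period": v["period"],
            "orgUnit": v["orgUnit"],
            "categoryOptionCombo": coc_map[v["categoryOptionCombo"]],
            "value": v["value"],
        })
    return {"dataValues": data_values}


# Hypothetical historical value and old -> new category option combo mapping.
old_values = [
    {"dataElement": "deUid000001", "period": "202001", "orgUnit": "ouUid000001",
     "categoryOptionCombo": "oldCocUid01", "value": "12"},
]
coc_map = {"oldCocUid01": "newCocUid01"}

payload = remap_to_new_cocs(old_values, coc_map)
body = json.dumps(payload, indent=2)  # body for a POST to /api/dataValueSets
```

As far as I understand it, importing under the new ids creates new data values rather than moving the old ones, so the values under the old category option combos would presumably still need to be removed separately to avoid double counting.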

bobj · 8 March 2023 15:40

@HaydnJ I feel your pain where the data is already disaggregated and folk want to do seemingly reasonable things (like adding a category option) which in fact create really difficult problems.

Here is an untested thought: did you consider exporting the troubled data as ADX, then doing whatever you need to do with the metadata, and then just importing the data again (as ADX)? The beauty of ADX is that it is blissfully unaware of that hellish construction called a categoryoptioncombo. It might just work.
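Equally untested, a Python sketch of the idea: in ADX each data value carries its disaggregation as category attributes rather than a categoryoptioncombo id, so an exported file could in principle be patched before re-import. The sample document, the codes (`NEW_CASES`, `SEX`, `DISEASE_TYPE`, `UNKNOWN`) and the choice to tag old values with an Unknown option are all assumptions for illustration; whether the import needs this step at all is something to verify.

```python
import xml.etree.ElementTree as ET

ADX_NS = "urn:ihe:qrph:adx:2015"

# Hypothetical ADX export of data that is already disaggregated by sex.
SAMPLE = """<adx xmlns="urn:ihe:qrph:adx:2015">
  <group orgUnit="OU_1" period="2020-01-01/P1M" dataSet="DS_CASES">
    <dataValue dataElement="NEW_CASES" SEX="F" value="12"/>
    <dataValue dataElement="NEW_CASES" SEX="M" value="9"/>
  </group>
</adx>"""


def add_missing_category(adx_xml, category_code, fallback_option_code):
    """Give every dataValue lacking the category attribute a fallback option."""
    root = ET.fromstring(adx_xml)
    for dv in root.iter("{%s}dataValue" % ADX_NS):
        if category_code not in dv.attrib:
            dv.set(category_code, fallback_option_code)
    return root


root = add_missing_category(SAMPLE, "DISEASE_TYPE", "UNKNOWN")
patched = ET.tostring(root, encoding="unicode")  # XML text to import again
```

The existing `SEX` attributes are left untouched, so the disaggregation that was already there survives the round trip.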

HaydnJ · 8 March 2023 15:53

Hi Bob,

I did it the hard way and created new category option combos for old data, importing it using the newly created category option combo ids.

This appears to have worked.

I need to put some time aside to learn ADX though, so thanks for the added incentive!