Is there an implicit restriction on overlapping categoryOptions between the categories in a categoryCombination? I created the category options ‘True’ and ‘False’, and two categories, ‘Bool1’ and ‘Bool2’, both containing the category options True and False. When combining Bool1 and Bool2 in a categoryCombination, I expected to get four categoryOptionCombinations:
True, True
True, False
False, True
False, False
Instead I got:
True, True
True, False, True, False
False, False
(Note: the combination named “True, False, True, False” refers to each category option only once; only its name has four entries.)
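One way to reason about the counts above, purely as an assumption that is not confirmed anywhere in this thread: if a category option combination were identified only by its unordered set of category options, the two mixed pairs would become indistinguishable and four expected combinations would collapse to three. A minimal Python sketch of that arithmetic (the lists are illustrative stand-ins, not DHIS2 objects):

```python
from itertools import product

# Illustrative stand-ins for the two categories in the experiment.
bool1 = ["True", "False"]
bool2 = ["True", "False"]

# Expected result: one combination per ordered pair (Cartesian product).
expected = list(product(bool1, bool2))

# Assumption (not confirmed): a combination is identified only by the
# unordered set of its options, so ("True", "False") and ("False", "True")
# map to the same identity.
collapsed = {frozenset(pair) for pair in expected}

print(len(expected))   # 4
print(len(collapsed))  # 3
```

This only shows that the observed count of three is consistent with set-based identity; it says nothing about how the names are generated.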
It appears that this is a known issue with a recommended fix in the documentation:
Duplicated category option combinations within a category combination.
Within each category combination, a unique set of category option combinations should exist. In certain circumstances, duplicate category option combinations may exist in the system. This usually results from changes to category combinations after they have been created, or direct manipulation of the various category tables in the database. This may result in certain data element/category option combinations not appearing or being unavailable in the data entry screens and/or analytics apps.
Severity: Severe
Recommendation: Duplicated category option combinations within a category combination will require you to merge category option combinations together. This will require direct manipulation of the database, and should always be conducted first in a testing environment. Only after you have thoroughly tested your procedure, and have confidence that it works, should you perform the procedure on your production environment. The DHIS2 implementation team has created a series of SQL functions to help you remove these duplicated COCs from your system. (source)
This does not seem to be the same bug. What I reported is not a case of duplicate category option combinations within a category combination, but rather of overlapping category options between categories in a category combination. The resulting three category option combinations were perfectly unique; they just did not match my expectations, and I’m wondering whether this is intentional.
Hi @augustsm
If I understand you correctly, you created two categories, each with category options “True” and “False”?
As noted in the docs, this type of design should not be used: you should not have any category options which are shared between categories within the same category combination. As the docs put it, “This may result in certain data element/category option combinations not appearing or being unavailable in the data entry screens and/or analytics apps.”
I am not sure exactly what you are trying to do, but this case normally arises when implementers use “Unknown” in one of their categories. So you might have one category like “Gender” (Male, Female, Unknown) and another like “Age” (<15, 15+, Unknown). You could then create a category combination of Age + Gender. What you should NOT do is use the same “Unknown” in two categories which are part of the same category combination. You should instead use “Unknown gender” and “Unknown age” (two different category options).
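As a rough illustration of the rule above, here is a minimal Python sketch that flags category options shared between categories intended for the same category combination. The function name `shared_options` and the plain-dictionary input are hypothetical; this is not a DHIS2 API, just a pre-import sanity check one could run on one's own data.

```python
from itertools import combinations

def shared_options(categories):
    """Return {(category_a, category_b): shared option names} for each overlapping pair."""
    overlaps = {}
    for (name_a, opts_a), (name_b, opts_b) in combinations(categories.items(), 2):
        common = set(opts_a) & set(opts_b)
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps

# The design to avoid: "Unknown" is shared by both categories.
bad = {"Gender": ["Male", "Female", "Unknown"], "Age": ["<15", "15+", "Unknown"]}
# The recommended design: each category has its own "Unknown" option.
good = {"Gender": ["Male", "Female", "Unknown gender"], "Age": ["<15", "15+", "Unknown age"]}

print(shared_options(bad))   # {('Gender', 'Age'): {'Unknown'}}
print(shared_options(good))  # {}
```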
Thanks @jason, this was what I wanted to know. Can you link to the place in the docs where it is mentioned?
On the question of what I’m trying to do, it relates to auto-import of data via the API. To do this reliably, I need to know how category option combinations work under the hood, so that I set the correct category option combination ID when calling the API. I have not found a comprehensive guide to this in the docs, so I have resorted to experimentation, which is why I created the categories in question. Does such a guide exist?
Best regards, August
Edit: I should note that the end goal here is for our users to be able to create data sets outside DHIS2 and then import them into the solution for visualisation, sharing via API etc. Therefore I need to know what the limitations on that input data will be. Based on this conversation, it will not be possible for them to have multiple input columns with overlapping values like “True”, “False” or “Unknown”.
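For the import use case, one possible approach is to build a lookup from a set of category option IDs to the matching category option combination ID before sending data values. The sketch below is a minimal Python illustration under stated assumptions: the payload shape (a `categoryOptionCombos` list whose entries carry an `id` and a `categoryOptions` list of `{"id": ...}` objects) is assumed to resemble already-fetched metadata, the helper name `build_coc_lookup` is hypothetical, and all IDs are made up.

```python
def build_coc_lookup(payload):
    """Map each frozenset of category option IDs to its category option combination ID."""
    lookup = {}
    for coc in payload["categoryOptionCombos"]:
        key = frozenset(opt["id"] for opt in coc["categoryOptions"])
        if key in lookup:
            # Two combinations with the same option set cannot be told apart by options alone.
            raise ValueError(f"ambiguous option set: {sorted(key)}")
        lookup[key] = coc["id"]
    return lookup

# Made-up sample metadata in the assumed shape.
sample = {
    "categoryOptionCombos": [
        {"id": "cocMaleU15", "categoryOptions": [{"id": "optMale"}, {"id": "optU15"}]},
        {"id": "cocFemaleU15", "categoryOptions": [{"id": "optFemale"}, {"id": "optU15"}]},
    ]
}

lookup = build_coc_lookup(sample)
print(lookup[frozenset({"optU15", "optMale"})])  # cocMaleU15
```

Keying on an unordered set only works when no option is shared between the categories of a combination, which is the restriction discussed in this thread.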
@jason Better late than never: the section of the documentation that you linked to describes a different problem from the one I encountered. I assume that you wanted to link to this paragraph, but the section describes duplicate category option combinations within a category combination, not overlapping category options between categories in a category combination. The experiment I described in the initial post did not lead to duplicate category option combinations; it only led to wrong category option combinations.
Hi @augustsm
Yes, that might not yet be documented actually.
Here is the link to the actual integrity check in the source code.
Here is the description of the problem:
As a general rule, category options should be reused where possible between categories. The exception
to this rule however, is when you have a category combo with multiple categories, and within those
categories, a category option is shared. As a simple example, let's say you have a category called "Sex" with
options "Male", "Female" and "Unknown". There is also a second category called "Age" with options "<15", "15+",
and "Unknown". A category combination called "Age/Sex" is then created with these two categories, which share
the option "Unknown". This situation should be avoided, as it creates issues when analyzing data.
We have added a number of new integrity checks in later versions of DHIS2 and I think this might be one where we are missing documentation. Nonetheless, the recommendation is that you should not share the same category options in two separate categories which are part of a category combination. This applies to all versions of DHIS2, not just the more recent ones where we actually have a check for this problem.