DHIS2 API Request Error 409: Data Elements Conflict in Analytics Query for Aggregation Type

Jake_Lamers · 11 September 2024 13:15

Hello, I am trying to download all data for a data element group (DE_GROUP) at a specific org unit level, but I keep getting the same error:

dhis2.exceptions.RequestException: code: 409, url: https://sl.dhis2.org/hmis23/api/analytics?dimension=pe:LAST_MONTH&dimension=dx:DE_GROUP-j3ghXQXN65o&dimension=ou:LEVEL-6;.json, description: {"httpStatus":"Conflict","httpStatusCode":409,"status":"ERROR","message":"Data elements must be of a value and aggregation type that allow aggregation: `[lBYF7dshoF3]`","errorCode":"E7115"}

I receive the same error if I try to download the data directly through the Data Visualizer.

Here is my code:

# Import necessary libraries and modules
from dhis2 import Api                  # API class for connecting to DHIS2
from get_data import get_credentials   # Function to get username and password
from io import StringIO                # StringIO to handle string-based data as file-like objects

import pandas as pd                    # pandas for data manipulation and analysis

# Step 1: Get credentials (username and password) to authenticate the API connection
username, password = get_credentials()

# Step 2: Initialize the API connection to DHIS2 using the given credentials
api = Api('https://sl.dhis2.org/hmis23', username, password)

# Step 3: Define the time period, data element, and organizational level for data extraction
period = 'LAST_MONTH'  # Define the time period as the last month
data_element = 'j3ghXQXN65o'  # Define the data element group ID (from DHIS2)
org = 'LEVEL-6'  # Specify organizational unit level (e.g., level-6 in the hierarchy)

# Step 4: Build the query string for DHIS2 analytics API to retrieve the required data
query_string = (
    f'analytics?dimension=pe:{period}'             # Add period dimension
    f'&dimension=dx:DE_GROUP-{data_element}'       # Add data element group dimension
    f'&dimension=ou:{org};'                        # Add organizational unit dimension
)

# Step 5: Fetch the data from DHIS2 API using the constructed query string
response = api.get(query_string)

# Step 6: Load the API response into a pandas DataFrame
# The analytics endpoint returns JSON here (the request URL ends in .json),
# with column metadata under "headers" and the values under "rows"
data = response.json()
df = pd.DataFrame(data['rows'], columns=[h['name'] for h in data['headers']])

# Step 7: Display the DataFrame's column names
print(df.columns)

# Step 8: Print the entire DataFrame to inspect the fetched data
print(df)

Thank you for your help.

jason · 11 September 2024 16:10

Hi @Jake_Lamers ,
The message is a bit cryptic, but what is happening here is that the DHIS2 analytics API cannot aggregate data which is not numeric. I suspect that the data elements you are attempting to query are text or some other non-numeric data type?
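
As a rough sketch, the UIDs named in the error can be pulled out of the response body so they can be looked up or excluded. The function name `offending_data_elements` is made up for illustration, and the comma-separated format for several UIDs is an assumption, since the error above lists only one.

```python
import json
import re
from typing import List

# Error body copied from the 409 response quoted earlier in this thread.
ERROR_BODY = (
    '{"httpStatus":"Conflict","httpStatusCode":409,"status":"ERROR",'
    '"message":"Data elements must be of a value and aggregation type '
    'that allow aggregation: `[lBYF7dshoF3]`","errorCode":"E7115"}'
)


def offending_data_elements(error_body: str) -> List[str]:
    """Return the UIDs listed in an E7115 message, or [] for any other error."""
    payload = json.loads(error_body)
    if payload.get("errorCode") != "E7115":
        return []
    # The UIDs appear between square brackets at the end of the message.
    match = re.search(r"\[([^\]]*)\]", payload.get("message", ""))
    if match is None:
        return []
    # Assumption: several UIDs would be separated by commas.
    return [uid.strip() for uid in match.group(1).split(",") if uid.strip()]


print(offending_data_elements(ERROR_BODY))
```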

Best regards,
Jason

Jake_Lamers · 11 September 2024 16:38

Hello @jason,

Do you know if there is a way to filter out all non-numeric data, or to request the group's data without aggregation, through the DHIS2 analytics API? I have been sending one request per indicator, but that is rather slow.

Thank you for your response,

Jake

jason · 11 September 2024 16:42

I think you would need to handle that in your analytics request itself. You can query the dataElements API first and determine what the value type of each data element actually is. If it's not numeric, you cannot aggregate it, and you can only query that data at the level at which it is entered. So, if it is entered at, say, the facility level, you will need to get the data at that level. For numeric data, the analytics API can aggregate the data for you. But for text, you will need to handle how that data is aggregated in your code.
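
A minimal sketch of that check, assuming each data element's metadata has already been fetched as a dict with `id` and `valueType` keys. The set of numeric value types below is my assumption about what counts as numeric, not a list taken from this thread, and the UIDs in `sample` are placeholders.

```python
# Assumption: DHIS2 value types treated as numeric for aggregation purposes.
NUMERIC_VALUE_TYPES = {
    "NUMBER",
    "UNIT_INTERVAL",
    "PERCENTAGE",
    "INTEGER",
    "INTEGER_POSITIVE",
    "INTEGER_NEGATIVE",
    "INTEGER_ZERO_OR_POSITIVE",
}


def split_by_value_type(data_elements):
    """Split data element metadata into (numeric, non_numeric) lists."""
    numeric, non_numeric = [], []
    for element in data_elements:
        if element.get("valueType") in NUMERIC_VALUE_TYPES:
            numeric.append(element)
        else:
            non_numeric.append(element)
    return numeric, non_numeric


# Illustrative metadata only; these UIDs and value types are placeholders.
sample = [
    {"id": "aaaaaaaaaa1", "valueType": "INTEGER_ZERO_OR_POSITIVE"},
    {"id": "bbbbbbbbbb2", "valueType": "LONG_TEXT"},
    {"id": "cccccccccc3", "valueType": "NUMBER"},
]

numeric, non_numeric = split_by_value_type(sample)
print([e["id"] for e in numeric])
print([e["id"] for e in non_numeric])
```

Only the IDs in `numeric` would then go into the aggregated analytics request; the rest need to be fetched at the level where they were entered.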

Jake_Lamers · 11 September 2024 16:54

Would it be through the query string, with some query parameter like isNumeric=True, or would I have to make separate queries to determine the data types?

query_string = (
    f'analytics?dimension=pe:{period}'             # Add period dimension
    f'&dimension=dx:DE_GROUP-{data_element}'       # Add data element group dimension
    f'&dimension=ou:{org}'                        # Add organizational unit dimension
    f'&isNumeric=True;'
)

jason · 11 September 2024 17:52

@Jake_Lamers does that parameter exist in the API documentation? If so, then yeah, that seems like a good route. I am not sure though but please let me know if that is the case.

Jake_Lamers · 11 September 2024 18:53

Sorry, I am not sure whether a parameter like isNumeric exists; I just gave a generic example. After reviewing the documentation, I’ve identified some potential candidates. However, I’m still unsure what the options mean or whether they’re what I’m looking for. Since I am off, I’ll try them out tomorrow and provide an update afterward.

Query parameter                Required?  Description                                                Options
measureCriteria                No         Filters applied to the data/measures.                      EQ GT GE LT LE
preAggregationMeasureCriteria  No         Filters applied to the data/measures before aggregation.   EQ GT GE LT LE

jason · 12 September 2024 06:26

@Jake_Lamers well, if the parameter does not exist in the analytics API documentation, then it will not work.

Those two parameters will not do what you want.

I think you have two choices.

1. Filter out the non-numeric data elements prior to making the query.
2. Query the data at the level at which it was entered. If you do that, then you will be able to retrieve non-numeric data. If you then need to aggregate it, you can handle that client side in your Python code.
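
Both choices could be sketched roughly as below. For the first, the already filtered IDs are listed explicitly in the `dx` dimension, split into chunks so the URL stays short. For the second, text values are simply counted client side. The helper names, the chunk size and the placeholder UIDs are all illustrative, and passing `params` as a list of tuples to dhis2.py's `api.get` is my assumption about that library, so check it against its README.

```python
from collections import Counter


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def analytics_params(data_element_ids, period, org_unit):
    """Choice 1: analytics query parameters for an explicit list of data elements."""
    return [
        ("dimension", "dx:" + ";".join(data_element_ids)),
        ("dimension", f"pe:{period}"),
        ("dimension", f"ou:{org_unit}"),
    ]


def count_text_values(values):
    """Choice 2: summarise non-numeric values client side by counting them."""
    return Counter(values)


numeric_ids = ["aaaaaaaaaa1", "bbbbbbbbbb2", "cccccccccc3"]  # placeholder UIDs
requests_to_send = [
    analytics_params(chunk, "LAST_MONTH", "LEVEL-6")
    for chunk in chunked(numeric_ids, 2)
]

# Assumed usage with dhis2.py (not executed here):
#   for params in requests_to_send:
#       response = api.get("analytics", params=params)
print(requests_to_send[0][0])
print(count_text_values(["Yes", "No", "Yes"]))
```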

Best regards,
Jason

Jake_Lamers · 12 September 2024 10:17

Okay, regarding the two potential solutions:

1. Are you aware of a way to query the names, IDs and data types of the elements in a data element group? I have found the following post but the program indicators are empty.
2. The level I am currently querying is the lowest level, which I believe is the level at which the data is entered. Regardless, I have tried every level and I receive the same error. I also tried the smallest period to make sure it was not aggregating over time, and that did not work either. Interestingly, however, if I place the data into the filter box it runs and returns what appears to be an aggregate of all the data.

Jake_Lamers · 12 September 2024 11:12

Hello, the following code queries the value type of each data element in a data element group. You can then filter on the types you want and request only those elements, which avoids the aggregation error.


# Import necessary libraries and modules
from dhis2 import Api              # DHIS2 API library to handle requests
from get_credentials import get_credentials  # Function to retrieve credentials
from io import StringIO            # Allows us to treat strings as file-like objects
from tqdm import tqdm              # Progress bar for iterations

import pandas as pd                # Pandas for data handling (not used here but can be useful)
import json                        # To handle JSON formatted data

# Step 1: Get credentials (username and password) to authenticate the API connection
username, password = get_credentials()

# Step 2: Initialize the API connection to DHIS2 using the provided credentials
api = Api('https://sl.dhis2.org/hmis23', username, password)

# Step 3: Define the Community Based Services (CBS) data element group details
CBS_DATA_UID = {
    "name": "HF04 - Community Based Services",   # Name of the data element group
    "id": "j3ghXQXN65o"                          # Unique ID for the group
}

# Step 4: Construct the query string to get all data element IDs within the specified group
query_string = f'dataElementGroups/{CBS_DATA_UID["id"]}?fields=dataElements[]'

# Step 5: Fetch the data from the DHIS2 API using the constructed query string
response = api.get(query_string)

# Step 6: Parse the JSON response into a dictionary
dataElements = response.json()

# Step 7: Iterate over each data element in the response to fetch its `valueType`
# tqdm wraps the list itself so the progress bar knows the total number of elements
for index, element in enumerate(tqdm(dataElements["dataElements"])):
    # Construct the query string for each data element to get its valueType
    query_string = f'dataElements/{element["id"]}?fields=valueType[]'

    # Fetch the valueType data for the current data element
    response = api.get(query_string)

    # Parse the valueType JSON response
    valueType_json = response.json()

    # Store the valueType inside the corresponding data element in the list
    dataElements["dataElements"][index]["valueType"] = valueType_json["valueType"]

# Step 8: Print the final dataElements dictionary with valueTypes included
print(dataElements)
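
The per-element loop above could probably be collapsed into a single request by asking for the nested fields directly. This is only a sketch: `group_metadata_request` and `value_types` are made-up helper names, passing `params` to `api.get` is my assumption about dhis2.py, and the payload at the bottom is illustrative rather than real output from this server.

```python
def group_metadata_request(group_id):
    """Return (endpoint, params) for one request that includes each element's type."""
    endpoint = f"dataElementGroups/{group_id}"
    # Nested field filter: for every data element in the group, return these properties.
    params = {"fields": "dataElements[id,name,valueType,aggregationType]"}
    return endpoint, params


def value_types(group_json):
    """Map each data element ID to its valueType from the group response."""
    return {
        element["id"]: element["valueType"]
        for element in group_json.get("dataElements", [])
    }


endpoint, params = group_metadata_request("j3ghXQXN65o")

# Assumed usage with dhis2.py (not executed here):
#   response = api.get(endpoint, params=params)
#   print(value_types(response.json()))

# Illustrative payload with placeholder UIDs, only to show the expected shape.
example_payload = {
    "dataElements": [
        {"id": "aaaaaaaaaa1", "name": "Example count", "valueType": "NUMBER",
         "aggregationType": "SUM"},
        {"id": "bbbbbbbbbb2", "name": "Example comment", "valueType": "LONG_TEXT",
         "aggregationType": "NONE"},
    ]
}
print(endpoint, params)
print(value_types(example_payload))
```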