[Knowledge Sharing] How to fetch data values using Python and the DHIS2 API via Analytics

Hello everyone,

I wanted to share my approach to requesting data from DHIS2 using the dhis2.py Python package version 2.3.0.

While I found various discussions about querying through dataElements, I encountered issues where the queries only returned metadata without the actual data I needed.

I then came across this post, which suggested using the /api/analytics endpoint. This method proved effective. To figure out the appropriate URL queries, I analyzed the queries used in the data visualizer tab when downloading JSON files.

The analytics documentation was particularly helpful. Important note: if you use this method, be sure to add a semicolon ; at the end of your query. dhis2.py's api.get() appends a file-type suffix (.json by default) to every endpoint, and the trailing semicolon keeps that suffix from being glued onto the last dimension value, so the query is processed correctly.
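As a minimal illustration of that trailing-semicolon trick (the instance URL and credentials below are placeholders for illustration only, not something to reuse):

from dhis2 import Api

# Placeholder instance and credentials, for illustration only
api = Api('https://play.dhis2.org/demo', 'admin', 'district')

# Without the trailing ';', the '.json' suffix that api.get() appends would
# be glued onto the last dimension value and break the query.
r = api.get('analytics.csv?dimension=pe:LAST_MONTH&dimension=dx:wfaQBIVQxWo&filter=ou:LEVEL-6;')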


I apologize if this is a duplicate post or if I’m posting under the wrong tags—this is my first time posting here. I hope this helps others who are navigating similar challenges!

# Import necessary libraries
import pandas as pd               # For data manipulation and analysis
import getpass                    # For secure password input
import re                         # For regular expressions
import ast                        # For evaluating string representations of data structures
from dhis2 import Api             # DHIS2 API for accessing data
from io import StringIO           # To handle strings as file-like objects
from tqdm import tqdm             # For displaying progress bars during loops

# Function to get credentials securely
def get_credentials():
    try:
        # Prompt the user for their username
        username = input("Enter your username: ")

        # Prompt the user for their password securely (input is hidden)
        password = getpass.getpass("Enter your password: ")

        return username, password
    except Exception as e:
        print(f"An error occurred: {e}")
        raise  # Re-raise so the script stops here instead of unpacking None later

# Get user credentials for DHIS2 API
username, password = get_credentials()

# Initialize DHIS2 API with provided credentials
api = Api(
    'https://sl.dhis2.org/hmis23',  # URL for the DHIS2 instance
    username,
    password
)

# Initialize an empty string to store scraped metadata containing CHW name and ID fields
s = ''

# Loop through the paginated results of organisation units from the DHIS2 API
for page in api.get_paged('organisationUnits', page_size=100):
    s = s + str(page)  # Concatenate each page's data into the string 's'

# Define a regular expression pattern to match specific organisation units containing 'CHW' in the display name
pattern = r"\{'displayName': '[^']*CHW[^']*', 'id': '[^']*'\}"

# Find all matches in the data string
matches = re.findall(pattern, s)

# Convert each match (which is a string) into a dictionary
data_dicts = [ast.literal_eval(match) for match in matches]

# Create a pandas DataFrame from the list of dictionaries
df = pd.DataFrame(data_dicts)
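
# (Aside: the same CHW list can likely be built without the regex by
# filtering the paged JSON directly. Untested sketch, assuming each page
# returned by get_paged() is a dict holding an 'organisationUnits' list:)
#
# chw_units = [
#     ou
#     for page in api.get_paged('organisationUnits', page_size=100)
#     for ou in page.get('organisationUnits', [])
#     if 'CHW' in ou.get('displayName', '')
# ]
# df = pd.DataFrame(chw_units)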

# Define the period and data element for the API query
period = 'LAST_MONTH'
data_element = 'wfaQBIVQxWo'  # Data element for "Fever Case (suspected malaria)"

# Add a new column to the DataFrame for storing suspected malaria cases
df["Fever Case (suspected malaria) in HTR and ETR"] = pd.NA

##################################################################
#
# This is the section that downloads the data
#
#
##################################################################

# Loop through each row of the DataFrame using a progress bar to query Fever data
for index, row in tqdm(df.iterrows(), total=len(df)):  # total= lets tqdm show a real progress bar
    # Fetch analytics data for each organisation unit (row["id"]) using the DHIS2 API
    r = api.get(f'analytics.csv?dimension=pe:{period}&dimension=dx:{data_element}&filter=ou:{row["id"]};')
    
    # Convert the CSV response to a pandas DataFrame
    data = pd.read_csv(StringIO(r.text))

    try:
        # Try to set the "Fever Case (suspected malaria)" value in the DataFrame
        df.loc[index, "Fever Case (suspected malaria) in HTR and ETR"] = data.loc[0, "Value"]
    except Exception:
        # If an error occurs (e.g., no data returned for this org unit), leave the cell as NA
        pass

##################################################################
#
# End of the data download section
#
# I realize I could probably do this as one request rather than a
# request for each data point, but I haven't figured that part out
# yet (a rough, untested sketch of that idea is commented just below).
#
##################################################################
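
# (Untested sketch of the single-request idea above: the analytics API
# accepts several org units in one dimension, separated by semicolons,
# so all the CHW IDs could go into a single query. Note that very long
# ID lists may run into URL length limits.)
#
# org_units = ';'.join(df['id'])
# r = api.get(f'analytics.csv?dimension=pe:{period}'
#             f'&dimension=dx:{data_element}&dimension=ou:{org_units};')
# data = pd.read_csv(StringIO(r.text))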

# Print the first few rows of the DataFrame
print(df.head())

# Save the DataFrame to a CSV file
df.to_csv('test.csv')

Hi @Jake_Lamers

Welcome to the community! Thank you for the knowledge sharing :clap: What a great first post! :slight_smile:

It seems the script filters for exactly the type of metadata needed based on the ‘pattern’, right?

Thanks, and yeah! However, I actually found a much simpler solution. Rather than doing

f'analytics.csv?dimension=pe:{period}&dimension=dx:{data_element}&filter=ou:{row["id"]};'

I replaced row["id"], which contains the ID for the CHW, with LEVEL-6, which grabs all the data at the 6th level of the organisation unit hierarchy. I believe this returns the same values and is significantly faster. So the code is now just:

period = 'LAST_MONTH'
data_element = 'wfaQBIVQxWo'  # Data element for "Fever Case (suspected malaria)"
org = 'LEVEL-6'

r = api.get(f'analytics.csv?dimension=pe:{period}&dimension=dx:{data_element}&dimension=ou:{org};')

# Convert the CSV response to a pandas DataFrame
data = pd.read_csv(StringIO(r.text))