Hello everyone,
I wanted to share my approach to requesting data from DHIS2 using the dhis2.py
Python package version 2.3.0.
While I found various discussions about querying through dataElements
, I encountered issues where the queries only returned metadata without the actual data I needed.
I then came across this post, which suggested using the /api/analytics
endpoint. This method proved effective. To figure out the appropriate URL queries, I analyzed the queries used in the data visualizer tab when downloading JSON files.
The analytics documentation was particularly helpful. Important note: If you use this method, be sure to add a semicolon ; at the end of your query. For some reason, api.get()
appends .json
to every query, and adding the semicolon ensures the query is processed correctly.
I apologize if this is a duplicate post or if I’m posting under the wrong tags—this is my first time posting here. I hope this helps others who are navigating similar challenges!
# Import necessary libraries
import pandas as pd # For data manipulation and analysis
import getpass # For secure password input
import re # For regular expressions
import ast # For evaluating string representations of data structures
from dhis2 import Api # DHIS2 API for accessing data
from io import StringIO # To handle strings as file-like objects
from tqdm import tqdm # For displaying progress bars during loops
# Function to get credentials securely
def get_credentials():
try:
# Prompt the user for their username
username = input("Enter your username: ")
# Prompt the user for their password securely (input is hidden)
password = getpass.getpass("Enter your password: ")
return username, password
except Exception as e:
print(f"An error occurred: {e}")
# Get user credentials for DHIS2 API
username, password = get_credentials()
# Initialize DHIS2 API with provided credentials
api = Api(
'https://sl.dhis2.org/hmis23', # URL for the DHIS2 instance
username,
password
)
# Initialize an empty string to store scraped metadata containing CHW name and ID feilds
s = ''
# Loop through the paginated results of organisation units from the DHIS2 API
for page in api.get_paged('organisationUnits', page_size=100):
s = s + str(page) # Concatenate each page's data into the string 's'
# Define a regular expression pattern to match specific organisation units containing 'CHW' in the display name
pattern = r"\{'displayName': '[^']*CHW[^']*', 'id': '[^']*'\}"
# Find all matches in the data string
matches = re.findall(pattern, s)
# Convert each match (which is a string) into a dictionary
data_dicts = [ast.literal_eval(match) for match in matches]
# Create a pandas DataFrame from the list of dictionaries
df = pd.DataFrame(data_dicts)
# Define the period and data element for the API query
period = 'LAST_MONTH'
data_element = 'wfaQBIVQxWo' # Data element for "Fever Case (suspected malaria)"
# Add a new column to the DataFrame for storing suspected malaria cases
df["Fever Case (suspected malaria) in HTR and ETR"] = pd.NA
##################################################################
#
# This is the section that downloads the data
#
#
##################################################################
# Loop through each row of the DataFrame using a progress bar to query Fever data
for index, row in tqdm(df.iterrows()):
# Fetch analytics data for each organisation unit (row["id"]) using the DHIS2 API
r = api.get(f'analytics.csv?dimension=pe:{period}&dimension=dx:{data_element}&filter=ou:{row["id"]};')
# Convert the CSV response to a pandas DataFrame
data = pd.read_csv(StringIO(r.text))
try:
# Try to set the "Fever Case (suspected malaria)" value in the DataFrame
df.loc[index, "Fever Case (suspected malaria) in HTR and ETR"] = data.loc[0, "Value"]
except Exception as e:
# If an error occurs (e.g., missing data), skip to the next row
pass
##################################################################
#
# This is the section that downloads the data
#
# I realize I could probably do this as one request rather than a request for each data point but I haven't figured that part out yet
#
##################################################################
# Print the first few rows of the DataFrame
print(df.head())
# Save the DataFrame to a CSV file
df.to_csv('test.csv')