Accessing indicators or multiple dataElements within subExpressions

mv_evans · 20 September 2024 15:18

Hello,

I’m trying to do a calculation that may be at the limit of what indicator analytics are meant to do (v2.40.3).

The goal is to have an indicator that represents how many health facilities per health district have cases over a facility-specific threshold for a specific three-month period (the current month plus the following two). The threshold is based on the long-term average of the case number for that same three-month season over the prior three years, which is stored as a separate dataElement. I know technically subExpressions only take one dataElement, but it seems to be somewhat working with two and returning a Valid expression.

The way I think I should write out this formula would be:

subExpression(if((#{DE1} + #{DE1}.periodOffset(1) + #{DE1}.periodOffset(2)) > (#{DE2}.periodOffset(-12) + #{DE2}.periodOffset(-11) + #{DE2}.periodOffset(-10) + #{DE2}.periodOffset(-24) + #{DE2}.periodOffset(-23) + #{DE2}.periodOffset(-22) + #{DE2}.periodOffset(-36) + #{DE2}.periodOffset(-35) + #{DE2}.periodOffset(-34)) / 3, 1, 0)).aggregationType(SUM)

In English, for each facility, if the number of cases predicted over the next three months is greater than the average over the prior 3 years for the same period for the 2nd visit, return a 1. Then sum these to the district level, resulting in the number of facilities over this threshold.
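
To make the intended logic concrete outside DHIS2, here is a minimal Python sketch of the same calculation. It is only an illustration: the function names, the `facilities` structure, and all the numbers are made up, and it does not reproduce how the analytics engine evaluates subExpressions.

```python
from collections import defaultdict


def exceeds_threshold(forecast_3m, prior_seasons):
    """Return 1 if the 3-month forecast total is greater than the mean
    of the prior seasons' 3-month totals, else 0."""
    threshold = sum(sum(season) for season in prior_seasons) / len(prior_seasons)
    return 1 if sum(forecast_3m) > threshold else 0


def facilities_over_threshold(facilities):
    """Sum the per-facility 0/1 flags up to the district level."""
    counts = defaultdict(int)
    for facility in facilities:
        counts[facility["district"]] += exceeds_threshold(
            facility["forecast"], facility["history"]
        )
    return dict(counts)


# Hypothetical data: three facilities in two districts.
# "forecast" holds the current month and the next two;
# "history" holds the same three months for each of the prior three years.
facilities = [
    {"district": "D1", "forecast": [10, 12, 15],
     "history": [[8, 9, 10], [7, 8, 9], [9, 10, 11]]},
    {"district": "D1", "forecast": [5, 5, 5],
     "history": [[6, 6, 6], [5, 5, 5], [7, 7, 7]]},
    {"district": "D2", "forecast": [20, 20, 20],
     "history": [[10, 10, 10], [10, 10, 10], [10, 10, 10]]},
]

print(facilities_over_threshold(facilities))
```

The comparison is strict, matching the `>` in the expression, so a facility whose forecast exactly equals its threshold is not counted.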

The aggregation doesn’t seem to be working for this: the indicator still returns 0 at the district level, even when my own count of the facility values gives a different result.

One solution I tried, if only to make this easier to manage and read, was to create an indicator corresponding to each side of the comparison in the if statement, but it seems like indicators don’t work within subExpressions. I would use a predictor, but I need to be able to specify these months in a very specific way that doesn’t seem possible with the sequential sampling functionality.

Any ideas/thoughts? Happy to share a seed of some fake data to play around with.

Gassim · 23 September 2024 11:30

Hi @mv_evans

Are the three months that you are comparing to the threshold months that already have data? If there is no data, then using indicators is not an option, to the best of my knowledge.

mv_evans · 24 September 2024 05:12

Hi @Gassim , yes they already have data.

We are making some DHIS2 indicators to support a forecasting application, so there are some dataElements that exist for the future. The thresholds are only made from historical data, however.

Gassim · 24 September 2024 08:06

Hi @mv_evans

Oh, okay, I think I get it better now, so in other words, we’re not using the DHIS2 Indicators to ‘predict’ since we’re already giving the predicted values, right?

Wouldn’t it be easier if you used combined indicators instead of subexpressions?

Expression: count health facilities.
Filter: count (events or enrollments?) > threshold

mv_evans · 24 September 2024 12:32

Ah okay, so rather than a SUMIF statement we would have:

one indicator that represents the month x health facility threshold
one indicator that represents the forecasted counts
one indicator that is the filter of the facilities with forecasted values above their thresholds
one indicator that is the count of the filtered facilities

Is this kind of what you are imagining? I’ve currently got some indicators for the first two to represent kind of intermediate calculations, but am having trouble getting the last ones because I can’t combine two indicators in a subExpression.

I’ve currently just wrapped it all in a Python script that GETs the relevant indicators, counts them and POSTs the count as a new dataElement. Probably not the most elegant, but it can be run with the rest of the Python updating scripts to simplify things.
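
For anyone wanting to try something similar, the core of such a script could look roughly like the sketch below. This is not the actual script: the UIDs and values are placeholders, and it assumes the two intermediate indicators are fetched per facility from `/api/analytics` (which returns `headers` and `rows`) and that the result is sent to `/api/dataValueSets`. The HTTP calls themselves are left out so the counting logic can be run on its own.

```python
def rows_to_values(analytics_json):
    """Map orgUnit UID -> numeric value from an analytics-style response."""
    names = [header["name"] for header in analytics_json["headers"]]
    ou_idx, value_idx = names.index("ou"), names.index("value")
    return {row[ou_idx]: float(row[value_idx]) for row in analytics_json["rows"]}


def count_over_threshold(forecast_json, threshold_json):
    """Count facilities whose forecast value is strictly above their threshold.
    Facilities missing a threshold are skipped."""
    forecast = rows_to_values(forecast_json)
    threshold = rows_to_values(threshold_json)
    return sum(
        1 for ou, value in forecast.items()
        if ou in threshold and value > threshold[ou]
    )


def build_data_value_set(data_element, period, org_unit, value):
    """Build a payload in the shape expected by the dataValueSets endpoint."""
    return {
        "dataValues": [{
            "dataElement": data_element,
            "period": period,
            "orgUnit": org_unit,
            "value": str(value),
        }]
    }


# Placeholder responses for two hypothetical indicators (UIDs are invented).
headers = [{"name": "dx"}, {"name": "ou"}, {"name": "pe"}, {"name": "value"}]
forecast_json = {"headers": headers, "rows": [
    ["indForecast", "facilityA", "202410", "37"],
    ["indForecast", "facilityB", "202410", "15"],
]}
threshold_json = {"headers": headers, "rows": [
    ["indThreshold", "facilityA", "202410", "27.0"],
    ["indThreshold", "facilityB", "202410", "18.0"],
]}

count = count_over_threshold(forecast_json, threshold_json)
payload = build_data_value_set("deOverCount", "202410", "districtX", count)
print(payload)
```

Looking up columns by header name rather than by fixed position keeps the parsing independent of the order in which the dimensions come back.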

Jim_Grace · 8 October 2024 23:57

Hi @mv_evans, sorry I missed seeing this until now. We had a problem with periodOffset inside subExpression, but we believed we had fixed this with 2.40.2 (see DHIS2-15874). The expression you gave looks good to me and should have worked on 2.40.3.

It should also be possible to do this with multiple predictors, but this would be rather convoluted. Your solution with the Python script is probably a better idea than that. (You could make one predictor to sum the predicted cases in the current month plus the next two, store that value 2 months from now because predictors can only access current or prior periods, then make 3 more predictors to get the case numbers for the prior three years using a combination of sequential sample count and sequential skip count. Then you could compare the two values using an indicator with a subExpression and periodOffset(2), to bring the value back to the period of interest.)
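
The period arithmetic behind that predictor workaround can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, assuming monthly periods written as `YYYYMM`; `shift_period` is a hypothetical helper, not a DHIS2 function.

```python
def shift_period(period, offset):
    """Shift a monthly period string (YYYYMM) by a number of months."""
    year, month = int(period[:4]), int(period[4:])
    index = year * 12 + (month - 1) + offset
    return f"{index // 12:04d}{index % 12 + 1:02d}"


# Period of interest: the first of the three forecast months.
period_of_interest = "202411"

# The predictor can only look at current or prior periods, so the sum of
# the three forecast months is stored two months later...
stored_at = shift_period(period_of_interest, 2)

# ...where it covers the stored period and the two periods before it.
window = [shift_period(stored_at, -k) for k in (2, 1, 0)]

# The indicator for the period of interest then reads the stored value
# with periodOffset(2), which points back at stored_at.
read_from = shift_period(period_of_interest, 2)

print(stored_at, window, read_from)
```

With these numbers the value is stored in 202501, covers 202411 through 202501, and is read back from 202501 when the indicator is evaluated for 202411.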

But your original expression should have worked, and that should be the best way to do this. If you would send me some fake data I would be willing to look at why it’s not working.

mv_evans · 10 October 2024 12:15

Hi @Jim_Grace, thanks for your response. Trying to get this working with a combination of indicators is on the back-burner for now, as we have the Python script that we run when we do our monthly update of predictions anyway. But I plan to get back to it sometime in the future. It may very well be an artefact of our specific datasets, or it may just be that I got tangled in the combination of multiple lagged indicators because, as you mentioned, it gets a bit convoluted trying to force them to do something they aren’t really made for. I’ll let you know when I have an update.