Modeling App Questions

Hello community!

My team is currently exploring the use of the Modeling app. We have a few questions related to data modeling, configuration, and best practices, and would appreciate your guidance:

  1. Is there a standard or acceptable radius within which a weather station’s measurements can be considered representative of a community? Put differently, what is the maximum area or radius a weather station can accurately cover?

  2. Is it possible to configure modeling so that cases can be analyzed based on patients’ residential location (e.g., coded at community level using an option set) instead of—or in addition to—the reporting facility (org unit), and can option sets be used as a selectable alternative to org units for this purpose?

  3. Is it possible to limit the data processed during analytics table generation by program (aggregate, tracker, event), so that data sets such as climate data are only processed during off-hours (e.g., 12am to 7am)?

  4. What is the value of using the local climate dataset versus global datasets for predictive modeling, and should we advise the country to implement local sensors?


Hi @Reanne.Jean

In addition to triaging this to the team, I might as well provide you with some of the related resources:

Regarding the acceptable radius, this seems to differ from case to case (elevation and other factors) and depends on which dataset is being used (CHIRPS, ERA5-Land, etc.).

Regarding question 2, I recommend giving more details about the use case and why it’s important to handle this using option sets instead of using an org unit (OU) data element, because we need to know whether this will work for the final ‘analytics’ you’re expecting.

As for question 3, I’m pretty sure you can schedule a job in the Scheduler app and select which programs to skip under the ‘skip programs’ dropdown.
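The idea of keeping heavy data out of daytime runs can be sketched as two analytics jobs with complementary skip lists: a nightly job that processes everything and a daytime job that skips the heavy programs. This is a minimal Python sketch of that logic only, not the Scheduler app or its API; the program names, the off-hours window constants and the `programs_to_skip` helper are all hypothetical.

```python
from datetime import time

# Hypothetical off-hours window (12am to 7am), matching the example in question 3.
OFF_HOURS_START = time(0, 0)
OFF_HOURS_END = time(7, 0)

# Hypothetical program names; replace with the programs in your own instance.
ALL_PROGRAMS = ["Malaria case notification", "Climate observations"]
HEAVY_PROGRAMS = {"Climate observations"}


def in_off_hours(t: time) -> bool:
    """Return True if t falls inside the off-hours window [start, end)."""
    return OFF_HOURS_START <= t < OFF_HOURS_END


def programs_to_skip(run_time: time) -> list:
    """Programs a daytime run would skip; an off-hours run processes everything."""
    if in_off_hours(run_time):
        return []
    return [p for p in ALL_PROGRAMS if p in HEAVY_PROGRAMS]


if __name__ == "__main__":
    print(programs_to_skip(time(2, 30)))  # []
    print(programs_to_skip(time(14, 0)))  # ['Climate observations']
```

In the Scheduler app this would roughly correspond to one job scheduled during the night with nothing skipped, and another during the day with the heavy programs selected under ‘skip programs’.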

Question 4 is a good one; I’ll leave it for expert discussion!

Thanks!

Hi @Gassim, thanks for the responses so far. To elaborate on number 2: case notifications are captured through a program at the facility (org unit) level, while the patient’s residential location is recorded using an option set. At the same time, the population data used in models is aggregated at the community level.

The core issue is that, within the Modeling app, evaluations and predictions are constrained to the org unit hierarchy, which means:

  - We can only aggregate and analyze cases by the reporting facility, not by patient residence.
  - This creates a mismatch when aligning case data with population denominators, since those are structured by community.

What we are trying to achieve is:

  - Either the ability to analyze cases by a non-org unit dimension (i.e., patient residence captured via an option set), or
  - A way to map or treat option set values as a geographic/organizational dimension usable in modeling.

From what we understand, option sets are not currently treated as a substitute for org units in analytics or modeling, which limits their usability for geographically meaningful analysis when the reporting location differs from the patient’s residence.
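One workaround, done outside the Modeling app, is to map each residence option to the community it represents and re-aggregate cases before pairing them with community population. A minimal sketch, assuming a hypothetical one-to-one mapping from option codes to communities and made-up population figures; it does not call any DHIS2 API.

```python
from collections import Counter

# Hypothetical mapping from residence option codes (option set) to communities.
# In practice these would be your option codes and the matching community org units.
RESIDENCE_TO_COMMUNITY = {
    "RES_A": "Community A",
    "RES_B": "Community B",
}

# Hypothetical population denominators, aggregated at community level.
POPULATION = {"Community A": 12_000, "Community B": 8_000}


def cases_by_residence(case_records):
    """Count cases by the patient's residence rather than the reporting facility."""
    counts = Counter()
    for record in case_records:
        community = RESIDENCE_TO_COMMUNITY.get(record["residence"])
        if community is not None:  # skip codes with no community mapping
            counts[community] += 1
    return counts


def incidence_per_1000(counts):
    """Cases per 1,000 population, using community-level denominators."""
    return {c: 1000 * counts.get(c, 0) / pop for c, pop in POPULATION.items()}


if __name__ == "__main__":
    # Cases reported at two facilities, living in two different communities.
    cases = [
        {"facility": "Facility X", "residence": "RES_A"},
        {"facility": "Facility X", "residence": "RES_B"},
        {"facility": "Facility Y", "residence": "RES_A"},
    ]
    counts = cases_by_residence(cases)
    print(counts)  # Counter({'Community A': 2, 'Community B': 1})
    print(incidence_per_1000(counts))
```

The facility field is carried along but ignored for aggregation, which is exactly the mismatch described above: grouping by facility would put two cases under Facility X, while grouping by residence splits them across communities.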


Hi @Reanne.Jean

For questions 1 and 4:

I would recommend contacting your national meteorological office for more specific guidance, as it is difficult to provide a one-size-fits-all answer. The acceptable distance over which a weather station’s measurements remain representative depends heavily on local terrain and topography (e.g., elevation changes, proximity to water bodies, and urban vs. rural environments).
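The point that representativeness depends on more than distance can be illustrated by checking both the horizontal distance and the elevation difference between a station and a community. A minimal sketch: the 25 km and 300 m thresholds are hypothetical placeholders, not recommended values, and the coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical thresholds for illustration only; appropriate values depend on
# local terrain and should come from the national meteorological office.
MAX_DISTANCE_KM = 25.0
MAX_ELEVATION_DIFF_M = 300.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_representative(station, community):
    """Treat a station as usable for a community only if it is both close enough
    horizontally and at a similar elevation."""
    dist = haversine_km(station["lat"], station["lon"], community["lat"], community["lon"])
    elev_diff = abs(station["elev_m"] - community["elev_m"])
    return dist <= MAX_DISTANCE_KM and elev_diff <= MAX_ELEVATION_DIFF_M


if __name__ == "__main__":
    station = {"lat": 0.0, "lon": 0.0, "elev_m": 100.0}
    lowland = {"lat": 0.1, "lon": 0.0, "elev_m": 150.0}   # ~11 km away, similar height
    highland = {"lat": 0.1, "lon": 0.0, "elev_m": 900.0}  # same distance, 800 m higher
    print(is_representative(station, lowland))   # True
    print(is_representative(station, highland))  # False
```

Two communities at the same distance can get different answers, which is why a single fixed radius is hard to recommend.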

In general, you should aim to use the most detailed and high-quality datasets available. Local data is often preferable because it better captures site-specific conditions and can help build local ownership and technical capacity. Whether additional local sensors are necessary depends on the specific use case, required accuracy, and existing data coverage.

Local observations can also be valuable for improving global datasets—for example, through bias correction or validation. At the same time, global datasets are useful for quickly developing prototypes and initiating modeling work, especially in areas where local data is sparse or unavailable.
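As an illustration of using local observations to correct a global dataset, the simplest approach is an additive mean-bias correction (sometimes called linear scaling): estimate the average difference between station and grid-cell values over a period where both exist, then shift the gridded series by that amount. This is a sketch with made-up numbers, not a method prescribed above; real bias correction usually works per month or season and, for rainfall, typically uses a multiplicative ratio rather than an additive shift.

```python
from statistics import mean


def mean_bias(local_obs, gridded):
    """Average difference (local minus gridded) over the overlapping period."""
    if len(local_obs) != len(gridded) or not local_obs:
        raise ValueError("need two non-empty series of equal length")
    return mean(o - g for o, g in zip(local_obs, gridded))


def correct(gridded_series, bias):
    """Shift a gridded series by the estimated bias (additive correction)."""
    return [g + bias for g in gridded_series]


if __name__ == "__main__":
    # Made-up daily mean temperatures (°C) at one station and the matching grid cell.
    station = [24.0, 25.5, 23.0, 26.0]
    grid = [22.5, 24.0, 21.0, 24.5]
    bias = mean_bias(station, grid)
    print(bias)                        # 1.625
    print(correct([23.0, 20.5], bias)) # [24.625, 22.125]
```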


Hi @Bjorn_Sandvik, thank you for the informative response!
