Scalability issue - assignment of organisationUnits to dataSets

Hi team,

We have a scalability issue that we’d like to discuss with the developer community along with potential solutions.

The problem

  • The number of organisationUnits (DHIS2 demo has ~2000) usually far outnumbers the number of dataSets (DHIS2 demo has ~20).

  • When you assign an organisationUnit to a dataSet, the lastUpdated timestamp is only updated on the dataSet, not the organisationUnit.

  • This means that when we modify one of the 2000 organisationUnits to add a dataSet assignment, we cannot use the lastUpdated filter to download that organisationUnit along with its associated dataSet IDs. We can only use the lastUpdated filter to download the updated dataSet, along with the organisationUnit IDs of every other organisationUnit (potentially up to 2000) already assigned to it. These organisationUnit IDs make up ~85% of the dataSets payload on the DHIS2 demo.

  • Since organisationUnits are far greater in number, they are created and modified more frequently. If those modifications include dataSet assignments, the dataSets will be “updated” each time and downloaded repeatedly.

  • The payloads are still relatively small for moderate internet connections: with gzip compression, downloading all dataSets is approximately 80KB. However, for poor internet connections in remote locations, an 80KB payload can be problematic, especially if it recurs.
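To make the pattern concrete, here is a rough Python sketch (not DHIS2 code; the base URL, sync timestamp, and all IDs are invented, though filter, fields, and paging are standard DHIS2 metadata API parameters) of the lastUpdated-filtered sync query, plus a simulation at demo scale of how much of a single dataSet payload the organisationUnit IDs can occupy:

```python
import json
from urllib.parse import urlencode

# Build the kind of sync query a client would issue to fetch dataSets
# changed since its last sync (placeholder base URL and timestamp).
base = "https://play.dhis2.org/demo/api/dataSets.json"
params = {
    "filter": "lastUpdated:gt:2024-01-01T00:00:00",
    "fields": "id,name,organisationUnits",
    "paging": "false",
}
sync_url = f"{base}?{urlencode(params)}"

# Simulate one dataSet with 2000 assigned organisationUnits, mimicking
# the demo's scale; all UIDs below are made up.
data_set = {
    "id": "dsAAAAAAAAA",
    "name": "Example monthly summary",
    "lastUpdated": "2024-01-01T00:00:00.000",
    "organisationUnits": [{"id": f"OU{i:09d}"} for i in range(2000)],
}

# Measure what fraction of the serialized payload the org unit IDs occupy.
full = len(json.dumps(data_set))
slim = len(json.dumps({k: v for k, v in data_set.items()
                       if k != "organisationUnits"}))
share = (full - slim) / full
print(f"organisationUnits share of this payload: {share:.0%}")
```

With a single dataSet the simulated share comes out even higher than the ~85% measured across the whole demo payload, but the point is the same: the organisationUnit ID list dominates.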

Potential solutions

  • When assigning an organisationUnit to a dataSet, update the lastUpdated timestamp on the organisationUnit as well as the dataSet. This doesn’t solve the problem of dataSets being downloaded repeatedly, but it would allow API consumers to exclude the organisationUnit IDs from the dataSets payload (~85% of the payload size) and mitigate the payload size issue on poor internet connections.

  • When assigning an organisationUnit to a dataSet, only update the lastUpdated timestamp on the organisationUnit (and not on the dataSet). Whilst this solves the issue of downloading dataSets repeatedly, it presents backwards-compatibility issues for existing consumers of the dataSets API.

  • Update the lastUpdated timestamp of either the dataSet or the organisationUnit, depending upon the API resource used. In the DHIS2 demo you can assign/remove organisationUnits to/from dataSets with POST/DELETE requests to either /api/dataSets/{id}/organisationUnits/{id} or /api/organisationUnits/{id}/dataSets/{id}. If you use the former (the dataSets resource), update the timestamp on the dataSet; if you use the latter (the organisationUnits resource), update the timestamp on the organisationUnit. This solves the issue of downloading dataSets repeatedly whilst also not introducing backwards-compatibility issues for consumers of the dataSets API.
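A minimal server-side sketch of the third option (hypothetical code, not the DHIS2 implementation; function and field names are invented): the association is written symmetrically, but only the entity addressed by the API resource gets its lastUpdated timestamp bumped.

```python
from datetime import datetime, timezone

def assign_org_unit(resource: str, data_set: dict, org_unit: dict) -> None:
    """Assign org_unit to data_set, timestamping only the addressed side.

    resource is "dataSets" for POST /api/dataSets/{id}/organisationUnits/{id},
    or "organisationUnits" for POST /api/organisationUnits/{id}/dataSets/{id}.
    """
    # The association itself is always recorded on both sides.
    data_set.setdefault("organisationUnits", []).append(org_unit["id"])
    org_unit.setdefault("dataSets", []).append(data_set["id"])

    now = datetime.now(timezone.utc).isoformat()
    if resource == "dataSets":
        data_set["lastUpdated"] = now   # current behaviour, kept for compatibility
    else:
        org_unit["lastUpdated"] = now   # new: the org unit carries the change

# Assigning via the organisationUnits resource leaves the dataSet's
# timestamp untouched, so the dataSet is not re-downloaded on sync.
ds = {"id": "ds1", "lastUpdated": "2020-01-01"}
ou = {"id": "ou1", "lastUpdated": "2020-01-01"}
assign_org_unit("organisationUnits", ds, ou)
```

Because the dataSets-resource path behaves exactly as today, existing consumers see no change; clients that want to avoid re-downloading dataSets simply switch to the organisationUnits resource for assignments.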


Thoughts?

Cheers,

-doh