Question on aggregation periods

jason · 19 April 2010 09:10

Hi Lars,
I have been looking at the aggregation process and have a few questions.

I chose a range of months (January-December 2009) and aggregated all
data from facilities to districts. Monitoring the logs, I saw that more
periods were aggregated during the process than I had requested, which
seemed strange to me. I dug deeper and pulled out the resulting set of
periods [1]. As you can see, a number of these periods overlap the
requested aggregation periods but are not contained in any of them. [2]

It would seem that DHIS2 is asking for all data that "overlaps" the
desired range of time periods, when it should only ask for the
requested time periods and any time periods that they are composed of.
For instance, I have a quarterly time period with periodid 20684. This
query gives me a list of all periods that this period may be composed
of:

SELECT periodid FROM period
WHERE startdate >= (SELECT startdate FROM period WHERE periodid = 20684)
  AND enddate <= (SELECT enddate FROM period WHERE periodid = 20684)
  AND periodid <> 20684;
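
To make the difference between "contained in" and "overlaps" concrete, here is a minimal sketch that runs both conditions against an in-memory SQLite table. The table contents are invented for illustration (only periodid 20684 comes from this post), and the overlap query is an assumption about what an "overlaps" test might look like, not the query DHIS2 actually issues.

```python
import sqlite3

# In-memory stand-in for the period table; rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE period (periodid INTEGER PRIMARY KEY, startdate TEXT, enddate TEXT)"
)
conn.executemany(
    "INSERT INTO period VALUES (?, ?, ?)",
    [
        (20684, "2009-01-01", "2009-03-31"),  # the quarter from the post
        (1, "2009-01-01", "2009-01-31"),      # January
        (2, "2009-02-01", "2009-02-28"),      # February
        (3, "2009-03-01", "2009-03-31"),      # March
        (4, "2009-03-30", "2009-04-05"),      # week straddling the quarter end
        (5, "2009-01-01", "2009-12-31"),      # the whole year
    ],
)

# Containment: the condition used in the query above.
CONTAINED = """
SELECT periodid FROM period
WHERE startdate >= (SELECT startdate FROM period WHERE periodid = :pid)
  AND enddate <= (SELECT enddate FROM period WHERE periodid = :pid)
  AND periodid <> :pid
ORDER BY periodid
"""

# Overlap (assumed form): any period sharing at least one day with the target.
OVERLAPPING = """
SELECT periodid FROM period
WHERE startdate <= (SELECT enddate FROM period WHERE periodid = :pid)
  AND enddate >= (SELECT startdate FROM period WHERE periodid = :pid)
  AND periodid <> :pid
ORDER BY periodid
"""

contained = [row[0] for row in conn.execute(CONTAINED, {"pid": 20684})]
overlapping = [row[0] for row in conn.execute(OVERLAPPING, {"pid": 20684})]
print("contained:  ", contained)
print("overlapping:", overlapping)
```

With this sample data the overlap test also picks up the straddling week and the yearly period, neither of which can contribute cleanly to the quarter.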

So, I would expect the aggregation engine to analyze the requested
time periods and data elements, and then decompose each period via a
query such as the one given above, to get all "dependent" time
periods. This process would repeat itself until the time periods could
not be decomposed any further, or until it no longer made sense to do
so. I would also expect that, for the indicators/data elements chosen
by the user, only the periodicity of the data set would be used to
determine the base time period from which to begin aggregation. For
instance, I might choose to aggregate data which has been entered with
a monthly periodicity, and aggregate it to quarters. The aggregation
engine should know from the periodicity of my chosen data elements
(e.g. monthly) that only the periods of that periodicity for the given
indicator/data element need to be retrieved and then aggregated into
the destination time period.
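
The selection described above could be sketched roughly as follows. This is only an illustration of the idea, not DHIS2 code: the `Period` tuple, its `periodtype` label, the `source_periods` helper and all sample rows except periodid 20684 are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class Period(NamedTuple):
    periodid: int
    periodtype: str  # hypothetical label for the periodicity
    start: date
    end: date


def source_periods(target, candidates, base_type):
    """Return candidates of base_type that lie entirely inside target."""
    return [
        p
        for p in candidates
        if p.periodtype == base_type
        and p.start >= target.start
        and p.end <= target.end
    ]


quarter = Period(20684, "quarterly", date(2009, 1, 1), date(2009, 3, 31))
candidates = [
    Period(1, "monthly", date(2009, 1, 1), date(2009, 1, 31)),
    Period(2, "monthly", date(2009, 2, 1), date(2009, 2, 28)),
    Period(3, "monthly", date(2009, 3, 1), date(2009, 3, 31)),
    Period(4, "monthly", date(2009, 4, 1), date(2009, 4, 30)),   # outside the quarter
    Period(5, "weekly", date(2009, 1, 5), date(2009, 1, 11)),    # inside, wrong periodicity
    Period(6, "yearly", date(2009, 1, 1), date(2009, 12, 31)),   # overlaps, not contained
]

# Data entered monthly, aggregated to the quarter: only months inside it qualify.
selected = [p.periodid for p in source_periods(quarter, candidates, "monthly")]
print(selected)
```

Filtering on the data set's periodicity up front means the weekly and yearly rows are never read at all, which is where the input/output saving would come from.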

Could you comment on why it works this way? It seems wasteful, as one
of the most limiting steps in terms of performance appears to be the
input/output of the data. If we can decrease this, it should speed
things up a bit.

Regards,
Jason

[1] http://pastebin.com/XAEtCdzY
[2] http://pastebin.com/uFMgcxtT

--
Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+260968395190