Performance of Tracker API events

API GET: /api/tracker/events
Parameters:
occurredAfter: 2024-04-22
occurredBefore: 2024-07-14
ouMode: DESCENDANTS
orgUnit: <org_unit>
totalPages: true
pageSize: 100
page: 1
fields: event
order: event

Context:
We have currently moved from DHIS2 v2.37.8.1 to v2.40.3.2 and have started facing issue with performance with above API, we looked at the patch fix for v2.40.4 and cherry picked the query update(fix: Tracker event query is slow by lucaCambi77 · Pull Request #16965 · dhis2/dhis2-core · GitHub) and even after that we can see the performance degradation persists.

Performance issue Summary:
Tracker API is performing badly giving response time more than 1min when fetching with more data. (~ 3415391 program stage instance).

Details:

  1. Tracker events GET API within 3 months date range is getting response time more than 1min.
  2. We have observed 2 internal jdbc query which is taking more processing time.
  3. First Query: select * from (select ou.uid as ou_identifier, p.uid as p_identifier, ps.uid as ps_identifier, coc_agg.uid as coc_identifier, psi.uid as psi_uid, ou.uid as ou_uid, p.uid as p_uid, psi.programstageinstanceid as psi_id, psi.status as psi_status, psi.executiondate as psi_executiondate, psi.eventdatavalues as psi_eventdatavalues, psi.duedate as psi_duedate, psi.completedby as psi_completedby, psi.storedby as psi_storedby, psi.created as psi_created, psi.createdbyuserinfo as psi_createdbyuserinfo, psi.lastupdated as psi_lastupdated, psi.lastupdatedbyuserinfo as psi_lastupdatedbyuserinfo, psi.completeddate as psi_completeddate, psi.deleted as psi_deleted, ST_AsText( psi.geometry ) as psi_geometry, au.uid as user_assigned, (au.firstName || ' ' || au.surName) as user_assigned_name,au.firstName as user_assigned_first_name, au.surName as user_assigned_surname, au.username as user_assigned_username,coc_agg.uid as coc_uid, coc_agg.co_uids AS co_uids, coc_agg.co_count AS option_size, pi.uid as pi_uid, pi.status as pi_status, pi.followup as pi_followup, pi.enrollmentdate as pi_enrollmentdate, pi.incidentdate as pi_incidentdate, p.type as p_type, ps.uid as ps_uid, ou.name as ou_name, tei.trackedentityinstanceid as tei_id, tei.uid as tei_uid, teiou.uid as tei_ou, teiou.name as tei_ou_name, tei.created as tei_created, tei.inactive as tei_inactive from programstageinstance psi inner join programinstance pi on pi.programinstanceid=psi.programinstanceid inner join program p on p.programid=pi.programid inner join programstage ps on ps.programstageid=psi.programstageid inner join organisationunit ou on psi.organisationunitid=ou.organisationunitid left join trackedentityinstance tei on tei.trackedentityinstanceid=pi.trackedentityinstanceid left join organisationunit teiou on (tei.organisationunitid=teiou.organisationunitid) left join userinfo au on (psi.assigneduserid=au.userinfoid) inner join (select coc.uid, coc.attributevalues, coc.code, coc.categoryoptioncomboid as id, string_agg(co.uid, ';') as co_uids, count(co.categoryoptionid) as co_count from categoryoptioncombo coc inner join categoryoptioncombos_categoryoptions cocco on coc.categoryoptioncomboid = cocco.categoryoptioncomboid inner join dataelementcategoryoption co on cocco.categoryoptionid = co.categoryoptionid group by coc.categoryoptioncomboid ) as coc_agg on coc_agg.id = psi.attributeoptioncomboid where ( (EXISTS(SELECT ss.organisationunitid FROM userteisearchorgunits ss JOIN userinfo u ON u.userinfoid = ss.userinfoid JOIN organisationunit orgunit ON orgunit.organisationunitid = ss.organisationunitid WHERE u.uid = ? AND ou.path like CONCAT(?, '%' ) AND p.accesslevel in ('OPEN', 'AUDITED')) OR EXISTS(SELECT cs.organisationunitid FROM usermembership cs JOIN userinfo u ON u.userinfoid = cs.userinfoid JOIN organisationunit orgunit ON orgunit.organisationunitid = cs.organisationunitid WHERE u.uid = ? AND ou.path like CONCAT(?, '%' ) )) ) and (psi.executiondate >= ? or (psi.executiondate is null and psi.duedate >= ? )) and (psi.executiondate < ? or (psi.executiondate is null and psi.duedate < ? )) and psi.deleted is false order by psi_uid asc limit 50 offset 7250 ) as event left join (select psic.programstageinstanceid as psic_id, psinote.trackedentitycommentid as psinote_id, psinote.commenttext as psinote_value, psinote.created as psinote_storeddate, psinote.creator as psinote_storedby, psinote.uid as psinote_uid, psinote.lastupdated as psinote_lastupdated, userinfo.userinfoid as usernote_id, userinfo.code as usernote_code, userinfo.uid as usernote_uid, userinfo.username as usernote_username, userinfo.firstname as userinfo_firstname, userinfo.surname as userinfo_surname from programstageinstancecomments psic inner join trackedentitycomment psinote on psic.trackedentitycommentid = psinote.trackedentitycommentid left join userinfo on psinote.lastupdatedby = userinfo.userinfoid ) as cm on event.psi_id=cm.psic_id left join (select ri.programstageinstanceid as ri_psi_id, json_agg(ri.relationshipid) as psi_rl FROM relationshipitem ri GROUP by ri_psi_id) as fgh on fgh.ri_psi_id=event.psi_id order by psi_uid asc
  4. Total execution time: 31,235.2 ms (This sometime goes to more than 1min)
  5. Rows returned: 68
  6. Second Query: select count(*) from programstageinstance psi inner join programinstance pi on pi.programinstanceid=psi.programinstanceid inner join program p on p.programid=pi.programid inner join programstage ps on ps.programstageid=psi.programstageid inner join organisationunit ou on psi.organisationunitid=ou.organisationunitid left join trackedentityinstance tei on tei.trackedentityinstanceid=pi.trackedentityinstanceid left join organisationunit teiou on (tei.organisationunitid=teiou.organisationunitid) left join userinfo au on (psi.assigneduserid=au.userinfoid) inner join categoryoptioncombo coc on coc.categoryoptioncomboid = psi.attributeoptioncomboid inner join lateral (select coc.categoryoptioncomboid as id, string_agg(co.uid, ';') as co_uids, count(co.categoryoptionid) as co_count from categoryoptioncombo coc inner join categoryoptioncombos_categoryoptions cocco on coc.categoryoptioncomboid = cocco.categoryoptioncomboid inner join dataelementcategoryoption co on cocco.categoryoptionid = co.categoryoptionid where psi.attributeoptioncomboid = coc.categoryoptioncomboid group by coc.categoryoptioncomboid ) as coc_agg on coc_agg.id = psi.attributeoptioncomboid where ( (EXISTS(SELECT ss.organisationunitid FROM userteisearchorgunits ss JOIN userinfo u ON u.userinfoid = ss.userinfoid JOIN organisationunit orgunit ON orgunit.organisationunitid = ss.organisationunitid WHERE u.uid = ? AND ou.path like CONCAT(?, '%' ) AND p.accesslevel in ('OPEN', 'AUDITED')) OR EXISTS(SELECT cs.organisationunitid FROM usermembership cs JOIN userinfo u ON u.userinfoid = cs.userinfoid JOIN organisationunit orgunit ON orgunit.organisationunitid = cs.organisationunitid WHERE u.uid = ? AND ou.path like CONCAT(?, '%' ) )) ) and (psi.executiondate >= ? or (psi.executiondate is null and psi.duedate >= ? )) and (psi.executiondate < ? or (psi.executiondate is null and psi.duedate < ? )) and psi.deleted is false
  7. Total execution time: 1,718.8 ms
  8. Rows returned: 1

System specifications:

Volume size: 128 GB
CPU: 4 cores
RAM: 16 GB
Autoscaling: turned off

We have raised a card for the second query: DHIS2-17732

cc: @gassim @Kris_Reinhardt @muhima.mohamed

1 Like

Hi @Supriya_Muppiri

Sorry to hear about this issue. Did you try to restart the instance and clear the server cache to see if the performance improves?

Thanks!

This is also affecting other servers (also a migration from 2.38 to 40.3.2). Actually, the amount of time in the analytics for the events has been increased a lot.

Hi @Gassim We have tried to remove cache but no improvement on the performance.
We also explored further optimisations on query1, query 2 mentioned above and discovered that correlated queries, especially those checking for existence in user membership and userteisearchorgunits, were causing issues.
Following our changes, we observed a major improvement, with execution times decreasing from 7 seconds to 60 milliseconds. The initial run after the change takes 4 seconds, while subsequent runs complete in 60 milliseconds.

select count(*) from (
 (
SELECT
    psi.uid
FROM
    programstageinstance psi
    INNER JOIN programinstance pi ON pi.programinstanceid = psi.programinstanceid
    INNER JOIN program p ON p.programid = pi.programid
    INNER JOIN programstage ps ON ps.programstageid = psi.programstageid
    INNER JOIN organisationunit ou ON psi.organisationunitid = ou.organisationunitid
    LEFT JOIN trackedentityinstance tei ON tei.trackedentityinstanceid = pi.trackedentityinstanceid
    LEFT JOIN organisationunit teiou ON (tei.organisationunitid = teiou.organisationunitid)
    LEFT JOIN userinfo au ON (psi.assigneduserid = au.userinfoid)
    INNER JOIN (
        SELECT
            coc.uid,
            coc.attributevalues,
            coc.code,
            coc.categoryoptioncomboid AS id,
            string_agg (co.uid, ';') AS co_uids,
            count(co.categoryoptionid) AS co_count
        FROM
            categoryoptioncombo coc
            INNER JOIN categoryoptioncombos_categoryoptions cocco ON coc.categoryoptioncomboid = cocco.categoryoptioncomboid
            INNER JOIN dataelementcategoryoption co ON cocco.categoryoptionid = co.categoryoptionid
        GROUP BY
            coc.categoryoptioncomboid
    ) AS coc_agg ON coc_agg.id = psi.attributeoptioncomboid
WHERE
    pi.status = ?
    AND (
        (
            EXISTS (
                SELECT
                    ss.organisationunitid
                FROM
                    userteisearchorgunits ss
                    JOIN userinfo u ON u.userinfoid = ss.userinfoid
                    JOIN organisationunit orgunit ON orgunit.organisationunitid = ss.organisationunitid
                WHERE
                    u.uid = 'dabyvlLTFll'
                    AND ou.path LIKE CONCAT (?,'%')
                    AND p.accesslevel IN ('OPEN', 'AUDITED')
            )
        )
    )
    AND (
        psi.executiondate < ?
        OR (
            psi.executiondate IS null
            AND psi.duedate < ?
        )
    )
    AND psi.deleted IS false
) 
UNION 
(
SELECT
    psi.uid 
FROM
    programstageinstance psi
    INNER JOIN programinstance pi ON pi.programinstanceid = psi.programinstanceid
    INNER JOIN program p ON p.programid = pi.programid
    INNER JOIN programstage ps ON ps.programstageid = psi.programstageid
    INNER JOIN organisationunit ou ON psi.organisationunitid = ou.organisationunitid
    LEFT JOIN trackedentityinstance tei ON tei.trackedentityinstanceid = pi.trackedentityinstanceid
    LEFT JOIN organisationunit teiou ON (tei.organisationunitid = teiou.organisationunitid)
    LEFT JOIN userinfo au ON (psi.assigneduserid = au.userinfoid)
    INNER JOIN (
        SELECT
            coc.uid,
            coc.attributevalues,
            coc.code,
            coc.categoryoptioncomboid AS id,
            string_agg (co.uid, ';') AS co_uids,
            count(co.categoryoptionid) AS co_count
        FROM
            categoryoptioncombo coc
            INNER JOIN categoryoptioncombos_categoryoptions cocco ON coc.categoryoptioncomboid = cocco.categoryoptioncomboid
            INNER JOIN dataelementcategoryoption co ON cocco.categoryoptionid = co.categoryoptionid
        GROUP BY
            coc.categoryoptioncomboid
    ) AS coc_agg ON coc_agg.id = psi.attributeoptioncomboid
WHERE
    pi.status = ?
    AND (
        (
            EXISTS (
                SELECT
                    cs.organisationunitid
                FROM
                    usermembership cs
                    JOIN userinfo u ON u.userinfoid = cs.userinfoid
                    JOIN organisationunit orgunit ON orgunit.organisationunitid = cs.organisationunitid
                WHERE
                    u.uid = ?
                    AND ou.path LIKE CONCAT (?,'%')
            )
        )
    )
    AND (
        psi.executiondate < ?
        OR (
            psi.executiondate IS null
            AND psi.duedate < ?
        )
    )
    AND psi.deleted IS false
) 
) a;

Seeing similar issues in our push from 2.37 to 40.3.2; the /events endpoint on 40.4 is faster after an initial attempt (i.e. first GET takes 4 minutes, subsequent take <3 seconds). We have seen the tracker/events to be slower.
@ctejo we are also seeing analytics taking about 1.5 times as long in v40 as compared to 2.37.

1 Like

@Matthew_Boddie yes, we have seen something similar. At this point, we are trying to mitigate the issue through db indexes and db configuration

1 Like

If you’re experiencing the same or a similar issue, please consider upvoting this Jira card: https://dhis2.atlassian.net/browse/DHIS2-17732. Your support could help prioritize it.

1 Like

@Supriya_Muppiri You could try disabling JIT by setting jit=off in your PostgreSQL settings to see if that improves the performance.