High server load from background searches in tracker registration

imwota · 28 January 2019 09:38

Hello Mike and team,

“There will be improvements to the Search in 2.29 and 2.30 that will help us
as well”
The search in 2.29, has indeed worked perfect to improve on reduplication, however
It has cause heavy performance issues, on server side… We have been using the tracker with 700,000 TEI’s, and multiple stages for each.

Is there an option to disabling the integrated auto search functionality, back to manual search, so that performance is improved now, as we look forward to step-up our servers?

Thank you
Isaac

Markus · 8 February 2019 09:10

Hi there @imwota!
Thank you for the post.

On the configuration side the main thing you can do is to make sure that only the most sensible tracked entity attributes is marked as “searchable” in the program configuration. Furthermore, the same should be the case for the Tracked entity type attributes in the tracked entity type configuration.
In addition, you should set sensible values for “Minimum number of attributes required to search” in program and tracked entity type. Increasing this number to 2 will prevent your registration forms from searching until at least 2 of the searchable attributes is filled in, for example.
The last configuration option to look at is the “Maximum number of tracked entity instances to return in a search”, for program and tracked entity type. I would urge you to set this number as low as you can(but not 0, which means no limit). Setting the number to 5 for example will allow searches that returns 5 or fewer results. If a search would return more than 5 results, the user is told to be more specific.

We are looking at the efficiency of the queries being made, and hopefully we can also improve the code to become more efficient somehow. We might reach out to you to ask for more insight into your server and setup.

Best regards, Markus

bobj · 8 February 2019 09:58

Hi Isaac

There is a postgresql setting you can make which allows you to record long running queries into the postgresql log file. Sometimes this is useful to analyze performance issues.

For example to collect all queries taking more than one minute, add the following line to your postgresql.conf file:

log_min_duration_statement = 60000

You don’t need to restart the database. Just reload will do. eg on ubuntu you would do something like ‘sudo service postgresql reload’.

If it is possible for you to do this, then I would ask you to send the resulting postgresql log file to me and I will see if there are any useful insights we can glean. Don’t post it here but you can send by email to bob@dhis2.org. Depending how busy your system is, the file might be quite big but should compress well.

After you have done this you can disable the setting. What I often do on production settings is to keep it there with a value of say 300000 (5 minutes). That way you will record really troublesome queries without impacting your day to day performance too much.

(Note that there is some risk that demographic data could leak into the log file when you do this. Please do check through this before sending by mail)

Cheers
Bob

jomutsani · 9 April 2019 08:01

Hi @imwota,

Was the support from @Markus and @bobj helpful in sorting you out?

Best,
James.

imwota · 10 April 2019 13:46

Greetings, @Markus, @bobj & @jomutsani,

Thanks for the support, this was very helpful, server performance was improved.

We removed all searchable fields and left the Unique ID field only. however as we continue to scale-up, new users are added, the problem comes back partly, mainly during pick-time 8 AM to midday.

We are monitoring the long-running queries, most of them are SELECT statements from the search function, these queues up very first due to many users(1000+), hence maxing the server processors.

Regards
Isaac M

jomutsani · 10 April 2019 13:56

This is great news @imwota!
Thanks for your feedback and please, don’t forget to extend your support to someone else in the commuity when you have a minute… there’s always enough to share

Best,
James.

imwota · 10 April 2019 14:33

ohh yeah… sure i will

Markus · 11 April 2019 09:38

There has also been extensive work on improving the performance the last weeks. What version of DHIS2 are you on? It might help further to upgrade your server to the lastest build of your DHIS2 version.

skalyati · 16 April 2019 11:08

This is good information.However,if my searchable attribute is for instance a Unique Identifier but i want to be able to search only part of the Unique Identifier for example my UIC is MH15221GHKL01 and instead of searching for the entire code,can i just type GHKL and it shows me the possible TEI’s? If its possible how do i configure? i have upgraded to 2.31.1 latest build from 2.30 which this was working properly.Thanks

Markus · 10 May 2019 09:53

Hey @skalyati
The use case for searching using only one part of a unique indentifier has not been properly documented and understood by us. It would be great if you can create a Jira issue describing the short user story for it. Please use the jira type for “user story” and submit the ticket.

If an attribute is unique, we currently only support exact matches when searching, but reading what you write I see how it can also make sense with partial matches.

Markus