Will importing old data into a DHIS2 system make it slower?

ahmed685 · 2 September 2022 15:34

We have a DHIS2 instance that currently has the following stats:

More than 76 million data values
More than 9,400 organisation units
More than 11,600 users

More than 12 million data values per month are currently being entered into the system.

Server performance already looks slow, with user load reaching 4,000 concurrent online users.
The 64 GB of RAM already looks overloaded, and the 2 GB of swap memory looks exhausted.

Now the department wishes to import historical data (data from another system, reported before the DHIS2 implementation) into DHIS2. The estimated number of data values to be imported is 60 million.
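For scale, here is a rough sketch of the arithmetic using only the approximate figures quoted above. The variable names are illustrative, and the result says nothing about actual server behaviour; it only puts the import size next to the current volume and entry rate.

```python
# Approximate figures from this post (all are "more than" / estimated values).
current_values = 76_000_000      # data values already stored
monthly_entry = 12_000_000       # data values entered per month
historical_import = 60_000_000   # estimated historical values to import

# Size of the datavalue set once the historical import is done.
total_after_import = current_values + historical_import

# How much larger the stored volume becomes relative to today.
growth_factor = total_after_import / current_values

# How many months of routine entry the import corresponds to.
months_equivalent = historical_import / monthly_entry

print(f"Total after import: {total_after_import:,}")
print(f"Growth factor: {growth_factor:.2f}x")
print(f"Import equals {months_equivalent:.0f} months of current entry")
```

In other words, the import would add roughly as many values as five months of routine entry at the current rate.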

Our instance runs DHIS2 version 2.36.3. (We tried to upgrade to 2.37.7.1, but it caused long queues and halted the server. Details at: https://community.dhis2.org/t/slow-response-of-a-certain-request-creates-backlog-and-slows-down-server/48388)

My four questions are:

How can we improve server performance? Is something wrong with the swap memory? Would adding more RAM be enough?
Should we take the risk of importing the previous data and putting more load on the server?
Will we ever be able to upgrade from 2.36.3?
Is this the largest DHIS2 instance? If not, which country has the largest instance with respect to users and data values?

Norah_Stoops · 3 September 2022 10:01

Hi Ahmed
I approach this from a different perspective. Does your current database have lots of gender- and age-disaggregated data? Do you keep the non-essential zeros, or are they deleted frequently? Does some 85% of the database consist of zeros if they are not deleted? If so, consider deleting the zeros where possible and see if that helps. I know this is not a ‘technical’ solution, but it may help with size and speed.
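As a rough illustration of why the share of zeros matters, the sketch below estimates how many data values would remain after deleting stored zeros. The function name is hypothetical, and the zero shares are assumptions for illustration only. The 85% figure is posed as a question above, not a measured value for this instance.

```python
def values_after_zero_removal(total_values: int, zero_share: float) -> int:
    """Estimate how many data values remain if stored zeros are deleted.

    zero_share is the assumed fraction (0 to 1) of stored values that are zeros.
    """
    if not 0.0 <= zero_share <= 1.0:
        raise ValueError("zero_share must be between 0 and 1")
    return round(total_values * (1.0 - zero_share))


# Current volume quoted in the original post (approximate).
current_values = 76_000_000

# Hypothetical zero shares; the real share would have to be measured.
for share in (0.25, 0.50, 0.85):
    remaining = values_after_zero_removal(current_values, share)
    print(f"zero share {share:.0%}: about {remaining:,} values remain")
```

The actual share would need to be measured on the database before drawing any conclusion.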
Is this all aggregated data or tracker data?
Norah

ahmed685 · 4 September 2022 14:34

Hi @Norah_Stoops ,
Thanks for your reply.
Yes, our data is disaggregated by gender and multiple age groups.
Yes, we are currently keeping zeros in the database, as the department requires it.

Norah_Stoops · 4 September 2022 14:54

Hi Ahmed,
I assumed that there was disaggregation.
You are between a rock and a hard place.
There is a belief that you can collect everything and store it until you need it. Now you are discovering that this is not true. Storing large amounts of data, including zeros, can be a very costly exercise. When decisions are made about what to collect and what to keep, no thought is given to the reality that this will cost a lot of money. There is a balance between what you can collect, store, and use and how much computing power you have available.
I have no solutions except to tell the MOH, gently, to rethink the impact of the decisions they make.
Good luck
Norah

phil · 5 September 2022 08:26

Hi @ahmed685 ,

Someone from our @dhis2-backend team may have some ideas about ways to squeeze a little more performance out of your implementation, but to add to Norah’s comments, I would say that you are really pushing the limits of the system and may need to adapt the setup and/or processes.
If you have the available infrastructure capacity, you may wish to consider horizontal scaling such as read replicas or clusters.

Kind regards,
Phil