25 hours in completing Analytic

Dear team,

In one of the Indian states we are using DHIS 2.23. The database size is around 100 GB, and once we run analytics it grows to about 500 GB, with analytics taking around 25 hours to complete. We are using Postgres 9.4.

Has anyone experienced the situation with 100 GB database or more?

How did you deal with the analytics run time? Is there any way to reduce it?



Thanks,
Neeraj Gupta


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Hi Neeraj,

what usually helps to improve runtime is to improve/increase:

  • SSD (read and write speed)

  • number of CPUs

  • using the latest PostgreSQL (9.6 claims to have even better indexing performance than 9.5)

  • tuning of PostgreSQL
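The "tuning of PostgreSQL" point usually comes down to a handful of postgresql.conf settings. The fragment below is an illustrative sketch for a dedicated server with around 48 GB of RAM; the specific values are assumptions, not official recommendations, so check them against the DHIS2 implementer's guide for your version:

```conf
# Illustrative postgresql.conf fragment -- values are assumptions for a
# dedicated ~48 GB RAM server; verify against the DHIS2 implementer's guide.
max_connections = 200            # concurrent connections allowed
shared_buffers = 12GB            # ~25% of RAM is a common starting point
work_mem = 20MB                  # per-sort/hash memory; analytics runs many sorts
maintenance_work_mem = 512MB     # speeds up index creation during analytics
effective_cache_size = 32GB      # hint: RAM available to the OS page cache
checkpoint_completion_target = 0.8
synchronous_commit = off         # trades a small durability window for write speed
wal_writer_delay = 10000ms       # batches WAL writes
```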

regards,

Lars

Lars Helge Øverland

Lead developer, DHIS 2

University of Oslo

Skype: larshelgeoverland

lars@dhis2.org

http://www.dhis2.org

Thanks Lars, we will give this a try and let you know.
How much time would you expect analytics to take for a 100 GB database under ideal conditions?

Thanks,

Neeraj


Just a heads-up that there seems to be a JDBC issue with Postgres 9.6, so perhaps you should try upgrading to 9.5 first.


Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

Dear Neeraj,

The physical database size on its own doesn't tell the whole story, and neither does the raw number of records. In my experience the biggest problem you are going to run into is not size, but the number of queries you can handle at any one instant, especially while the analytics run is executing. Most probably you will eventually have to move to a master/slave configuration of your database, so that read queries can run against the slaves and write queries run against the master. However, if you and your database management team are not ready for this, you can tweak your indexes for the queries you are running to speed up response times. There is also a lot of tuning you can do to the network stack and kernel of the Linux host where the database server is installed (my experience here is with MySQL Server). I would focus first on your indexes, then have a server admin look at your OS, and if all that doesn't help it might be time to implement a master/slave configuration.

The most important scalability factor is RAM. If the indexes of your tables fit into memory and the analytics queries are well optimized, you can serve a reasonable number of requests with an average machine. The number of records does matter, depending on what your tables look like: it makes a difference whether rows hold a lot of varchar fields or only a couple of ints or longs. The physical size of the database matters as well; think of backups, for instance. Depending on your engine, the physical db files grow but don't shrink (InnoDB behaves this way), so deleting a lot of rows doesn't shrink your physical files. Thus database size does matter after all.

Indexing is one of the most important things to take care of. Once a table passes about a million records you will get performance problems if the indexes are not set right (for example, no indexes on the fields in WHERE clauses or the ON conditions of joins).
If you hit 10 million records, you will start to get performance problems even if you have all your indexes right. Hardware upgrades - adding more memory and more processor power, especially memory - often help to reduce the most severe problems, at least to a certain degree.
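To make the indexing advice concrete, here is a hypothetical sketch (the table and column names are invented for illustration, not taken verbatim from the DHIS2 schema): use EXPLAIN to spot sequential scans, then index the columns used in WHERE clauses and join conditions.

```sql
-- Hypothetical example: table/column names are illustrative only.
-- 1. Ask the planner how a slow query is executed:
EXPLAIN ANALYZE
SELECT dv.value
FROM   datavalue dv
JOIN   period p ON p.periodid = dv.periodid
WHERE  dv.dataelementid = 1234;

-- 2. If the plan shows a sequential scan on a large table, index the
--    columns used in the WHERE clause and the join condition:
CREATE INDEX idx_datavalue_dataelementid ON datavalue (dataelementid);
CREATE INDEX idx_datavalue_periodid      ON datavalue (periodid);

-- 3. Refresh planner statistics so the new indexes are considered:
ANALYZE datavalue;
```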


Best Regards,

Brajesh Murari,

Postgraduate, Department of Computer Science and Engineering,

Chaudhary Devi Lal University, Sirsa,

India.

The three basic dimensions of human development: a long and healthy life, access to knowledge, and a decent standard of living.

Thanks Knut and Brajesh,

Lars, we are using a 12-core CPU with SSDs and 48 GB of RAM, but we are still on Postgres 9.4.

We will try with Postgres 9.5 now and let you know the results.

Do you recommend anything else other than this?

Thanks,

Neeraj


Hi Neeraj,

Not sure about the relevance of this detail: when I upgraded from 9.4 to 9.5 I found that a postgres parameter (wal_buffers) had been removed from the current DHIS2 implementer's guide (relating to 9.5), although it was still in there up to 2.24 (relating to 9.4). I also had the impression that during the upgrade from 9.4 to 9.5 not all of my old changes were migrated by the Ubuntu upgrader. So verify your postgresql.conf against the DHIS2 guide after upgrading, and don't just look for the parameters that are mentioned in the guide: also check whether old parameters are left over from previous configurations that are no longer present in the guide.

Upgrading reduced our time for aggregates by a third, but increased the time for posting dataValues via the API. I'm not yet sure about the reasons, as I upgraded from Ubuntu 14.04 to 16.04, PostgreSQL 9.4 to 9.5, and DHIS 2.22 to 2.24 at the same time.

Maybe you also want to consider reading through a thread from last week:
https://lists.launchpad.net/dhis2-users/msg11529.html

Regards, Uwe


Dear Uwe,

Thanks for your valuable suggestion; we will try PostgreSQL 9.5 and let you know how it goes.



Hi Neeraj,

Using VACUUM and ANALYZE

Like Brajesh, my background is MySQL, and one database admin task that is often overlooked in MySQL is OPTIMIZE TABLE. This reclaims unused space (we’ve had 100 GB database files drop to half their size) and refreshes index statistics (if the shape of your data has changed over time, this can make indices run faster).

I’m new to PostgreSQL, but the core principles are the same, and a quick bit of Googling shows that the equivalents in PostgreSQL are the VACUUM and ANALYZE commands. If your database isn’t set to run VACUUM automatically (the default DHIS2 postgres config doesn’t seem to be), you might want to try VACUUM FULL, which will literally rewrite all of your database tables and indices into smaller, more efficient files (note, however, that on a 500 GB database this could take a looong time – perhaps test on a backup first?). The following forum post is a really nice, plain-English explanation of what VACUUM does:

http://dba.stackexchange.com/questions/126258/what-is-table-bloating-in-databases
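As a concrete sketch of those commands (illustrative only; on a database this size, test against a backup copy first):

```sql
-- Reclaim dead-row space and refresh planner statistics in place;
-- this runs without an exclusive lock and is the safer first step:
VACUUM ANALYZE;

-- VACUUM FULL rewrites tables and indexes into compact new files,
-- reclaiming disk space, but it takes an exclusive lock on each table
-- and can run for a long time on a large database:
VACUUM FULL ANALYZE;

-- Check whether autovacuum is enabled on this server:
SHOW autovacuum;
```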

As I mentioned, my background is MySQL rather than Postgres, so someone with more specific Postgres experience might like to also chime in here.

Cheers, Sam.


One other point to mention: if you do try running VACUUM FULL, you should do it before running analytics (i.e. on your initial 100 GB database, not the 500 GB database you end up with after running analytics), as my guess is that the DHIS2 analytics tables are dropped and re-created anyway, so there wouldn’t be any point in VACUUMing them.

Cheers, Sam.


Neeraj,

There is always an element of uncertainty linked to database sizes - ref Sam’s post above. So indicating the number of records you have in the datavalue table and key metadata tables would be useful, plus whether you are running other instances on the same server. Some comments - I’ve been doing a lot of similar optimising work recently:

  1. Upgrading to 9.5.4 is strongly recommended (and don’t use 9.6 before the worst bugs are fixed and it has stabilised).

  2. Carefully check your postgresql.conf against the recommended settings. The guide is a bit superficial in the sense that it recommends “fixed” values only, with no explanation of the ranges below or above those, but you can experiment a bit yourself (e.g. the recommended “max_connections = 200” might not be sufficient for a really large system like yours).

  3. If your server is running that single instance only, then 48 GB of RAM should be sufficient. Our servers all have 128 GB of RAM, so we experimented quite a bit earlier this year with giving a DHIS2 instance large amounts of RAM (up to 60-70 GB), with negligible impact on performance. According to Lars, DHIS2 cannot really utilize more than around 16 GB of RAM (at least that is how I understood his communication at the time). So 48 GB should be sufficient for a single instance.

  4. I’ve been doing performance optimization recently on an instance with:

  • a 4-core server with 2x 512 GB SSDs, 12 GB allocated to DHIS2

  • 31,000 orgunits

  • 420 data elements

  • 250 indicators

  • around 100 million datavalue records

  • total size around 140 GB with analytics tables.

So the size is just over 25% of your 500 GB, but RUNNING ANALYTICS ON THAT DATABASE INSTANCE TAKES JUST OVER 1 HOUR. Fundamentally, if the analytics engine is designed well, I would expect a nearly linear relationship between database size and the time analytics takes to run. So running analytics on your database on our server should in theory take 4-5 hours.

We are obviously comparing oranges and nectarines here, in the sense that there might be other aspects of our server and database that are different from yours (type of CPU, number of OUs, number of DEs/indicators, whether your instance has lots of tracker data, etc.). I have not seen any scientific/quantified comparative performance figures for specific parameters like the number of CPUs and/or cores, but 12 cores SHOULD improve analytics performance quite a bit - assuming around a 30% gain, running analytics on your database/server should take around 3 hours…
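The scaling estimate above can be checked with back-of-the-envelope arithmetic (assuming, as above, a near-linear relationship between database size and runtime):

```shell
# Back-of-the-envelope estimate, assuming near-linear scaling:
# ~70 minutes for a 140 GB instance, scaled to a 500 GB database.
echo "$((500 * 70 / 140)) minutes"   # prints: 250 minutes (roughly 4 hours)
```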

I tried getting comparative, quantitative data on various configurations of hardware and software (e.g. some users prefer CentOS, others Ubuntu) during the academy in August, but did not get much - it seems most users/providers have found a setup that works for them for now, and nobody is doing any systematic performance testing (some of the international NGOs/companies using DHIS2 might have, but as with internally developed apps they are not that keen on sharing). So it would be highly appreciated if you would post the analytics times with every upgrade/tweak you do - starting with the upgrade to Pg 9.5.4.

Best regards

Calle

···

On 19 October 2016 at 13:28, Sam Johnson samuel.johnson@qebo.co.uk wrote:

Hi Neeraj,

Using VACUUM and ANALYZE

Like Brajesh, my background is MySQL, and one database admin task that is often overlooked in MySQL is OPTIMIZE TABLEs. This reclaims unused space (we’ve had 100Gb databases files drop to half their size) and refreshes index statistics (if the shape of your data has changed over time, this can make indices run faster).

I’m new to PostgreSQL, but the core principles are the same, and a quick bit of Googling shows that the equivalents in PostgreSQL are the VACUUM and ANALYZE commands. If your database isn’t set to automatically do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you might want to try VACUUM FULL, which will literally rewrite all of your database tables and indices into smaller, more efficient files (note, however, that on a 500Gb database this could take a looong time – perhaps test on a backup first?). The following forum post is a really nice, plain-English explanation of what VACUUM does:

http://dba.stackexchange.com/questions/126258/what-is-table-bloating-in-databases

As I mentioned, my background is MySQL rather than Postgres, so someone with more specific Postgres experience might like to also chime in here.

Cheers, Sam.

From: Dhis2-users dhis2-users-bounces+samuel.johnson=qebo.co.uk@lists.launchpad.net on behalf of Brajesh Murari brajesh.murari@gmail.com

Date: Wednesday, 19 October 2016 at 08:28

To: Knut Staring knutst@gmail.com

Cc: DHIS 2 Users list dhis2-users@lists.launchpad.net, DHIS2 Developers dhis2-devs@lists.launchpad.net

Subject: Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing Analytic

Dear Neeraj,

The physical database size doesn’t matter much, and even the number of records matters less than you might expect. In my experience, the biggest problem you are going to run into is not size but the number of queries you can handle at any one time, especially while the analytics functionality is executing. Most probably you will have to move to a master/slave configuration of your database, so that read queries run against the slaves and write queries run against the master. However, if you and your database management team are not ready for that, you can tweak your indexes for the queries you are running to speed up response times. There is also a lot of tuning you can do to the network stack and kernel of the Linux host where the database server is installed. I would focus first on your indexes, then have a server admin look at your OS, and if all that doesn’t help it might be time to implement a master/slave configuration.

The most important scalability factor is RAM. If the indexes of your tables fit into memory and your analytics queries are highly optimized, you can serve a reasonable number of requests with an average machine. The number of records does matter, depending on what your tables look like: it makes a difference whether you have a lot of varchar fields or only a couple of ints or longs. The physical size of the database matters as well; think of backups, for instance. Depending on your storage engine, the physical database files grow but don’t shrink (for instance with InnoDB), so deleting a lot of rows doesn’t shrink the files on disk.

If you have more than one table with over a million records, performance does indeed start to degrade. Indexing is one of the important things to take care of: at one million records you will get performance problems if the indices are not set right (for example, no indices on fields used in WHERE clauses or in the ON conditions of joins). At 10 million records you will start to get performance problems even if all your indices are right. Hardware upgrades - adding more processor power and especially more memory - often help to reduce the most severe problems, at least to a certain degree.
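To illustrate the indexing point, here is a minimal, self-contained sketch. SQLite is used only so it runs anywhere; the same principle applies in PostgreSQL with EXPLAIN / EXPLAIN ANALYZE. The table and column names are made-up stand-ins, not the real DHIS2 schema.

```python
# Minimal demonstration that an index turns a full table scan into an
# index lookup. SQLite is used only so the example is self-contained;
# the principle is the same in PostgreSQL (use EXPLAIN there).
# Table and column names are hypothetical, not the real DHIS2 schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE datavalue (dataelementid INTEGER, value TEXT)")
con.executemany(
    "INSERT INTO datavalue VALUES (?, ?)",
    [(i % 100, str(i)) for i in range(10_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite's rough analogue of Postgres EXPLAIN;
    # the fourth column of each row describes the access strategy.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT value FROM datavalue WHERE dataelementid = 42"
before = plan(query)  # no index yet: a full table scan
con.execute("CREATE INDEX idx_dv_de ON datavalue (dataelementid)")
con.execute("ANALYZE")  # refresh planner statistics
after = plan(query)   # now an index search

print(before)  # e.g. "SCAN datavalue"
print(after)   # e.g. "SEARCH datavalue USING INDEX idx_dv_de (...)"
```

The same check on a real DHIS2 database would mean running EXPLAIN ANALYZE on the slow analytics queries and indexing the columns that show up in sequential scans.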

On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring knutst@gmail.com wrote:

Just a heads-up that there seems to be a JDBC issue with Postgres 9.6, so perhaps you should try upgrading to 9.5 first.


Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Best Regards,

Brajesh Murari,

Postgraduate, Department of Computer Science and Engineering,

Chaudhary Devi Lal University, Sirsa,

India.

The three basic dimensions of human development: a long and healthy life, access to knowledge, and a decent standard of living.




Calle Hedberg

46D Alma Road, 7700 Rosebank, SOUTH AFRICA

Tel/fax (home): +27-21-685-6472

Cell: +27-82-853-5352

Iridium SatPhone: +8816-315-19119

Email: calle.hedberg@gmail.com

Skype: calle_hedberg


Dear Sir,

···

I had issues connecting to PostgreSQL 9.6 due to a JDBC error; has that been resolved now? Please let me know.

I believe 9.6 can handle parallel queries and will have better vacuum handling, which might be useful for large databases like the one Neeraj is handling.

Dr. Sunil Gavaskar ParthaSarathy. M.B., B.S., D.A.,

Medical Officer

Tamil Nadu Health Systems Project

Sent from my iPhone


Thank you everyone. I will try each of the options you gave and post the results to the list.

Thanks,

Neeraj

···

On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg calle.hedberg@gmail.com wrote:

Neeraj,

There’s always an element of uncertainty linked to database sizes - ref Sam’s post above. So it would be useful if you indicated the number of records you have in the datavalue table and the key metadata tables, plus whether you are running other instances on the same server. Some comments - I’ve been doing a lot of similar optimising work recently:

  1. Upgrading to 9.5.4 is strongly recommended (and don’t use 9.6 before the worst bugs are fixed and it has stabilised).
  2. Carefully check your postgresql.conf against the recommended settings. The guide is a bit superficial in the sense that it recommends “fixed” values only, with no explanation of the ranges below or above them, but you can experiment a bit yourself (e.g. the recommended “max_connections = 200” might not be sufficient for a really large system like yours).
  3. If your server is running that single instance only, then 48GB of RAM should be sufficient. Our servers all have 128GB RAM, so we experimented quite a bit earlier this year with giving a DHIS2 instance large amounts of RAM (up to 60-70GB), with negligible impact on performance. According to Lars, DHIS2 cannot really utilize more than around 16GB of RAM (at least that is how I understood his communication at the time). So 48GB should be sufficient for a single instance.
  4. I’ve been doing performance optimizing recently on an instance with:
  • 4-core server with 2x 512GB SSD, 12GB allocated to DHIS2
  • 31,000 orgunits
  • 420 data elements
  • 250 indicators
  • around 100 million datavalue records
  • total size around 140GB with analytics tables.
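For reference, a minimal sketch of the kind of postgresql.conf settings the tuning point above refers to. The values below are illustrative only, not recommendations; use the DHIS2 server guide's figures for your RAM size.

```
# Illustrative postgresql.conf values for a dedicated 48GB-RAM server.
max_connections = 200
shared_buffers = 8GB            # commonly ~25% of RAM
work_mem = 20MB                 # per sort/hash operation, per connection
maintenance_work_mem = 512MB    # speeds up the index builds analytics does
effective_cache_size = 24GB     # planner hint, roughly 50-75% of RAM
checkpoint_completion_target = 0.8
wal_buffers = 16MB
```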

So that size is only about 25-30% of your 500GB, but running analytics on that database instance takes just over one hour. Fundamentally, if the analytics engine is designed well, I would expect a nearly linear relationship between database size and the time analytics takes to run, so running analytics on your database on our server should in theory take 4-5 hours.
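That back-of-envelope estimate can be written out as follows (the figures are the ones quoted above; the linear scaling is itself an assumption):

```python
# Rough estimate assuming analytics runtime scales linearly with
# database size. Linearity is an assumption, not a guarantee.
reference_size_gb = 140   # Calle's instance, including analytics tables
reference_hours = 1.0     # "just over 1 hour"
target_size_gb = 500      # Neeraj's database with analytics tables built

estimated_hours = reference_hours * target_size_gb / reference_size_gb
print(round(estimated_hours, 1))  # ~3.6h; 4-5 hours once overhead is allowed for
```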

We are obviously comparing oranges and nectarines here, in the sense that other aspects of our server and database might differ from yours (type of CPU, number of OUs, number of DEs/indicators, whether your instance has lots of tracker data, etc.). I have not seen any scientific/quantified comparative performance figures for specific parameters like the number of CPUs or cores, but 12 cores SHOULD improve analytics performance quite a bit - assuming around 30%, running analytics on your database/server should take around 3 hours…

I tried to get comparative, quantitative data on various hardware and software configurations (e.g. some users prefer CentOS, others Ubuntu) during the academy in August, but did not get much - it seems most users/providers have found a setup that works for them for now, and nobody is doing systematic performance testing (some of the international NGOs/companies using DHIS2 might be, but as with internally developed apps they are not that keen on sharing). So it would be highly appreciated if you would post the analytics run time after every upgrade/tweak you do - starting with the upgrade to Pg 9.5.4.

Best regards

Calle



On 19 October 2016 at 13:28, Sam Johnson samuel.johnson@qebo.co.uk wrote:

Hi Neeraj,

Using VACUUM and ANALYZE

Like Brajesh, my background is MySQL, and one database admin task that is often overlooked in MySQL is OPTIMIZE TABLE. This reclaims unused space (we’ve had 100GB database files drop to half their size) and refreshes index statistics (if the shape of your data has changed over time, this can make indices run faster).

I’m new to PostgreSQL, but the core principles are the same, and a quick bit of Googling shows that the equivalents in PostgreSQL are the VACUUM and ANALYZE commands. If your database isn’t set to automatically do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you might want to try VACUUM FULL, which will literally rewrite all of your database tables and indices into smaller, more efficient files (note, however, that on a 500GB database this could take a looong time – perhaps test on a backup first?). The following forum post is a really nice, plain-English explanation of what VACUUM does:

http://dba.stackexchange.com/questions/126258/what-is-table-bloating-in-databases
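If you want to check first whether vacuuming is even needed, PostgreSQL’s standard statistics views show dead-row counts and the last (auto)vacuum times. A rough sketch (the table name in the last line is just an example):

```sql
-- Which tables carry the most dead rows, and when were they last vacuumed?
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Reclaim space and refresh statistics. VACUUM FULL rewrites the table
-- files and takes an exclusive lock, so test it on a backup first:
VACUUM (VERBOSE, ANALYZE);
VACUUM FULL ANALYZE datavalue;
```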

As I mentioned, my background is MySQL rather than Postgres, so someone with more specific Postgres experience might like to also chime in here.

Cheers, Sam.


Dear Team, Thanks for all your suggestions.

Now the analytics run time is down to 10 hours 41 minutes.

We tried VACUUM as Sam suggested, but it didn’t help. We then upgraded Postgres from 9.4 to 9.5.4 and, as Calle and Bob suggested, made some changes to the Postgres configuration file, which reduced the time. The database size is still the same, though.

Thanks for all your help!

Thanks,

Neeraj




Congratulations Neeraj and team … much appreciated!




Neeraj,

You never stated the number of records you have in the datavalue table - what is it?

In the same context: anybody have a rough idea of how many datavalue records there are in the global DATIM database - which I think currently might be the largest DHIS2 instance around?

Given our own recent work on performance + what Neeraj has reported, I’ve been thinking of creating one test instance with let us say 500 mill datavalue records and another with let us say 1 billion, then use them to identify key bottlenecks in various processes AND use them to ensure that DHIS2 analytics performance is as linear as possible in terms of database size. Postgresql has introduced a number of new indexing algorithms in recent versions, and I’m not sure if DHIS2 is taking full advantage of them.

Best regards

Calle
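A sketch of how such a test instance could be seeded - the table layout here is deliberately simplified and illustrative, not the actual DHIS2 datavalue schema, and the row count should be scaled up gradually:

```sql
-- Generate synthetic data values in bulk with generate_series.
-- Simplified stand-in columns; real datavalue has more (e.g. category
-- option combos). Start small and work toward 500M (which takes hours).
CREATE TABLE datavalue_test (
    dataelementid integer,
    periodid      integer,
    sourceid      integer,
    value         text
);

INSERT INTO datavalue_test
SELECT (random() * 3666)::int,
       (random() * 60)::int,
       (random() * 14398)::int,
       trunc(random() * 1000)::text
FROM generate_series(1, 1000000);  -- bump toward 500M for the real test
```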

···

On 24 October 2016 at 07:53, Brajesh Murari brajesh.murari@gmail.com wrote:

Congratulations Neeraj and team … much appreciated!

On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta neeraj.hisp@gmail.com wrote:

Dear Team, Thanks for all your suggestions.

The analytics run time is now down to 10 hours 41 minutes.

We tried VACUUM as Sam suggested, but it didn’t help; we then upgraded Postgres from 9.4 to 9.5.4 and, as Calle and Bob suggested, made some changes to the Postgres configuration file, which reduced the time. The database size is still the same, though.

Thanks for all your help!

Thanks,

Neeraj




On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg calle.hedberg@gmail.com wrote:

Neeraj,

There is always an element of uncertainty linked to database sizes - ref Sam’s post above. So indicating the number of records you have in the datavalue table and key metadata tables would be useful, plus whether you are running other instances on the same server. Some comments - I’ve been doing a lot of similar optimising work recently:

  1. Upgrading to 9.5.4 is strongly recommended (and don’t use 9.6 before the worst bugs are fixed and it has stabilised).
  2. Carefully check your postgresql.conf against the recommended settings. The guide is a bit superficial in the sense that it recommends “fixed” values only, with no explanation of ranges below or above those, but you can experiment a bit yourself (e.g. the recommended “max_connections = 200” might not be sufficient for a really large system like yours).
  3. If your server is running that single instance only, then 48GB of RAM should be sufficient. Our servers all have 128GB RAM, so we experimented quite a bit earlier this year with giving a DHIS2 instance large amounts of RAM (up to 60-70GB), with negligible impact on performance. According to Lars, DHIS2 cannot really utilize more than around 16GB of RAM (at least that is how I understood his communication at the time). So 48GB should be sufficient for a single instance.
  4. I’ve been doing performance optimization recently on an instance with:
  • a 4-core server with 2x 512GB SSD, 12GB allocated to DHIS2
  • 31,000 orgunits
  • 420 data elements
  • 250 indicators
  • around 100 million datavalue records
  • total size around 140GB with analytics tables.

So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the analytics engine is designed well, I would expect a nearly linear relationship between database size and the time analytics takes to run. So running analytics on your database on our server should in theory take 4-5 hours.

We are obviously comparing oranges and nectarines here, in the sense that other aspects of our server and database might differ from yours (type of CPU, number of OUs, number of DEs/indicators, whether your instance has lots of tracker data, etc.). I have not seen any scientific/quantified comparative performance figures for specific parameters like the number of CPUs or cores, but 12 cores SHOULD improve analytics performance quite a bit - assuming around a 30% gain, running analytics on your database/server should take around 3 hours…

I tried getting comparative, quantitative data on various configurations of hardware and software (e.g. some users prefer CentOS, others Ubuntu) during the academy in August, but did not get much - it seems most users/providers have found a setup that works for them for now and nobody is doing any systematic performance testing (some of the international NGOs/companies using DHIS2 might have, but as with internally developed apps they are not that keen on sharing). So it would be highly appreciated if you would post the results on analytics time with every upgrade / tweak you do - starting with the upgrade to Pg 9.5.4
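The postgresql.conf check can be done from psql itself - a quick sketch of inspecting the settings the running server is actually using, rather than trusting the file on disk:

```sql
-- Show the live values (postgresql.conf edits only take effect
-- after a reload/restart, so the file and the server can disagree):
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size',
               'work_mem', 'maintenance_work_mem', 'max_connections');
```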

Best regards

Calle



Thanks,
Neeraj Gupta

On 19 October 2016 at 13:28, Sam Johnson samuel.johnson@qebo.co.uk wrote:

Hi Neeraj,

Using VACUUM and ANALYZE

Like Brajesh, my background is MySQL, and one database admin task that is often overlooked in MySQL is OPTIMIZE TABLEs. This reclaims unused space (we’ve had 100Gb databases files drop to half their size) and refreshes index statistics (if the shape of your data has changed over time, this can make indices run faster).

I’m new to PostgreSQL, but the core principles are the same, and a quick bit of Googling shows that the equivalents in PostgreSQL are the VACUUM and ANALYZE commands. If your database isn’t set to automatically do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you might want to try VACUUM FULL, which will literally rewrite all of your database tables and indices into smaller, more efficient files (note, however, that on a 500Gb database this could take a looong time – perhaps test on a backup first?). The following forum post is a really nice, plain-English explanation of what VACUUM does:

http://dba.stackexchange.com/questions/126258/what-is-table-bloating-in-databases

As I mentioned, my background is MySQL rather than Postgres, so someone with more specific Postgres experience might like to also chime in here.

Cheers, Sam.
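Sam’s suggestion, sketched as plain commands - note that VACUUM FULL takes an exclusive lock per table and rewrites it, so on a database this size it belongs in a maintenance window (or on a restored copy first):

```sql
-- Check whether autovacuum is enabled at all:
SHOW autovacuum;

-- Refresh planner statistics and reclaim dead-row space without
-- heavy locking (safe to run anytime):
VACUUM ANALYZE;

-- Rewrite tables and indexes into compact files; locks each table
-- exclusively while it runs, so do this off-hours or on a backup:
VACUUM FULL VERBOSE;
```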

From: Dhis2-users <dhis2-users-bounces+samuel.johnson=qebo.co.uk@lists.launchpad.net> on behalf of Brajesh Murari brajesh.murari@gmail.com

Date: Wednesday, 19 October 2016 at 08:28

To: Knut Staring knutst@gmail.com

Cc: DHIS 2 Users list dhis2-users@lists.launchpad.net, DHIS2 Developers dhis2-devs@lists.launchpad.net

Subject: Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing Analytic




Calle,

We have around 500 million records in the database, with 3666 data elements (26 category combinations), 201 indicators, and 14398 organisation units.

Thanks,

Neeraj

···

On Mon, Oct 24, 2016 at 1:50 PM, Calle Hedberg calle.hedberg@gmail.com wrote:



For the benefit of others on this thread, this is the tweaking I shared with Neeraj -

He had shared_buffers set to 3200MB (as per the online manual). I suggested that with his 48GB of RAM he try:

shared_buffers = 12GB

effective_cache_size = 20GB

(remember the manual is based on an example machine with 12GB)

I predicted that change would lead to the biggest speedup.

Also I suggested manually setting the jvm max heap size to 16G rather than depending on java 8 to pick a good default. Or at least verify that the default was ok with

java -XX:+PrintFlagsFinal -version | grep MaxHeapSize

These are still instinctive thumbsuck figures from me. Neeraj you can still try some further adjustment (try upping postgres shared_buffers to 20G for example) but I think you are only going to get marginal improvements from here. Maybe getting from 11 hours to 8. You are still faced with the hard limit of the underlying disk speed and also the database size and structure. I’d be curious to know whether people have experience in slicing up the database to run smaller analytics jobs in parallel - for example by time or by space. And the effect of allocating different numbers of worker threads depending on available cpu cores.

What really needs to be done is to run some sort of measurements while the analytics is running to understand where the main bottlenecks lie. To answer the question of which processes are starved and are they starved for RAM, CPU or waiting on disk.
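One cheap way to start that measurement is to poll Postgres’s own activity view while analytics runs (sketched for 9.5 - the boolean `waiting` column was replaced by `wait_event` in 9.6+); OS-level tools like vmstat/iostat cover the CPU-versus-disk side:

```sql
-- Poll repeatedly while analytics runs: which backends are working,
-- and are they executing or blocked on a lock?
-- ('waiting' is 9.5 syntax; 9.6+ uses wait_event/wait_event_type.)
SELECT pid, state, waiting, left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY pid;
```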

If you hit 10 million records, you will start to get performance problems even if you have all your indices right. Hardware upgrades - adding more memory and more processor power, especially memory - often help to reduce the most severe problems by increasing the performance again, at least to a certain degree.

On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring knutst@gmail.com wrote:

Just a heads-up that there seems to be a JDBC issue with Postgres 9.6, so perhaps you should try upgrading to 9.5 first.

On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland lars@dhis2.org wrote:

Hi Neeraj,

what usually helps to improve runtime is to improve/increase:

  • ssd (read and write speed)
  • number of CPUs
  • using latest postgresql (9.6 claims to have even better indexing
    performance
    than 9.5)

regards,

Lars

Lars Helge Øverland

Lead developer, DHIS 2

University of Oslo

Skype: larshelgeoverland

lars@dhis2.org

http://www.dhis2.org


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Best Regards,

Brajesh Murari,

Postgraduate, Department of Computer Science and Engineering,

Chaudhary Devi Lal University, Sirsa,

India.

The three basic dimensions of human development: a long and healthy life, access to knowledge, and a decent standard of living.


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Calle Hedberg

46D Alma Road, 7700 Rosebank, SOUTH AFRICA

Tel/fax (home): +27-21-685-6472

Cell: +27-82-853-5352

Iridium SatPhone: +8816-315-19119

Email: calle.hedberg@gmail.com

Skype: calle_hedberg


It would be very interesting to know of other similarly large installations. DATIM, Bangladesh, PSI perhaps - others?

···

On Mon, Oct 24, 2016 at 10:39 AM, Neeraj Gupta neeraj.hisp@gmail.com wrote:

Calle,

We have around 500 million records in the database, with 3,666 data elements (26 category combinations), 201 indicators, and 14,398 organisation units.

Thanks,

Neeraj


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

On Mon, Oct 24, 2016 at 1:50 PM, Calle Hedberg calle.hedberg@gmail.com wrote:

Neeraj,

You never stated the number of records you have in the datavalue table - what is it?

In the same context: does anybody have a rough idea of how many datavalue records there are in the global DATIM database - which I think might currently be the largest DHIS2 instance around?

Given our own recent work on performance plus what Neeraj has reported, I’ve been thinking of creating one test instance with, let us say, 500 million datavalue records and another with, say, 1 billion, then using them to identify key bottlenecks in various processes AND to verify that DHIS2 analytics performance is as linear as possible in terms of database size. Postgresql has introduced a number of new indexing algorithms in recent versions, and I’m not sure if DHIS2 is taking full advantage of them.
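One concrete example of the newer indexing algorithms mentioned here: PostgreSQL 9.5 added BRIN (block-range) indexes, which stay tiny even on billion-row tables but only help when column values correlate with the physical row order (e.g. data appended period by period). A hedged sketch, with illustrative index/column names on the datavalue table discussed in this thread:

```sql
-- BRIN index (PostgreSQL 9.5+): stores min/max values per block range,
-- so it is orders of magnitude smaller than a btree on a very large table.
-- Only effective if periodid values are physically clustered on disk.
CREATE INDEX idx_datavalue_periodid_brin
    ON datavalue USING brin (periodid);
```

Whether this beats a conventional btree would need to be measured with `EXPLAIN (ANALYZE)` on the actual workload.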

Best regards

Calle

Thanks,
Neeraj Gupta

On 24 October 2016 at 07:53, Brajesh Murari brajesh.murari@gmail.com wrote:

Congratulations Neeraj and team … it is much appreciated.




On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta neeraj.hisp@gmail.com wrote:

Dear Team, Thanks for all your suggestions.

Now the analytics run time has been reduced to 10 hours 41 minutes.

We tried VACUUM as Sam suggested, but it didn’t help. We then upgraded Postgres from 9.4 to 9.5.4 and, as Calle and Bob suggested, made some changes to the Postgres configuration file, which reduced the time. The database size is still the same, though.

Thanks for all your help!

Thanks,

Neeraj


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Best Regards,

Brajesh Murari,

Postgraduate, Dept of CSE,

Chaudhary Devi Lal University, Sirsa,

India.

The three basic dimensions of human development: a long and healthy life, access to knowledge, and a decent standard of living.

On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg calle.hedberg@gmail.com wrote:

Neeraj,

There is always an element of uncertainty linked to database sizes - ref Sam’s post above. So it would be useful to indicate the number of records you have in the datavalue table and key metadata tables, plus whether you are running other instances on the same server. Some comments - I’ve been doing a lot of similar optimising work recently:

  1. Upgrading to 9.5.4 is strongly recommended (and don’t use 9.6 before the worst bugs are fixed and it has stabilised).
  2. Carefully check your postgresql.conf against the recommended settings. The guide is a bit superficial in the sense that it recommends “fixed” values only, with no explanation of the ranges below or above those, but you can experiment a bit yourself (e.g. the recommended “max_connections = 200” might not be sufficient for a really large system like yours).
  3. If your server is running that single instance only, then 48GB of RAM should be sufficient. Our servers all have 128GB of RAM, so earlier this year we experimented quite a bit with giving a DHIS2 instance large amounts of RAM (up to 60-70GB), with negligible impact on performance. According to Lars, DHIS2 cannot really utilise more than around 16GB of RAM (at least that is how I understood his communication at the time). So 48GB should be sufficient for a single instance.
  4. I’ve been doing performance optimising recently on an instance with:
  • a 4-core server with 2x 512GB SSD, 12GB allocated to DHIS2
  • 31,000 orgunits
  • 420 data elements
  • 250 indicators
  • around 100 million datavalue records
  • total size around 140GB with analytics tables.

So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the analytics engine is designed well, I would expect a nearly linear relationship between database size and the time analytics takes to run. So running analytics on your database on our server should in theory take 4-5 hours.
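For reference, the postgresql.conf tuning mentioned above typically revolves around a handful of parameters. The values below are illustrative starting points for a dedicated 48GB-RAM, SSD-backed server, not official DHIS2 recommendations - benchmark against your own workload:

```
# postgresql.conf - illustrative starting points, not official values
max_connections = 200              # raise if you see connection exhaustion
shared_buffers = 12GB              # ~25% of RAM is a common rule of thumb
effective_cache_size = 32GB        # what the planner assumes the OS is caching
work_mem = 64MB                    # per sort/hash operation, per connection
maintenance_work_mem = 2GB         # speeds up VACUUM and index builds
checkpoint_completion_target = 0.9 # spread checkpoint I/O over time
random_page_cost = 1.1             # SSDs make random reads nearly as cheap as sequential
```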

We are obviously comparing oranges and nectarines here, in the sense that there might be other aspects of our server and database that are different from yours (type of CPU, number of OUs, number of DEs/indicators, whether your instance has lots of tracker data, etc.). I have not seen any scientific/quantified comparative performance figures for specific parameters like the number of CPUs or cores, but 12 cores SHOULD improve analytics performance quite a bit - assuming around a 30% improvement, running analytics on your database/server should take around 3 hours…

I tried to get comparative, quantitative data on various configurations of hardware and software (e.g. some users prefer CentOS, others Ubuntu) during the academy in August, but did not get much - it seems most users/providers have found a setup that works for them for now, and nobody is doing any systematic performance testing (some of the international NGOs/companies using DHIS2 might be, but as with internally developed apps they are not that keen on sharing). So it would be highly appreciated if you would post your analytics times with every upgrade/tweak you do - starting with the upgrade to Pg 9.5.4.

Best regards

Calle



Thanks,
Neeraj Gupta

On 19 October 2016 at 13:28, Sam Johnson samuel.johnson@qebo.co.uk wrote:

Hi Neeraj,

Using VACUUM and ANALYZE

Like Brajesh, my background is MySQL, and one database admin task that is often overlooked in MySQL is OPTIMIZE TABLE. This reclaims unused space (we’ve had 100GB database files drop to half their size) and refreshes index statistics (if the shape of your data has changed over time, this can make indices run faster).

I’m new to PostgreSQL, but the core principles are the same, and a quick bit of Googling shows that the equivalents in PostgreSQL are the VACUUM and ANALYZE commands. If your database isn’t set to run VACUUM automatically (the default DHIS2 Postgres config doesn’t seem to be), you might want to try VACUUM FULL, which will literally rewrite all of your database tables and indices into smaller, more efficient files (note, however, that on a 500GB database this could take a looong time – perhaps test on a backup first?). The following forum post is a really nice, plain-English explanation of what VACUUM does:

http://dba.stackexchange.com/questions/126258/what-is-table-bloating-in-databases

As I mentioned, my background is MySQL rather than Postgres, so someone with more specific Postgres experience might like to also chime in here.

Cheers, Sam.
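As a rough sketch of what Sam describes (the `datavalue` table name is taken from this thread; adjust to your own schema), the commands in psql would look something like:

```sql
-- Rewrite the table and its indexes into compact files.
-- NOTE: VACUUM FULL takes an exclusive lock on the table for the duration
-- and needs free disk space for the rewrite, so run it in a maintenance
-- window (and test on a backup first).
VACUUM FULL VERBOSE datavalue;

-- Refresh the planner's statistics after the rewrite.
ANALYZE datavalue;

-- List the largest tables to see where the space is going.
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```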

From: Dhis2-users <dhis2-users-bounces+samuel.johnson=qebo.co.uk@lists.launchpad.net> on behalf of Brajesh Murari brajesh.murari@gmail.com

Date: Wednesday, 19 October 2016 at 08:28

To: Knut Staring knutst@gmail.com

Cc: DHIS 2 Users list dhis2-users@lists.launchpad.net, DHIS2 Developers dhis2-devs@lists.launchpad.net

Subject: Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing Analytic

Dear Neeraj,

The physical database size doesn’t matter much on its own, and neither does the raw number of records. In my experience, the biggest problem you are going to run into is not size, but the number of queries you can handle at a time, especially while the analytics functionality is executing. You may eventually have to move to a master/slave configuration of your database, so that read queries run against the slaves and write queries run against the master. However, if you and your database management team are not ready for this, you can tweak your indexes for the queries you are running to speed up response times. There is also a lot of tuning you can do to the network stack and kernel of the Linux host where the database server is installed that will help. I would focus first on your indexes, then have a server admin look at your OS, and if all that doesn’t help it might be time to implement a master/slave configuration.

The most important scalability factor is RAM. If the indexes of your tables fit into memory and your analytics queries are well optimised, you can serve a reasonable number of requests with an average machine. That said, the number of records does matter, depending on what your tables look like: it makes a difference whether you have a lot of varchar fields or only a couple of ints or longs. The physical size of the database matters as well - think of backups, for instance. Depending on your engine, the physical db files grow but don’t shrink (with InnoDB, for instance), so deleting a lot of rows doesn’t shrink your physical files. Indexing is one of the important things to take care of: around a million records per table, you will start to see performance problems if the indices are not set right (for example, no indices for fields in WHERE clauses or ON conditions in joins).
If you hit 10 million records, you will start to get performance problems even if you have all your indices right. Hardware upgrades - adding more memory and more processor power, especially memory - often help to reduce the most severe problems, at least to a certain degree.
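To illustrate the indexing point, a hypothetical Postgres example (table and column names are illustrative): `CREATE INDEX CONCURRENTLY` builds the index without blocking writes, and `EXPLAIN (ANALYZE)` confirms whether the planner actually uses it.

```sql
-- Index a column that appears in WHERE clauses / JOIN conditions.
-- CONCURRENTLY avoids locking the table against writes while building.
CREATE INDEX CONCURRENTLY idx_datavalue_periodid
    ON datavalue (periodid);

-- Verify the planner uses the new index for a typical filter.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM datavalue WHERE periodid = 12345;
```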

On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring knutst@gmail.com wrote:

Just a heads-up that there seems to be a JDBC issue with Postgres 9.6, so perhaps you should try upgrading to 9.5 first.

On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland lars@dhis2.org wrote:

Hi Neeraj,

what usually helps to improve runtime is to improve/increase:

  • ssd (read and write speed)
  • number of CPUs
  • using latest postgresql (9.6 claims to have even better indexing performance than 9.5)

regards,

Lars

Lars Helge Øverland

Lead developer, DHIS 2

University of Oslo

Skype: larshelgeoverland

lars@dhis2.org

http://www.dhis2.org



Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org



Best Regards,

Brajesh Murari,

Postgraduate, Department of Computer Science and Engineering,

Chaudhary Devi Lal University, Sirsa,

India.

The three basic dimensions of human development: a long and healthy life, access to knowledge, and a decent standard of living.




Calle Hedberg

46D Alma Road, 7700 Rosebank, SOUTH AFRICA

Tel/fax (home): +27-21-685-6472

Cell: +27-82-853-5352

Iridium SatPhone: +8816-315-19119

Email: calle.hedberg@gmail.com

Skype: calle_hedberg


Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org