Migrating from MySQL to PostgreSQL

Hello DHIS users and devs,

Apologies for the long email… Should have probably been multiple blogpost to reduce its length…

I recently encountered a situation with a very large implementation of DHIS2 having problems generating data mart.

Thus no reports were generated and only data entry was being done. I thought I’d share some of the experiences to solve these issues, so that it might be useful to other implementers.

Some changes will be needed in the DHIS2 source, so sending this to the dev list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have also used MySQL.

My findings clearly highlight that DHIS2 performs much better on PostgreSQL and there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)… but I pruned it for my benchmarking. I added 1-million datavalues and ran the data mart.

The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services including postgres and vice versa.

Java Opts: -Xmx3G -Xms768m -XX:MaxPermSize=512m
Java version: 1.7.0_21 x64
Java vendor: Oracle Corporation
OS name: WindowsMySQL = datamart completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The advantages might scale better if larger number of datavalues

One could argue MySQL 5.6 has many performance improvements, I didn’t have time to explore that.

The migration to PostgreSQL has some challenges. Following are steps I followed:

  • Take the mysqldump
  • replace bit(1) to tinyint(1) in the SQL file
  • You’ll see that column names are camelCase. This is an issue because postgres will added a double quotes around to get case-sensitivity, which MySQL by default nicely excludes.

So you’ll have to make all column names to lowercase and remove the quote characters. I did this with a simple java program. There are 150-odd column names that need changes.

  • Used Navcat premium (trial version or SQLSquirrel also has this feature). “Data transfer” is the name of the feature that will move data from MySQL to Postgres

  • In MySQL non-standard use of boolean (which came only a few yrs back), its converted to smallint in Postgres. I wrote a JDBC program to change column type from smallint to boolean. A single table example is as follows that can be made into a looping procedure as well in pure PSQL.

ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
USING CASE WHEN annualized = 0 THEN FALSE
WHEN annualized = 1 THEN TRUE
ELSE NULL
END;

  • remove NULL values from minimumvalue column of minmaxdatalement table

DELETE from minmaxdatalement WHERE minimumvalue=NULL

  • remove NULL values from maximumvalue column of minmaxdatalement table
    DELETE from minmaxdatalement WHERE maximumvalue=NULL

  • remove NULL values from name column of relationshiptype table
    DELETE from relationshiptype WHERE name=NULL

  • blobs to bytea conversion is a mess and I had to truncate. Probably a JDBC based connector program will do better conversion, but I just truncated it and accepted the data loss to systemsetting and usersetting :slight_smile:

Devs:
We need to make all column names lowercase in hbm.xml files in code. This will ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of lastUpdated in column names, as is the common practice.

···

Regards,

Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE

Wow! Weldone!

I do not have a huge database as yours. And yes I have migrated from MySQL to postgreSQL because I felt that the DHIS community does not support enough MySQL. Anyway getting back to your journey I must ask what means tunned MySQL or tunned PostgreSQL? What you tune might dictate the results.

Are the MySQL/PostgreSQL run in Windows? The multiple choices of filesystem and more possibilities to tune in Linux can give another picture to this. You have just migrated everything you will need to do it again but it would be nice to test the same in Linux too.

Anyway, thank you! This is immensely great.

Caveman

···

On Tue, May 7, 2013 at 2:53 PM, Saptarshi Purkayastha sunbiz@gmail.com wrote:

Hello DHIS users and devs,

Apologies for the long email… Should have probably been multiple blogpost to reduce its length…

I recently encountered a situation with a very large implementation of DHIS2 having problems generating data mart.

Thus no reports were generated and only data entry was being done. I thought I’d share some of the experiences to solve these issues, so that it might be useful to other implementers.

Some changes will be needed in the DHIS2 source, so sending this to the dev list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have also used MySQL.

My findings clearly highlight that DHIS2 performs much better on PostgreSQL and there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)… but I pruned it for my benchmarking. I added 1-million datavalues and ran the data mart.

The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services including postgres and vice versa.
Java Opts: -Xmx3G -Xms768m -XX:MaxPermSize=512m
Java version: 1.7.0_21 x64
Java vendor: Oracle Corporation
OS name: WindowsMySQL = datamart completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The advantages might scale better if larger number of datavalues

One could argue MySQL 5.6 has many performance improvements, I didn’t have time to explore that.

The migration to PostgreSQL has some challenges. Following are steps I followed:

  • Take the mysqldump
  • replace bit(1) to tinyint(1) in the SQL file
  • You’ll see that column names are camelCase. This is an issue because postgres will added a double quotes around to get case-sensitivity, which MySQL by default nicely excludes.

So you’ll have to make all column names to lowercase and remove the quote characters. I did this with a simple java program. There are 150-odd column names that need changes.

  • Used Navcat premium (trial version or SQLSquirrel also has this feature). “Data transfer” is the name of the feature that will move data from MySQL to Postgres

  • In MySQL non-standard use of boolean (which came only a few yrs back), its converted to smallint in Postgres. I wrote a JDBC program to change column type from smallint to boolean. A single table example is as follows that can be made into a looping procedure as well in pure PSQL.

ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
USING CASE WHEN annualized = 0 THEN FALSE
WHEN annualized = 1 THEN TRUE
ELSE NULL
END;

  • remove NULL values from minimumvalue column of minmaxdatalement table

DELETE from minmaxdatalement WHERE minimumvalue=NULL

  • remove NULL values from maximumvalue column of minmaxdatalement table
    DELETE from minmaxdatalement WHERE maximumvalue=NULL

  • remove NULL values from name column of relationshiptype table
    DELETE from relationshiptype WHERE name=NULL

  • blobs to bytea conversion is a mess and I had to truncate. Probably a JDBC based connector program will do better conversion, but I just truncated it and accepted the data loss to systemsetting and usersetting :slight_smile:

Devs:
We need to make all column names lowercase in hbm.xml files in code. This will ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of lastUpdated in column names, as is the common practice.


Regards,

Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

PS: DO NOT COPY THESE AS IS IN YOUR PRODUCTION SYSTEM. THESE ARE SPECIFIC TO MY DEVELOPER INSTALLATION

Yes, these are on Windows and the results may or may not be different on Linux.
The DHIS2 behavior would remain the same in its interaction with the database, since we are using the Type4 JDBC drivers

The tuning is very specific and I’ve tried these on a few applications and reached these numbers for my MySQL and PostgreSQL

For MySQL:
There are many tuning options in my.cnf that I’ve done from here - https://tools.percona.com/wizard

The following are the useful ones to look from my setup

CACHES AND LIMITS

tmp_table_size = 32M
max_heap_table_size = 32M
query_cache_type = 0
query_cache_size = 0

max_connections = 500
thread_cache_size = 50
open_files_limit = 65535
table_definition_cache = 4096
table_open_cache = 4096

INNODB

innodb_log_files_in_group = 2
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = 1
innodb_buffer_pool_size = 4G

For PostgreSQL:

Used pgtune for this… - http://pgfoundry.org/projects/pgtune
But also some other changes as well… the important ones are below:

shared_buffers = 2048MB

effective_cache_size = 3096MB
default_statistics_target = 800
synchronous_commit = local
wal_writer_delay = 600
full_page_writes = false
autovacuum = true
constraint_exclusion = partition

···

On Tue, May 7, 2013 at 2:53 PM, Saptarshi Purkayastha sunbiz@gmail.com wrote:

Hello DHIS users and devs,

Apologies for the long email… Should have probably been multiple blogpost to reduce its length…

I recently encountered a situation with a very large implementation of DHIS2 having problems generating data mart.

Thus no reports were generated and only data entry was being done. I thought I’d share some of the experiences to solve these issues, so that it might be useful to other implementers.

Some changes will be needed in the DHIS2 source, so sending this to the dev list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have also used MySQL.

My findings clearly highlight that DHIS2 performs much better on PostgreSQL and there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)… but I pruned it for my benchmarking. I added 1-million datavalues and ran the data mart.

The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services including postgres and vice versa.
Java Opts: -Xmx3G -Xms768m -XX:MaxPermSize=512m
Java version: 1.7.0_21 x64
Java vendor: Oracle Corporation
OS name: WindowsMySQL = datamart completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The advantages might scale better if larger number of datavalues

One could argue MySQL 5.6 has many performance improvements, I didn’t have time to explore that.

The migration to PostgreSQL has some challenges. Following are steps I followed:

  • Take the mysqldump
  • replace bit(1) to tinyint(1) in the SQL file
  • You’ll see that column names are camelCase. This is an issue because postgres will added a double quotes around to get case-sensitivity, which MySQL by default nicely excludes.

So you’ll have to make all column names to lowercase and remove the quote characters. I did this with a simple java program. There are 150-odd column names that need changes.

  • Used Navcat premium (trial version or SQLSquirrel also has this feature). “Data transfer” is the name of the feature that will move data from MySQL to Postgres

  • In MySQL non-standard use of boolean (which came only a few yrs back), its converted to smallint in Postgres. I wrote a JDBC program to change column type from smallint to boolean. A single table example is as follows that can be made into a looping procedure as well in pure PSQL.

ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
USING CASE WHEN annualized = 0 THEN FALSE
WHEN annualized = 1 THEN TRUE
ELSE NULL
END;

  • remove NULL values from minimumvalue column of minmaxdatalement table

DELETE from minmaxdatalement WHERE minimumvalue=NULL

  • remove NULL values from maximumvalue column of minmaxdatalement table
    DELETE from minmaxdatalement WHERE maximumvalue=NULL

  • remove NULL values from name column of relationshiptype table
    DELETE from relationshiptype WHERE name=NULL

  • blobs to bytea conversion is a mess and I had to truncate. Probably a JDBC based connector program will do better conversion, but I just truncated it and accepted the data loss to systemsetting and usersetting :slight_smile:

Devs:
We need to make all column names lowercase in hbm.xml files in code. This will ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of lastUpdated in column names, as is the common practice.


Regards,

Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Thanks Saptarshi

I am among of many who runs large DHIS2 database on MySQL. I always think to move back to PostgreSQL. This communication is of great enlightening to me.

Congratulations

Lungo

···

From: Saptarshi Purkayastha sunbiz@gmail.com
To: DHIS 2 developers dhis2-devs@lists.launchpad.net; dhis2-users@lists.launchpad.net
Sent:
Tuesday, May 7, 2013 3:53 PM
Subject: [Dhis2-devs] Migrating from MySQL to PostgreSQL

Hello DHIS users and devs,

Apologies for the long email… Should have probably been multiple blogpost to reduce its length…

I recently encountered a situation with a very large implementation of DHIS2 having problems generating data mart.

Thus no reports were generated and only data entry was being done. I thought I’d share some of the experiences to solve these issues, so that it might be useful to other implementers.

Some changes will be needed in the DHIS2 source, so sending this to the dev list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have also used MySQL.

My findings clearly highlight that DHIS2 performs much better on PostgreSQL and there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)… but I pruned it for my benchmarking. I added 1-million datavalues and ran the data mart.

The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services including postgres and vice versa.
Java Opts: -Xmx3G -Xms768m -XX:MaxPermSize=512m
Java version: 1.7.0_21 x64
Java vendor: Oracle Corporation
OS name: WindowsMySQL = datamart completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The advantages might scale better if larger number of datavalues

One could argue MySQL 5.6 has many performance improvements, I didn’t have time to explore that.

The migration to PostgreSQL has some challenges. Following are steps I followed:

  • Take the mysqldump
  • replace bit(1) to tinyint(1) in the SQL file
  • You’ll see that column names are camelCase. This is an issue because postgres will added a double quotes around to get case-sensitivity, which MySQL by default nicely excludes.

So you’ll have to make all column names to lowercase and remove the quote characters. I did this with a simple java program. There are 150-odd column names that need changes.

  • Used Navcat premium (trial version or SQLSquirrel also has this feature). “Data transfer” is the name of the feature that will move data from MySQL to Postgres

  • In MySQL non-standard use of boolean (which came only a few yrs back), its converted to smallint in Postgres. I wrote a JDBC program to change column type from smallint to boolean. A single table example is as follows that can be made into a looping procedure as well in pure PSQL.

ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
USING CASE WHEN annualized = 0 THEN FALSE
WHEN annualized = 1 THEN TRUE
ELSE NULL
END;

  • remove NULL values from minimumvalue column of minmaxdatalement table

DELETE from minmaxdatalement WHERE minimumvalue=NULL

  • remove NULL values from maximumvalue column of minmaxdatalement table
    DELETE from minmaxdatalement WHERE maximumvalue=NULL

  • remove NULL values from name column of relationshiptype table
    DELETE from relationshiptype WHERE name=NULL

  • blobs to bytea conversion is a mess and I had to truncate. Probably a JDBC based connector program will do better conversion, but I just truncated it and accepted the data loss to systemsetting and usersetting :slight_smile:

Devs:
We need to make all column names lowercase in hbm.xml files in code. This will ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of lastUpdated in column names, as is the common practice.


Regards,

Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE


Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Thank you again!

Yes mysql might be better or not in another environment. Specially if we consider the number and sizes of files generated by your InnoDB settings. There is some discussion about how mysql could be affected by filesystem at http://serverfault.com/questions/29193/what-is-the-best-linux-filesystem-for-mysql-innodb. There are references for another big discussion on facebook.

This is not a debate of whether Postgres or MySQL is better than the other for the kind of tasks DHIS requires. No doubts postgres has advanced features which MySQL doesn’t at the moment. I am just adding more things to consider when doing these comparisons.

Orvalho

···

On Tue, May 7, 2013 at 9:58 PM, Saptarshi Purkayastha sunbiz@gmail.com wrote:

PS: DO NOT COPY THESE AS IS IN YOUR PRODUCTION SYSTEM. THESE ARE SPECIFIC TO MY DEVELOPER INSTALLATION

Yes, these are on Windows and the results may or may not be different on Linux.

The DHIS2 behavior would remain the same in its interaction with the database, since we are using the Type4 JDBC drivers

The tuning is very specific and I’ve tried these on a few applications and reached these numbers for my MySQL and PostgreSQL

For MySQL:
There are many tuning options in my.cnf that I’ve done from here - https://tools.percona.com/wizard

The following are the useful ones to look from my setup

CACHES AND LIMITS

tmp_table_size = 32M
max_heap_table_size = 32M
query_cache_type = 0
query_cache_size = 0

max_connections = 500
thread_cache_size = 50
open_files_limit = 65535
table_definition_cache = 4096
table_open_cache = 4096

INNODB

innodb_log_files_in_group = 2
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = 1
innodb_buffer_pool_size = 4G

For PostgreSQL:

Used pgtune for this… - http://pgfoundry.org/projects/pgtune
But also some other changes as well… the important ones are below:

shared_buffers = 2048MB

effective_cache_size = 3096MB
default_statistics_target = 800
synchronous_commit = local
wal_writer_delay = 600
full_page_writes = false
autovacuum = true
constraint_exclusion = partition


Regards,
Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE

On 7 May 2013 20:36, Orvalho Augusto orvaquim@gmail.com wrote:

Wow! Weldone!

I do not have a huge database as yours. And yes I have migrated from MySQL to postgreSQL because I felt that the DHIS community does not support enough MySQL. Anyway getting back to your journey I must ask what means tunned MySQL or tunned PostgreSQL? What you tune might dictate the results.

Are the MySQL/PostgreSQL run in Windows? The multiple choices of filesystem and more possibilities to tune in Linux can give another picture to this. You have just migrated everything you will need to do it again but it would be nice to test the same in Linux too.

Anyway, thank you! This is immensely great.

Caveman

On Tue, May 7, 2013 at 2:53 PM, Saptarshi Purkayastha sunbiz@gmail.com wrote:

Hello DHIS users and devs,

Apologies for the long email… Should have probably been multiple blogpost to reduce its length…

I recently encountered a situation with a very large implementation of DHIS2 having problems generating data mart.

Thus no reports were generated and only data entry was being done. I thought I’d share some of the experiences to solve these issues, so that it might be useful to other implementers.

Some changes will be needed in the DHIS2 source, so sending this to the dev list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have also used MySQL.

My findings clearly highlight that DHIS2 performs much better on PostgreSQL and there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)… but I pruned it for my benchmarking. I added 1-million datavalues and ran the data mart.

The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services including postgres and vice versa.
Java Opts: -Xmx3G -Xms768m -XX:MaxPermSize=512m
Java version: 1.7.0_21 x64
Java vendor: Oracle Corporation
OS name: WindowsMySQL = datamart completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The advantages might scale better if larger number of datavalues

One could argue MySQL 5.6 has many performance improvements, I didn’t have time to explore that.

The migration to PostgreSQL has some challenges. Following are steps I followed:

  • Take the mysqldump
  • replace bit(1) to tinyint(1) in the SQL file
  • You’ll see that column names are camelCase. This is an issue because postgres will added a double quotes around to get case-sensitivity, which MySQL by default nicely excludes.

So you’ll have to make all column names to lowercase and remove the quote characters. I did this with a simple java program. There are 150-odd column names that need changes.

  • Used Navcat premium (trial version or SQLSquirrel also has this feature). “Data transfer” is the name of the feature that will move data from MySQL to Postgres

  • In MySQL non-standard use of boolean (which came only a few yrs back), its converted to smallint in Postgres. I wrote a JDBC program to change column type from smallint to boolean. A single table example is as follows that can be made into a looping procedure as well in pure PSQL.

ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
USING CASE WHEN annualized = 0 THEN FALSE
WHEN annualized = 1 THEN TRUE
ELSE NULL
END;

  • remove NULL values from minimumvalue column of minmaxdatalement table

DELETE from minmaxdatalement WHERE minimumvalue=NULL

  • remove NULL values from maximumvalue column of minmaxdatalement table
    DELETE from minmaxdatalement WHERE maximumvalue=NULL

  • remove NULL values from name column of relationshiptype table
    DELETE from relationshiptype WHERE name=NULL

  • blobs to bytea conversion is a mess and I had to truncate. Probably a JDBC based connector program will do better conversion, but I just truncated it and accepted the data loss to systemsetting and usersetting :slight_smile:

Devs:
We need to make all column names lowercase in hbm.xml files in code. This will ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of lastUpdated in column names, as is the common practice.


Regards,

Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Thanks Saptarshi, this is very helpful.

···

On Tue, May 7, 2013 at 6:23 PM, Saptarshi Purkayastha sunbiz@gmail.com wrote:

Hello DHIS users and devs,

Apologies for the long email… Should have probably been multiple blogpost to reduce its length…

I recently encountered a situation with a very large implementation of DHIS2 having problems generating data mart.

Thus no reports were generated and only data entry was being done. I thought I’d share some of the experiences to solve these issues, so that it might be useful to other implementers.

Some changes will be needed in the DHIS2 source, so sending this to the dev list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have also used MySQL.

My findings clearly highlight that DHIS2 performs much better on PostgreSQL and there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)… but I pruned it for my benchmarking. I added 1-million datavalues and ran the data mart.

The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services including postgres and vice versa.
Java Opts: -Xmx3G -Xms768m -XX:MaxPermSize=512m
Java version: 1.7.0_21 x64
Java vendor: Oracle Corporation
OS name: WindowsMySQL = datamart completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The advantages might scale better if larger number of datavalues

One could argue MySQL 5.6 has many performance improvements, I didn’t have time to explore that.

The migration to PostgreSQL has some challenges. Following are steps I followed:

  • Take the mysqldump
  • replace bit(1) to tinyint(1) in the SQL file
  • You’ll see that column names are camelCase. This is an issue because postgres will added a double quotes around to get case-sensitivity, which MySQL by default nicely excludes.

So you’ll have to make all column names to lowercase and remove the quote characters. I did this with a simple java program. There are 150-odd column names that need changes.

  • Used Navcat premium (trial version or SQLSquirrel also has this feature). “Data transfer” is the name of the feature that will move data from MySQL to Postgres

  • In MySQL non-standard use of boolean (which came only a few yrs back), its converted to smallint in Postgres. I wrote a JDBC program to change column type from smallint to boolean. A single table example is as follows that can be made into a looping procedure as well in pure PSQL.

ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
USING CASE WHEN annualized = 0 THEN FALSE
WHEN annualized = 1 THEN TRUE
ELSE NULL
END;

  • remove NULL values from minimumvalue column of minmaxdatalement table

DELETE from minmaxdatalement WHERE minimumvalue=NULL

  • remove NULL values from maximumvalue column of minmaxdatalement table
    DELETE from minmaxdatalement WHERE maximumvalue=NULL

  • remove NULL values from name column of relationshiptype table
    DELETE from relationshiptype WHERE name=NULL

  • blobs to bytea conversion is a mess and I had to truncate. Probably a JDBC based connector program will do better conversion, but I just truncated it and accepted the data loss to systemsetting and usersetting :slight_smile:

Devs:
We need to make all column names lowercase in hbm.xml files in code. This will ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of lastUpdated in column names, as is the common practice.


Regards,

Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Regards,
Bharath Kumar. Ch

Dear Saptarshi

Thank you for helping to convert one of the Bangladesh database from MySQL to PostgreSQL. But there is still a long way to go.

So far all tables are converted and the metadata are exported from MySQL and imported into PostgreSQL. What we find that the import has lots of inconsistency and we are trying to find solution. Few problems we already solved but I need the reason. We are working on it and within few days I will write to you all for reason and help. But the man challenge is to convert 32 million data value and 26000 orgunit and associated tables because of this metadata conversion problem.

What I find the DHIS2 version 2.11 is not suitable for MySQL which is not fare. We just tuned our MySQL Server and 5.6 have significant improvement in performance. So I am requesting the community to reconsider the use of MySQL as well.

···

Regars

Hannan Khan

I don’t think I meant to convey that DHIS2 doesn’t work on MySQL.
Its just extremely difficult to test all features on all platforms for developers. This is true for all projects. Users have to take initiative to report issues of running DHIS2 on MySQL, if developers have missed something.

I just highlighted through testing that DHIS2 datamart ran faster on PostgreSQL 9.2. It might be that MariaDB community, which has been making good strides on MySQL, might perform on par or better than PostgreSQL for DHIS2. With SkySQL aligning teams, Oracle might be pushed to release more of their Enterprise-only features to the community version also. DHIS2 core developers just don’t have resources to test on all platforms. Implementations might have to discover these over time.

···

Regars

Hannan Khan

I understood your point and yes it is good to share what you have done.

What worries me is you say “DHIS2 datamart ran faster on PostgreSQL 9.2”. Your conditions are not the same for everyone. You did on Windows and most of DHIS are setup on Linux which is a quite different situation.

You are write about the facts on MySQL. No doubts it is a confusion these days to use MySQL.

Caveman

···

On Wed, May 8, 2013 at 12:13 PM, Saptarshi Purkayastha sunbiz@gmail.com wrote:

I don’t think I meant to convey that DHIS2 doesn’t work on MySQL.
Its just extremely difficult to test all features on all platforms for developers. This is true for all projects. Users have to take initiative to report issues of running DHIS2 on MySQL, if developers have missed something.

I just highlighted through testing that DHIS2 datamart ran faster on PostgreSQL 9.2. It might be that MariaDB community, which has been making good strides on MySQL, might perform on par or better than PostgreSQL for DHIS2. With SkySQL aligning teams, Oracle might be pushed to release more of their Enterprise-only features to the community version also. DHIS2 core developers just don’t have resources to test on all platforms. Implementations might have to discover these over time.


Regards,
Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE

On 8 May 2013 10:32, Hannan Khan hannank@gmail.com wrote:

Dear Saptarshi

Thank you for helping to convert one of the Bangladesh database from MySQL to PostgreSQL. But there is still a long way to go.

So far all tables are converted and the metadata are exported from MySQL and imported into PostgreSQL. What we find that the import has lots of inconsistency and we are trying to find solution. Few problems we already solved but I need the reason. We are working on it and within few days I will write to you all for reason and help. But the man challenge is to convert 32 million data value and 26000 orgunit and associated tables because of this metadata conversion problem.

What I find the DHIS2 version 2.11 is not suitable for MySQL which is not fare. We just tuned our MySQL Server and 5.6 have significant improvement in performance. So I am requesting the community to reconsider the use of MySQL as well.

On Tue, May 7, 2013 at 6:53 PM, Saptarshi Purkayastha sunbiz@gmail.com wrote:

Hello DHIS users and devs,

Apologies for the long email… Should have probably been multiple blogpost to reduce its length…

I recently encountered a situation with a very large implementation of DHIS2 having problems generating data mart.

Thus no reports were generated and only data entry was being done. I thought I’d share some of the experiences to solve these issues, so that it might be useful to other implementers.

Some changes will be needed in the DHIS2 source, so sending this to the dev list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have also used MySQL.

My findings clearly highlight that DHIS2 performs much better on PostgreSQL and there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)… but I pruned it for my benchmarking. I added 1-million datavalues and ran the data mart.

The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services including postgres and vice versa.
Java Opts: -Xmx3G -Xms768m -XX:MaxPermSize=512m
Java version: 1.7.0_21 x64
Java vendor: Oracle Corporation
OS name: WindowsMySQL = datamart completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The advantages might scale better if larger number of datavalues

One could argue MySQL 5.6 has many performance improvements, I didn’t have time to explore that.

The migration to PostgreSQL has some challenges. Following are steps I followed:

  • Take the mysqldump
  • replace bit(1) to tinyint(1) in the SQL file
  • You’ll see that column names are camelCase. This is an issue because postgres will added a double quotes around to get case-sensitivity, which MySQL by default nicely excludes.

So you’ll have to make all column names to lowercase and remove the quote characters. I did this with a simple java program. There are 150-odd column names that need changes.

  • Used Navcat premium (trial version or SQLSquirrel also has this feature). “Data transfer” is the name of the feature that will move data from MySQL to Postgres

  • In MySQL non-standard use of boolean (which came only a few yrs back), its converted to smallint in Postgres. I wrote a JDBC program to change column type from smallint to boolean. A single table example is as follows that can be made into a looping procedure as well in pure PSQL.

ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
USING CASE WHEN annualized = 0 THEN FALSE
WHEN annualized = 1 THEN TRUE
ELSE NULL
END;

  • remove NULL values from minimumvalue column of minmaxdatalement table

DELETE from minmaxdatalement WHERE minimumvalue=NULL

  • remove NULL values from maximumvalue column of minmaxdatalement table
    DELETE from minmaxdatalement WHERE maximumvalue=NULL

  • remove NULL values from name column of relationshiptype table
    DELETE from relationshiptype WHERE name=NULL

  • blobs to bytea conversion is a mess and I had to truncate. Probably a JDBC based connector program will do better conversion, but I just truncated it and accepted the data loss to systemsetting and usersetting :slight_smile:

Devs:
We need to make all column names lowercase in hbm.xml files in code. This will ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of lastUpdated in column names, as is the common practice.


Regards,

Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Regars

Hannan Khan

I understood your point and yes it is good to share what you have done.

What worries me is you say "DHIS2 datamart ran faster on PostgreSQL 9.2".
Your conditions are not the same for everyone. You did on Windows and most
of DHIS are setup on Linux which is a quite different situation.

Yes that was maybe a brave statement. I am sure a well configured mysql
would outperform a poorly configured postgresql. Though I would guess that
Saptarshi's best effort configuration of both is probably reasonable.

A real cat amongst the pigeons would be if we dropped mysql support
altogether and focussed entirely on postgres, taking advantage of postgres
specific features and functions (eg crosstab function, postgis etc). Then
you could see order of magnitude performance improvements. But there are
lots of people on this thread who might shoot me for suggesting that so I
won't .. yet :slight_smile:

···

On 8 May 2013 12:33, Orvalho Augusto <orvaquim@gmail.com> wrote:

You are write about the facts on MySQL. No doubts it is a confusion these
days to use MySQL.

Caveman

On Wed, May 8, 2013 at 12:13 PM, Saptarshi Purkayastha <sunbiz@gmail.com>wrote:

I don't think I meant to convey that DHIS2 doesn't work on MySQL.
Its just extremely difficult to test all features on all platforms for
developers. This is true for all projects. Users have to take initiative to
report issues of running DHIS2 on MySQL, if developers have missed
something.

I just highlighted through testing that DHIS2 datamart ran faster on
PostgreSQL 9.2. It might be that MariaDB community, which has been making
good strides on MySQL, might perform on par or better than PostgreSQL for
DHIS2. With SkySQL aligning teams, Oracle might be pushed to release more
of their Enterprise-only features to the community version also. DHIS2 core
developers just don't have resources to test on all platforms.
Implementations might have to discover these over time.

---
Regards,
Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE

On 8 May 2013 10:32, Hannan Khan <hannank@gmail.com> wrote:

Dear Saptarshi

Thank you for helping to convert one of the Bangladesh database from
MySQL to PostgreSQL. But there is still a long way to go.

So far all tables are converted and the metadata are exported from MySQL
and imported into PostgreSQL. What we find that the import has lots of
inconsistency and we are trying to find solution. Few problems we already
solved but I need the reason. We are working on it and within few days I
will write to you all for reason and help. But the man challenge is to
convert 32 million data value and 26000 orgunit and associated tables
because of this metadata conversion problem.

What I find the DHIS2 version 2.11 is not suitable for MySQL which is
not fare. We just tuned our MySQL Server and 5.6 have significant
improvement in performance.* So I am requesting the community to
reconsider the use of MySQL as well.*

Regars

Hannan Khan
On Tue, May 7, 2013 at 6:53 PM, Saptarshi Purkayastha <sunbiz@gmail.com>wrote:

Hello DHIS users and devs,

Apologies for the long email... Should have probably been multiple
blogpost to reduce its length...

I recently encountered a situation with a very large implementation of
DHIS2 having problems generating data mart.
Thus no reports were generated and only data entry was being done. I
thought I'd share some of the experiences to solve these issues, so that it
might be useful to other implementers.
Some changes will be needed in the DHIS2 source, so sending this to the
dev list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have
also used MySQL.
My findings clearly highlight that DHIS2 performs much better on
PostgreSQL and there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)... but I
pruned it for my benchmarking. I added 1-million datavalues and ran the
data mart.
The results are from my fairly good laptop (quad-core i7; 8GB RAM;
tuned JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB
SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services
including postgres and vice versa.
Java Opts: -Xmx3G -Xms768m -XX:MaxPermSize=512m
Java version: 1.7.0_21 x64
Java vendor: Oracle Corporation OS name: WindowsMySQL = datamart
completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The
advantages might scale better if larger number of datavalues
One could argue MySQL 5.6 has many performance improvements, I didn't
have time to explore that.
The migration to PostgreSQL has some challenges. Following are steps I
followed:

- Take the mysqldump
- replace bit(1) to tinyint(1) in the SQL file
- You'll see that column names are camelCase. This is an issue because
postgres will added a double quotes around to get case-sensitivity, which
MySQL by default nicely excludes.
So you'll have to make all column names to lowercase and remove the
quote characters. I did this with a simple java program. There are 150-odd
column names that need changes.
- Used Navcat premium (trial version or SQLSquirrel also has this
feature). "Data transfer" is the name of the feature that will move data
from MySQL to Postgres
- In MySQL non-standard use of boolean (which came only a few yrs
back), its converted to smallint in Postgres. I wrote a JDBC program to
change column type from smallint to boolean. A single table example is as
follows that can be made into a looping procedure as well in pure PSQL.
ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
    USING CASE WHEN annualized = 0 THEN FALSE
           WHEN annualized = 1 THEN TRUE
           ELSE NULL
    END;
- remove NULL values from minimumvalue column of minmaxdatalement table
DELETE from minmaxdatalement WHERE minimumvalue=NULL
- remove NULL values from maximumvalue column of minmaxdatalement table
DELETE from minmaxdatalement WHERE maximumvalue=NULL
- remove NULL values from name column of relationshiptype table
DELETE from relationshiptype WHERE name=NULL
- blobs to bytea conversion is a mess and I had to truncate. Probably
a JDBC based connector program will do better conversion, but I just
truncated it and accepted the data loss to systemsetting and usersetting :slight_smile:
*
Devs*:
We need to make all column names lowercase in hbm.xml files in code.
This will ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of
lastUpdated in column names, as is the common practice.

---
Regards,
Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE

_______________________________________________
Mailing list: DHIS 2 Users in Launchpad
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : DHIS 2 Users in Launchpad
More help : ListHelp - Launchpad Help

_______________________________________________
Mailing list: DHIS 2 Users in Launchpad
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : DHIS 2 Users in Launchpad
More help : ListHelp - Launchpad Help

_______________________________________________
Mailing list: DHIS 2 Users in Launchpad
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : DHIS 2 Users in Launchpad
More help : ListHelp - Launchpad Help