Cross database querying for updating data

Can I safely update DHIS2 database A with data from DHIS database B both on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c “\copy (select * from datavalue where periodid=‘xxxx’) TO STDOUT” > /tmp/data.tsv;

And to restore:

psql A -c “\copy datavaue (x, y, z, …) FROM /tmp/data.tsv”

Is this recommended, and what else should be cross transferred.

It may work, but in general, it will not because the internal IDs of different DHIS2 databases are not the same. So, it might work, but it requires both databases to be in essentially the same state. Better to use the data import/export functionality if you can.

Regards,

Jason

···

On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. muyepaa@gmail.com wrote:

Can I safely update DHIS2 database A with data from DHIS database B both on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c “\copy (select * from datavalue where periodid=‘xxxx’) TO STDOUT” > /tmp/data.tsv;

And to restore:

psql A -c “\copy datavaue (x, y, z, …) FROM /tmp/data.tsv”

Is this recommended, and what else should be cross transferred.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

I had thought of data import/export functionality, however this will require two instances of tomcat to be running. (or one instance with one database at a time, however there will be an associated down time).
Since upgrading to 2.18, when exported, the resultant download zip file says invalid for all formats (xml, json, excel) and is only 2kb in size regardless of the hierarchy I choose.

Since one database is copy of the other, i thought I could easily update from the other as described.

···

On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

It may work, but in general, it will not because the internal IDs of different DHIS2 databases are not the same. So, it might work, but it requires both databases to be in essentially the same state. Better to use the data import/export functionality if you can.

Regards,

Jason

On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. muyepaa@gmail.com wrote:

Can I safely update DHIS2 database A with data from DHIS database B both on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c “\copy (select * from datavalue where periodid=‘xxxx’) TO STDOUT” > /tmp/data.tsv;

And to restore:

psql A -c “\copy datavaue (x, y, z, …) FROM /tmp/data.tsv”

Is this recommended, and what else should be cross transferred.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

If database B is restored off a backup of database A then it will
probably work with some caveats. Clearly they are not exactly the
same because you've got extra data in A which you are trying to push
into B. So it depends really on what else might have changed in A.

If you have made any changes to orgunits, datelements or periods then
that will cause at best a database integrity violation when you try to
import that way. At worst cause a very horrible mess.

Even though the process is fragile, there is a related use case where
one might need to do such a thing or something similar. If a country
system is hosted somewhere in the cloud (because they have poor or
expensive local bandwidth) then there is a real problem to solve about
getting data backups from the server back into the country.

In particular we'd like to be able to pull not all the datavalues but
only those which have been changed ie. do an incremental backup of the
datavalues table. The same could apply to events.

One possible approach might be to first do a metadata backup and
restore. And then pull the changed datavalues - though you would have
to do this through the api with uids or codes as the metadata backup
and restore would have ignored the database identifiers.

Yet another approach would be to take a reduced postgres backup
excluding the datavalues table, restore that and then try and pull in
the delta of datavalues table something like you describe.

Its a tough problem. Jason, what do you suggest? Downloading the
full postgres backup (with ephemeral tables like analytics excluded)
can be a serious challenge for many countries to do with regularity
and reliability. So how best to go about incremental backup and
restore of data?

···

On 20 April 2015 at 15:36, Muyepa A. <muyepaa@gmail.com> wrote:

I had thought of data import/export functionality, however this will
require two instances of tomcat to be running. (or one instance with one
database at a time, however there will be an associated down time).
Since upgrading to 2.18, when exported, the resultant download zip file says
invalid for all formats (xml, json, excel) and is only 2kb in size
regardless of the hierarchy I choose.

Since one database is copy of the other, i thought I could easily update
from the other as described.

On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering > <jason.p.pickering@gmail.com> wrote:

It may work, but in general, it will not because the internal IDs of
different DHIS2 databases are not the same. So, it might work, but it
requires both databases to be in essentially the same state. Better to use
the data import/export functionality if you can.

Regards,
Jason

On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. <muyepaa@gmail.com> wrote:

Can I safely update DHIS2 database A with data from DHIS database B both
on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c "\copy (select * from datavalue where periodid='xxxx') TO
STDOUT" > /tmp/data.tsv;

And to restore:
psql A -c "\copy datavaue (x, y, z, ...) FROM /tmp/data.tsv"

Is this recommended, and what else should be cross transferred.

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-users
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-users
More help : https://help.launchpad.net/ListHelp

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-users
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-users
More help : https://help.launchpad.net/ListHelp

Indeed Database A has extra data not available in B while B has data missing in A. No changes have been made to orgunits, dataelements, periods or users. The delta is in data-values. I would really want to get views on whether the operation described (all other parameters being equal) will bring the missing data into A, and whether it may affect report summary rates.
I cannot easily download the big databases. Something on command-line would be very suitable for this circumstance and transparent to users.

···

On Mon, Apr 20, 2015 at 5:08 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If database B is restored off a backup of database A then it will

probably work with some caveats. Clearly they are not exactly the

same because you’ve got extra data in A which you are trying to push

into B. So it depends really on what else might have changed in A.

If you have made any changes to orgunits, datelements or periods then

that will cause at best a database integrity violation when you try to

import that way. At worst cause a very horrible mess.

Even though the process is fragile, there is a related use case where

one might need to do such a thing or something similar. If a country

system is hosted somewhere in the cloud (because they have poor or

expensive local bandwidth) then there is a real problem to solve about

getting data backups from the server back into the country.

In particular we’d like to be able to pull not all the datavalues but

only those which have been changed ie. do an incremental backup of the

datavalues table. The same could apply to events.

One possible approach might be to first do a metadata backup and

restore. And then pull the changed datavalues - though you would have

to do this through the api with uids or codes as the metadata backup

and restore would have ignored the database identifiers.

Yet another approach would be to take a reduced postgres backup

excluding the datavalues table, restore that and then try and pull in

the delta of datavalues table something like you describe.

Its a tough problem. Jason, what do you suggest? Downloading the

full postgres backup (with ephemeral tables like analytics excluded)

can be a serious challenge for many countries to do with regularity

and reliability. So how best to go about incremental backup and

restore of data?

On 20 April 2015 at 15:36, Muyepa A. muyepaa@gmail.com wrote:

I had thought of data import/export functionality, however this will

require two instances of tomcat to be running. (or one instance with one

database at a time, however there will be an associated down time).

Since upgrading to 2.18, when exported, the resultant download zip file says

invalid for all formats (xml, json, excel) and is only 2kb in size

regardless of the hierarchy I choose.

Since one database is copy of the other, i thought I could easily update

from the other as described.

On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

It may work, but in general, it will not because the internal IDs of

different DHIS2 databases are not the same. So, it might work, but it

requires both databases to be in essentially the same state. Better to use

the data import/export functionality if you can.

Regards,

Jason

On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. muyepaa@gmail.com wrote:

Can I safely update DHIS2 database A with data from DHIS database B both

on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c "\copy (select * from datavalue where periodid=‘xxxx’) TO

STDOUT" > /tmp/data.tsv;

And to restore:

psql A -c “\copy datavaue (x, y, z, …) FROM /tmp/data.tsv”

Is this recommended, and what else should be cross transferred.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

I think I would trust UIDs or Codes more than database IDs

···

On Mon, Apr 20, 2015 at 10:48 PM, Muyepa A. muyepaa@gmail.com wrote:

Indeed Database A has extra data not available in B while B has data missing in A. No changes have been made to orgunits, dataelements, periods or users. The delta is in data-values. I would really want to get views on whether the operation described (all other parameters being equal) will bring the missing data into A, and whether it may affect report summary rates.
I cannot easily download the big databases. Something on command-line would be very suitable for this circumstance and transparent to users.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On Mon, Apr 20, 2015 at 5:08 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If database B is restored off a backup of database A then it will

probably work with some caveats. Clearly they are not exactly the

same because you’ve got extra data in A which you are trying to push

into B. So it depends really on what else might have changed in A.

If you have made any changes to orgunits, datelements or periods then

that will cause at best a database integrity violation when you try to

import that way. At worst cause a very horrible mess.

Even though the process is fragile, there is a related use case where

one might need to do such a thing or something similar. If a country

system is hosted somewhere in the cloud (because they have poor or

expensive local bandwidth) then there is a real problem to solve about

getting data backups from the server back into the country.

In particular we’d like to be able to pull not all the datavalues but

only those which have been changed ie. do an incremental backup of the

datavalues table. The same could apply to events.

One possible approach might be to first do a metadata backup and

restore. And then pull the changed datavalues - though you would have

to do this through the api with uids or codes as the metadata backup

and restore would have ignored the database identifiers.

Yet another approach would be to take a reduced postgres backup

excluding the datavalues table, restore that and then try and pull in

the delta of datavalues table something like you describe.

Its a tough problem. Jason, what do you suggest? Downloading the

full postgres backup (with ephemeral tables like analytics excluded)

can be a serious challenge for many countries to do with regularity

and reliability. So how best to go about incremental backup and

restore of data?

On 20 April 2015 at 15:36, Muyepa A. muyepaa@gmail.com wrote:

I had thought of data import/export functionality, however this will

require two instances of tomcat to be running. (or one instance with one

database at a time, however there will be an associated down time).

Since upgrading to 2.18, when exported, the resultant download zip file says

invalid for all formats (xml, json, excel) and is only 2kb in size

regardless of the hierarchy I choose.

Since one database is copy of the other, i thought I could easily update

from the other as described.

On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

It may work, but in general, it will not because the internal IDs of

different DHIS2 databases are not the same. So, it might work, but it

requires both databases to be in essentially the same state. Better to use

the data import/export functionality if you can.

Regards,

Jason

On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. muyepaa@gmail.com wrote:

Can I safely update DHIS2 database A with data from DHIS database B both

on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c "\copy (select * from datavalue where periodid=‘xxxx’) TO

STDOUT" > /tmp/data.tsv;

And to restore:

psql A -c “\copy datavaue (x, y, z, …) FROM /tmp/data.tsv”

Is this recommended, and what else should be cross transferred.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

Hi there. I think the big challenge here is what is not there, as opposed to what needs to be added. DHIS2 has no functionality for “soft deletes”, meaning when you mark a record with a column like “deletedon” with a timestamp for deletion, but you actually do not remove it from the database. This is a common problem with DXF syncronizations, as the upstream system does not transmit DELETES. It only transmits INSERTS and UPDATES.

At first glance, in your case, it would seem to be pretty easy, since you can use the “lastupdated” of the data value table to parse out values which have been added since your last synchronization. The problem however is to determine what has been DELETED from the data value table.Well, that is more tricky, and would require you diff both data values tables to determine what changes you need to transmit to your downstream system.

There are tools out there, to help determine the “diff” between two databases, which I think in your case would require you to have copy of your remote DB and then diff it against your local DB. You would then prepare a diff, make the update to your local copy and then you remote copy.

I think that would really be the only real way of doing this, using a tool like perhaps these (http://stackoverflow.com/questions/4804779/how-to-check-difference-between-two-databases-in-p) other other commercial alternatives to diff the two databases, and prepare a series of INSERT and DELETE statements. Never attempted it myself, but it might work.

Does not seem really that trivial. :frowning:

Regards,

Jason

···

On Mon, Apr 20, 2015 at 10:48 PM, Muyepa A. muyepaa@gmail.com wrote:

Indeed Database A has extra data not available in B while B has data missing in A. No changes have been made to orgunits, dataelements, periods or users. The delta is in data-values. I would really want to get views on whether the operation described (all other parameters being equal) will bring the missing data into A, and whether it may affect report summary rates.
I cannot easily download the big databases. Something on command-line would be very suitable for this circumstance and transparent to users.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On Mon, Apr 20, 2015 at 5:08 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If database B is restored off a backup of database A then it will

probably work with some caveats. Clearly they are not exactly the

same because you’ve got extra data in A which you are trying to push

into B. So it depends really on what else might have changed in A.

If you have made any changes to orgunits, datelements or periods then

that will cause at best a database integrity violation when you try to

import that way. At worst cause a very horrible mess.

Even though the process is fragile, there is a related use case where

one might need to do such a thing or something similar. If a country

system is hosted somewhere in the cloud (because they have poor or

expensive local bandwidth) then there is a real problem to solve about

getting data backups from the server back into the country.

In particular we’d like to be able to pull not all the datavalues but

only those which have been changed ie. do an incremental backup of the

datavalues table. The same could apply to events.

One possible approach might be to first do a metadata backup and

restore. And then pull the changed datavalues - though you would have

to do this through the api with uids or codes as the metadata backup

and restore would have ignored the database identifiers.

Yet another approach would be to take a reduced postgres backup

excluding the datavalues table, restore that and then try and pull in

the delta of datavalues table something like you describe.

Its a tough problem. Jason, what do you suggest? Downloading the

full postgres backup (with ephemeral tables like analytics excluded)

can be a serious challenge for many countries to do with regularity

and reliability. So how best to go about incremental backup and

restore of data?

On 20 April 2015 at 15:36, Muyepa A. muyepaa@gmail.com wrote:

I had thought of data import/export functionality, however this will

require two instances of tomcat to be running. (or one instance with one

database at a time, however there will be an associated down time).

Since upgrading to 2.18, when exported, the resultant download zip file says

invalid for all formats (xml, json, excel) and is only 2kb in size

regardless of the hierarchy I choose.

Since one database is copy of the other, i thought I could easily update

from the other as described.

On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

It may work, but in general, it will not because the internal IDs of

different DHIS2 databases are not the same. So, it might work, but it

requires both databases to be in essentially the same state. Better to use

the data import/export functionality if you can.

Regards,

Jason

On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. muyepaa@gmail.com wrote:

Can I safely update DHIS2 database A with data from DHIS database B both

on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c "\copy (select * from datavalue where periodid=‘xxxx’) TO

STDOUT" > /tmp/data.tsv;

And to restore:

psql A -c “\copy datavaue (x, y, z, …) FROM /tmp/data.tsv”

Is this recommended, and what else should be cross transferred.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

Couldn’t two “left joins with null” give you the diff?

···

On Mon, Apr 20, 2015 at 10:48 PM, Muyepa A. muyepaa@gmail.com wrote:

Indeed Database A has extra data not available in B while B has data missing in A. No changes have been made to orgunits, dataelements, periods or users. The delta is in data-values. I would really want to get views on whether the operation described (all other parameters being equal) will bring the missing data into A, and whether it may affect report summary rates.
I cannot easily download the big databases. Something on command-line would be very suitable for this circumstance and transparent to users.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On Mon, Apr 20, 2015 at 5:08 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If database B is restored off a backup of database A then it will

probably work with some caveats. Clearly they are not exactly the

same because you’ve got extra data in A which you are trying to push

into B. So it depends really on what else might have changed in A.

If you have made any changes to orgunits, datelements or periods then

that will cause at best a database integrity violation when you try to

import that way. At worst cause a very horrible mess.

Even though the process is fragile, there is a related use case where

one might need to do such a thing or something similar. If a country

system is hosted somewhere in the cloud (because they have poor or

expensive local bandwidth) then there is a real problem to solve about

getting data backups from the server back into the country.

In particular we’d like to be able to pull not all the datavalues but

only those which have been changed ie. do an incremental backup of the

datavalues table. The same could apply to events.

One possible approach might be to first do a metadata backup and

restore. And then pull the changed datavalues - though you would have

to do this through the api with uids or codes as the metadata backup

and restore would have ignored the database identifiers.

Yet another approach would be to take a reduced postgres backup

excluding the datavalues table, restore that and then try and pull in

the delta of datavalues table something like you describe.

Its a tough problem. Jason, what do you suggest? Downloading the

full postgres backup (with ephemeral tables like analytics excluded)

can be a serious challenge for many countries to do with regularity

and reliability. So how best to go about incremental backup and

restore of data?

On 20 April 2015 at 15:36, Muyepa A. muyepaa@gmail.com wrote:

I had thought of data import/export functionality, however this will

require two instances of tomcat to be running. (or one instance with one

database at a time, however there will be an associated down time).

Since upgrading to 2.18, when exported, the resultant download zip file says

invalid for all formats (xml, json, excel) and is only 2kb in size

regardless of the hierarchy I choose.

Since one database is copy of the other, i thought I could easily update

from the other as described.

On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

It may work, but in general, it will not because the internal IDs of

different DHIS2 databases are not the same. So, it might work, but it

requires both databases to be in essentially the same state. Better to use

the data import/export functionality if you can.

Regards,

Jason

On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. muyepaa@gmail.com wrote:

Can I safely update DHIS2 database A with data from DHIS database B both

on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c "\copy (select * from datavalue where periodid=‘xxxx’) TO

STDOUT" > /tmp/data.tsv;

And to restore:

psql A -c “\copy datavaue (x, y, z, …) FROM /tmp/data.tsv”

Is this recommended, and what else should be cross transferred.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

Yeah maybe, or simply “NOT EXISTS” .

However, cross-database queries in postgres are not in general supported without an extension, so you would need to deal with that, and obviously have a completely integral copy of the remote database on the local machine.

Seems like a lot of moving parts, but it could be done. Whether or not it would save you any bandwidth is another issue.

Anthony, how big is the database which you are trying to replicate, after compression? Are you working over a very slow link and thus, you are trying to decrease the size of what you need to download to the remote machine?

Regards,

Jason

···

On Mon, Apr 20, 2015 at 10:48 PM, Muyepa A. muyepaa@gmail.com wrote:

Indeed Database A has extra data not available in B while B has data missing in A. No changes have been made to orgunits, dataelements, periods or users. The delta is in data-values. I would really want to get views on whether the operation described (all other parameters being equal) will bring the missing data into A, and whether it may affect report summary rates.
I cannot easily download the big databases. Something on command-line would be very suitable for this circumstance and transparent to users.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On Mon, Apr 20, 2015 at 5:08 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

If database B is restored off a backup of database A then it will

probably work with some caveats. Clearly they are not exactly the

same because you’ve got extra data in A which you are trying to push

into B. So it depends really on what else might have changed in A.

If you have made any changes to orgunits, datelements or periods then

that will cause at best a database integrity violation when you try to

import that way. At worst cause a very horrible mess.

Even though the process is fragile, there is a related use case where

one might need to do such a thing or something similar. If a country

system is hosted somewhere in the cloud (because they have poor or

expensive local bandwidth) then there is a real problem to solve about

getting data backups from the server back into the country.

In particular we’d like to be able to pull not all the datavalues but

only those which have been changed ie. do an incremental backup of the

datavalues table. The same could apply to events.

One possible approach might be to first do a metadata backup and

restore. And then pull the changed datavalues - though you would have

to do this through the api with uids or codes as the metadata backup

and restore would have ignored the database identifiers.

Yet another approach would be to take a reduced postgres backup

excluding the datavalues table, restore that and then try and pull in

the delta of datavalues table something like you describe.

Its a tough problem. Jason, what do you suggest? Downloading the

full postgres backup (with ephemeral tables like analytics excluded)

can be a serious challenge for many countries to do with regularity

and reliability. So how best to go about incremental backup and

restore of data?

On 20 April 2015 at 15:36, Muyepa A. muyepaa@gmail.com wrote:

I had thought of data import/export functionality, however this will

require two instances of tomcat to be running. (or one instance with one

database at a time, however there will be an associated down time).

Since upgrading to 2.18, when exported, the resultant download zip file says

invalid for all formats (xml, json, excel) and is only 2kb in size

regardless of the hierarchy I choose.

Since one database is copy of the other, i thought I could easily update

from the other as described.

On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering

jason.p.pickering@gmail.com wrote:

It may work, but in general, it will not because the internal IDs of

different DHIS2 databases are not the same. So, it might work, but it

requires both databases to be in essentially the same state. Better to use

the data import/export functionality if you can.

Regards,

Jason

On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. muyepaa@gmail.com wrote:

Can I safely update DHIS2 database A with data from DHIS database B both

on the same PostgreSQL server by cross query. B is backup of A

From database B, I can retrieve and save the data that is required using:

psql B -c "\copy (select * from datavalue where periodid=‘xxxx’) TO

STDOUT" > /tmp/data.tsv;

And to restore:

psql A -c “\copy datavaue (x, y, z, …) FROM /tmp/data.tsv”

Is this recommended, and what else should be cross transferred.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

Hi Jason

yes DELETES are a problem alright.

Having a complete copy of the remote database would in general defeat
the purpose of the exercise :slight_smile: Which is to maintain such a copy with
increments rather that download the whole thing ...

A while back i thought about using postgres native replication feature
for this, but the problem is that the datamart/analytics process
breaks any sane process of following incremental updates. And there
is no way to do table-selective replication.

Of course this is one (of a number) of quite good reasons
architecturally to break the data collection database away from the
analytics database entirely, but that is not something that will
happen anytime soon. A simple datacollection database that grew (and
occasionally shrunk) incrementally would be quite easy to track.

A slightly roundabout route might be to maintain a separate database
on the server which didn't have any of the analytics stuff (ie. acting
as if it were just a data collection database, but back to
front/upside down). This database could track the data tables in the
production database using some of the diff stuff you talked of
earlier, or even putting triggers on the datavalue table - though that
could certainly impact the speed of large data updates. In fact
putting triggers on this table would probably just feed updates to a
"transaction" table in the same database (recording
delete,update,inserts). There are quite a few systems I can think of
where this would probably not be a big problem at all.

Whichever way it was accomplished, something like this would then be
more easily trackable from a remote end accross a
low/expensive/unreliable connection.

Currently I know at least one country system in the cloud where this
is a real issue. The system is running well but there is some
discomfort that they are not able to realistically pull nightly
snapshots back because of the size of the data. And given that 95%
(and growing) of the data is data that they already have it does seem
remarkably inefficient.

···

On 21 April 2015 at 07:21, Jason Pickering <jason.p.pickering@gmail.com> wrote:

Yeah maybe, or simply "NOT EXISTS" .

However, cross-database queries in postgres are not in general supported
without an extension, so you would need to deal with that, and obviously
have a completely integral copy of the remote database on the local machine.

Seems like a lot of moving parts, but it could be done. Whether or not it
would save you any bandwidth is another issue.

Anthony, how big is the database which you are trying to replicate, after
compression? Are you working over a very slow link and thus, you are trying
to decrease the size of what you need to download to the remote machine?

Regards,
Jason

On Tue, Apr 21, 2015 at 2:44 AM Knut Staring <knutst@gmail.com> wrote:

Couldn't two "left joins with null" give you the diff?

On Apr 21, 2015 12:36 AM, "Jason Pickering" <jason.p.pickering@gmail.com> >> wrote:

Hi there. I think the big challenge here is what is not there, as opposed
to what needs to be added. DHIS2 has no functionality for "soft deletes",
meaning when you mark a record with a column like "deletedon" with a
timestamp for deletion, but you actually do not remove it from the database.
This is a common problem with DXF syncronizations, as the upstream system
does not transmit DELETES. It only transmits INSERTS and UPDATES.

At first glance, in your case, it would seem to be pretty easy, since you
can use the "lastupdated" of the data value table to parse out values which
have been added since your last synchronization. The problem however is to
determine what has been DELETED from the data value table.Well, that is more
tricky, and would require you diff both data values tables to determine what
changes you need to transmit to your downstream system.

There are tools out there, to help determine the "diff" between two
databases, which I think in your case would require you to have copy of your
remote DB and then diff it against your local DB. You would then prepare a
diff, make the update to your local copy and then you remote copy.

I think that would really be the only real way of doing this, using a
tool like perhaps these
(http://stackoverflow.com/questions/4804779/how-to-check-difference-between-two-databases-in-p)
other other commercial alternatives to diff the two databases, and prepare a
series of INSERT and DELETE statements. Never attempted it myself, but it
might work.

Does not seem really that trivial. :frowning:

Regards,
Jason

On Mon, Apr 20, 2015 at 6:59 PM Knut Staring <knutst@gmail.com> wrote:

I think I would trust UIDs or Codes more than database IDs

On Mon, Apr 20, 2015 at 10:48 PM, Muyepa A. <muyepaa@gmail.com> wrote:

Indeed Database A has extra data not available in B while B has data
missing in A. No changes have been made to orgunits, dataelements, periods
or users. The delta is in data-values. I would really want to get views on
whether the operation described (all other parameters being equal) will
bring the missing data into A, and whether it may affect report summary
rates.
I cannot easily download the big databases. Something on command-line
would be very suitable for this circumstance and transparent to users.

On Mon, Apr 20, 2015 at 5:08 PM, Bob Jolliffe <bobjolliffe@gmail.com> >>>>> wrote:

If database B is restored off a backup of database A then it will
probably work with some caveats. Clearly they are not exactly the
same because you've got extra data in A which you are trying to push
into B. So it depends really on what else might have changed in A.

If you have made any changes to orgunits, datelements or periods then
that will cause at best a database integrity violation when you try to
import that way. At worst cause a very horrible mess.

Even though the process is fragile, there is a related use case where
one might need to do such a thing or something similar. If a country
system is hosted somewhere in the cloud (because they have poor or
expensive local bandwidth) then there is a real problem to solve about
getting data backups from the server back into the country.

In particular we'd like to be able to pull not all the datavalues but
only those which have been changed ie. do an incremental backup of the
datavalues table. The same could apply to events.

One possible approach might be to first do a metadata backup and
restore. And then pull the changed datavalues - though you would have
to do this through the api with uids or codes as the metadata backup
and restore would have ignored the database identifiers.

Yet another approach would be to take a reduced postgres backup
excluding the datavalues table, restore that and then try and pull in
the delta of datavalues table something like you describe.

Its a tough problem. Jason, what do you suggest? Downloading the
full postgres backup (with ephemeral tables like analytics excluded)
can be a serious challenge for many countries to do with regularity
and reliability. So how best to go about incremental backup and
restore of data?

On 20 April 2015 at 15:36, Muyepa A. <muyepaa@gmail.com> wrote:
> I had thought of data import/export functionality, however this
> will
> require two instances of tomcat to be running. (or one instance with
> one
> database at a time, however there will be an associated down time).
> Since upgrading to 2.18, when exported, the resultant download zip
> file says
> invalid for all formats (xml, json, excel) and is only 2kb in size
> regardless of the hierarchy I choose.
>
> Since one database is copy of the other, i thought I could easily
> update
> from the other as described.
>
> On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering >>>>>> > <jason.p.pickering@gmail.com> wrote:
>>
>> It may work, but in general, it will not because the internal IDs
>> of
>> different DHIS2 databases are not the same. So, it might work, but
>> it
>> requires both databases to be in essentially the same state. Better
>> to use
>> the data import/export functionality if you can.
>>
>> Regards,
>> Jason
>>
>>
>> On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. <muyepaa@gmail.com> >>>>>> >> wrote:
>>>
>>> Can I safely update DHIS2 database A with data from DHIS database
>>> B both
>>> on the same PostgreSQL server by cross query. B is backup of A
>>>
>>> From database B, I can retrieve and save the data that is required
>>> using:
>>>
>>> psql B -c "\copy (select * from datavalue where periodid='xxxx')
>>> TO
>>> STDOUT" > /tmp/data.tsv;
>>>
>>>
>>> And to restore:
>>> psql A -c "\copy datavaue (x, y, z, ...) FROM /tmp/data.tsv"
>>>
>>>
>>> Is this recommended, and what else should be cross transferred.
>>>
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to : dhis2-users@lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help : https://help.launchpad.net/ListHelp
>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to : dhis2-users@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help : https://help.launchpad.net/ListHelp
>

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-users
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-users
More help : https://help.launchpad.net/ListHelp

--
Knut Staring
Dept. of Informatics, University of Oslo
Norway: +4791880522
Skype: knutstar
http://dhis2.org
_______________________________________________
Mailing list: https://launchpad.net/~dhis2-users
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-users
More help : https://help.launchpad.net/ListHelp

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-users
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-users
More help : https://help.launchpad.net/ListHelp