Conflicts merging databases

Hi guys,

I have a DHIS db (in MySQL) into which a lot of data have been manually imported from a different system.

Now I want to move (parts of) these data into my repository, but when importing the DXF export, I get a PSQLException like this: ERROR: duplicate key value violates unique constraint “dataelementcategoryoption_name_key” Detail: Key (name) = (Male) already exists.

Basically, the dataelements in the old HMIS do not (yet) exist in the repository. I hoped they (and the datasets they belong to) would just be added, and then the data imported.

Any good tips on how to handle this situation? Should I do a metadata export/import first? Will that help? Do I have to rename the option in the source HMIS db before exporting?

Knut

I suspect, but at not 100% sure this may because of the dodgy workaround some people (including yours truly) have used to get around the restriction of the same category option in different categories.

“Male” and "Male " which appear in two different categories will be accepted by DHIS2, but during metadata export, I think the "Male " will be trimmed to “Male”, and when this is imported into the target system, this error will occur.

I have not 100% confirmed that this is the case, but I have seen similar errors when exporting DXF and attempting to import it into a cloned database.

Regards,

Jason

···

On Fri, Aug 19, 2011 at 1:40 PM, Knut Staring knutst@gmail.com wrote:

Hi guys,

I have a DHIS db (uin MySQL) into which a lot of data have been manually imported from a different system.

Now I want to move (parts of) these data into my repository, but when importing the DXF export, I get a PSQLException like this: ERROR: duplicate key value violates unique constraint “dataelementcategoryoption_name_key” Detail: Key (name) = (Male) already exists.

Basically, the dataelements in the old HMIS do not (yet) exist in the repository. I hoped they (and the datasets they belong to) would just be added, and then the data imported.

Any good tips on how to handle this situation? Should I do a metadata export/import first? Will that help? Do I have to rename the option in the source HMIS db before exporting?

Knut


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

I agree with Jason. Run this sql to check:

select categoryoptionid, (
select count(categoryoptionid) from categories_categoryoptions where
categoryoptionid=cc.categoryoptionid )
as categorycount from categories_categoryoptions as cc order by
categorycount desc;

Lars

I think in fact this is caused by Hibernate being confused - since a
category option is member of more than one category Hibernate produces
that category option twice for some reason when asking for all
category options (since the association is one-to-many).

Yes, this query will give you category options, which I think should not be possible, but does seem to happen.

The specific issue I was referring to can be checked with this…

SELECT trim(name), count(*) from dataelementcategoryoption

GROUP BY trim(name)

HAVING COUNT(*) >1

Again, I have not checked the code to see if the category is actually trimmed or not, but I suspect it may be. Maybe Lars can comment as this workaround seems to be popping up in various places (Nigeria, Gambia that I know of).

Regards,

Jason

···

2011/8/19 Lars Helge Øverland larshelge@gmail.com

I agree with Jason. Run this sql to check:

select categoryoptionid, (

select count(categoryoptionid) from categories_categoryoptions where

categoryoptionid=cc.categoryoptionid )

as categorycount from categories_categoryoptions as cc order by

categorycount desc;

Lars

It is probably remains from the days before we changed the category
model - users who have not updated their databases to reflect the
change from many-to-many to one-to-many between category and category
options.

Yes I've seen this on a number of databases as well - causing trouble
with mydatamart. Do you think we should "discourage" this workaround
by trimming on save? I can understand the frustration that might have
lead people to do this, but it is better to use Male_ or _Male or the
like than just adding whitespace.

···

2011/8/19 Lars Helge Øverland <larshelge@gmail.com>:

It is probably remains from the days before we changed the category
model - users who have not updated their databases to reflect the
change from many-to-many to one-to-many between category and category
options.

_______________________________________________
Mailing list: DHIS 2 developers in Launchpad
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : DHIS 2 developers in Launchpad
More help : ListHelp - Launchpad Help

Good idea to trim on save, I support that.

This should be the done everywhere. I know that e.g. copy pasting from excel into DHIS can cause unwanted trailing spaces as well, I have seen many orgunits with trailing whitespaces.

Ola

···

2011/8/19 Bob Jolliffe bobjolliffe@gmail.com

Yes I’ve seen this on a number of databases as well - causing trouble

with mydatamart. Do you think we should “discourage” this workaround

by trimming on save? I can understand the frustration that might have

lead people to do this, but it is better to use Male_ or _Male or the

like than just adding whitespace.


2011/8/19 Lars Helge Øverland larshelge@gmail.com:

It is probably remains from the days before we changed the category

model - users who have not updated their databases to reflect the

change from many-to-many to one-to-many between category and category

options.


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Yes I've seen this on a number of databases as well - causing trouble
with mydatamart. Do you think we should "discourage" this workaround
by trimming on save? I can understand the frustration that might have
lead people to do this, but it is better to use Male_ or _Male or the
like than just adding whitespace.

Good idea to trim on save, I support that.

Well there would be quite a few fields to trim and a few places to do
it. It could be done at javascript level, but probably better in the
model. Eg changing AbstractIdentifiableObject.java setter to:

public void setName( String name )
    {
        this.name = name.trim();
    }

Would catch a whole lot of critical objects. Something similar on
"code" for orgunits would fix up a bundle of apparent mismatches with
the master facility list as well. Some of the others (shortname et
al) are less critical and could be done progressively.

Very easy to do, but my worry is that given that there appears to be
quite a few installations who have been using this "workaround" (as
well as accidental copy-n-pastes) it could create some unexpected
grief. On the other hand, nipping it in the bud now is probably the
right thing to do.

Maybe we would need a transitional startup task to replace existing
offending leading/trailing whitespace with '_'?

···

On 19 August 2011 13:49, Ola Hodne Titlestad <olati@ifi.uio.no> wrote:

2011/8/19 Bob Jolliffe <bobjolliffe@gmail.com>

This should be the done everywhere. I know that e.g. copy pasting from excel
into DHIS can cause unwanted trailing spaces as well, I have seen many
orgunits with trailing whitespaces.
Ola
-----

2011/8/19 Lars Helge Øverland <larshelge@gmail.com>:
> It is probably remains from the days before we changed the category
> model - users who have not updated their databases to reflect the
> change from many-to-many to one-to-many between category and category
> options.
>
> _______________________________________________
> Mailing list: DHIS 2 developers in Launchpad
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe : DHIS 2 developers in Launchpad
> More help : ListHelp - Launchpad Help
>

_______________________________________________
Mailing list: DHIS 2 developers in Launchpad
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : DHIS 2 developers in Launchpad
More help : ListHelp - Launchpad Help

Yes I've seen this on a number of databases as well - causing trouble
with mydatamart. Do you think we should "discourage" this workaround
by trimming on save? I can understand the frustration that might have
lead people to do this, but it is better to use Male_ or _Male or the
like than just adding whitespace.

Good idea to trim on save, I support that.

Well there would be quite a few fields to trim and a few places to do
it. It could be done at javascript level, but probably better in the
model. Eg changing AbstractIdentifiableObject.java setter to:

public void setName( String name )
{
this.name = name.trim();
}

Would catch a whole lot of critical objects. Something similar on
"code" for orgunits would fix up a bundle of apparent mismatches with
the master facility list as well.

Sorry that is a Kenya-specific reference for those who were wondering :slight_smile:

···

On 19 August 2011 14:14, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

On 19 August 2011 13:49, Ola Hodne Titlestad <olati@ifi.uio.no> wrote:

2011/8/19 Bob Jolliffe <bobjolliffe@gmail.com>

Some of the others (shortname et
al) are less critical and could be done progressively.

Very easy to do, but my worry is that given that there appears to be
quite a few installations who have been using this "workaround" (as
well as accidental copy-n-pastes) it could create some unexpected
grief. On the other hand, nipping it in the bud now is probably the
right thing to do.

Maybe we would need a transitional startup task to replace existing
offending leading/trailing whitespace with '_'?

This should be the done everywhere. I know that e.g. copy pasting from excel
into DHIS can cause unwanted trailing spaces as well, I have seen many
orgunits with trailing whitespaces.
Ola
-----

2011/8/19 Lars Helge Øverland <larshelge@gmail.com>:
> It is probably remains from the days before we changed the category
> model - users who have not updated their databases to reflect the
> change from many-to-many to one-to-many between category and category
> options.
>
> _______________________________________________
> Mailing list: DHIS 2 developers in Launchpad
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe : DHIS 2 developers in Launchpad
> More help : ListHelp - Launchpad Help
>

_______________________________________________
Mailing list: DHIS 2 developers in Launchpad
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : DHIS 2 developers in Launchpad
More help : ListHelp - Launchpad Help

The exact term may be somewhat Kenya specific, but it is generic enough that I think it is quite suitable for our general discussion and could also be used in the manual (I think it has been adopted by WHO for their work on Service Availability and Readiness Assessment (*SARA), aka SAM:

http://www.who.int/healthinfo/systems/serviceavailabilitymapping/en/

A largely equivalent term would be facility registry. I think we should introduce some of this vocabulary into our implementation manual (if it’s not there already), to link with the overall Enterprise Architecture thinking (while keeping things concrete and understandable and mostly avoiding mystifying or technical/academic jargon such as the term “Enterprise Architecture” itself).

Evidently, in many countries DHIS2 will serve as the de facto registry for both facilities and aggregate data elements/indicators, but this is not the case everywhere, and metadata synchronization both between DHIS2 instances (as highlighted by the ongoing discussion on offline installs) and with other systems through third party registries will be crucial. You of course know this best of all Bob, but I still thought it worth highlighting on this list in light of my current work.

Knut*

···

On Fri, Aug 19, 2011 at 3:16 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

On 19 August 2011 14:14, Bob Jolliffe bobjolliffe@gmail.com wrote:

On 19 August 2011 13:49, Ola Hodne Titlestad olati@ifi.uio.no wrote:

2011/8/19 Bob Jolliffe bobjolliffe@gmail.com

Yes I’ve seen this on a number of databases as well - causing trouble

with mydatamart. Do you think we should “discourage” this workaround

by trimming on save? I can understand the frustration that might have

lead people to do this, but it is better to use Male_ or _Male or the

like than just adding whitespace.

Good idea to trim on save, I support that.

Well there would be quite a few fields to trim and a few places to do

it. It could be done at javascript level, but probably better in the

model. Eg changing AbstractIdentifiableObject.java setter to:

public void setName( String name )

{

   [this.name](http://this.name) = name.trim();

}

Would catch a whole lot of critical objects. Something similar on

“code” for orgunits would fix up a bundle of apparent mismatches with

the master facility list as well.

Sorry that is a Kenya-specific reference for those who were wondering :slight_smile:

Trimming is certainly the least that we should do. I have recently
come up against a problem with some sort of tick marks (UNICODE 0x92)
which look like apostrophes (which are bad enough) but are not. These
seem to get inserted from copy/paste from Excel or Word who
conveniently changes normal apostrophes to these weird characters
which are not, but may appear to the user to be.

Trim is a good start, but we still need think about more strict
(regex) validation.

Regards,
jason

···

On Fri, Aug 19, 2011 at 3:35 PM, Knut Staring <knutst@gmail.com> wrote:

On Fri, Aug 19, 2011 at 3:16 PM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

On 19 August 2011 14:14, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
> On 19 August 2011 13:49, Ola Hodne Titlestad <olati@ifi.uio.no> wrote:
>> 2011/8/19 Bob Jolliffe <bobjolliffe@gmail.com>
>>>
>>> Yes I've seen this on a number of databases as well - causing trouble
>>> with mydatamart. Do you think we should "discourage" this workaround
>>> by trimming on save? I can understand the frustration that might have
>>> lead people to do this, but it is better to use Male_ or _Male or the
>>> like than just adding whitespace.
>>>
>>
>> Good idea to trim on save, I support that.
>
> Well there would be quite a few fields to trim and a few places to do
> it. It could be done at javascript level, but probably better in the
> model. Eg changing AbstractIdentifiableObject.java setter to:
>
> public void setName( String name )
> {
> this.name = name.trim();
> }
>
> Would catch a whole lot of critical objects. Something similar on
> "code" for orgunits would fix up a bundle of apparent mismatches with
> the master facility list as well.

Sorry that is a Kenya-specific reference for those who were wondering :slight_smile:

The exact term may be somewhat Kenya specific, but it is generic enough that I think it is quite suitable for our general discussion and could also be used in the manual (I think it has been adopted by WHO for their work on Service Availability and Readiness Assessment (SARA), aka SAM:

http://www.who.int/healthinfo/systems/serviceavailabilitymapping/en/

A largely equivalent term would be facility registry. I think we should introduce some of this vocabulary into our implementation manual (if it's not there already), to link with the overall Enterprise Architecture thinking (while keeping things concrete and understandable and mostly avoiding mystifying or technical/academic jargon such as the term "Enterprise Architecture" itself).

Evidently, in many countries DHIS2 will serve as the de facto registry for both facilities and aggregate data elements/indicators, but this is not the case everywhere, and metadata synchronization both between DHIS2 instances (as highlighted by the ongoing discussion on offline installs) and with other systems through third party registries will be crucial. You of course know this best of all Bob, but I still thought it worth highlighting on this list in light of my current work.

Knut

_______________________________________________
Mailing list: DHIS 2 developers in Launchpad
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : DHIS 2 developers in Launchpad
More help : ListHelp - Launchpad Help

Trimming is certainly the least that we should do. I have recently
come up against a problem with some sort of tick marks (UNICODE 0x92)

which look like apostrophes (which are bad enough) but are not. These
seem to get inserted from copy/paste from Excel or Word who
conveniently changes normal apostrophes to these weird characters
which are not, but may appear to the user to be.

Trim is a good start, but we still need think about more strict
(regex) validation.

Regards,
jason

Yes I’ve seen this on a number of databases as well - causing trouble

with mydatamart. Do you think we should “discourage” this workaround
by trimming on save? I can understand the frustration that might have
lead people to do this, but it is better to use Male_ or _Male or the

like than just adding whitespace.

Good idea to trim on save, I support that.

Well there would be quite a few fields to trim and a few places to do

it. It could be done at javascript level, but probably better in the
model. Eg changing AbstractIdentifiableObject.java setter to:

public void setName( String name )

{
this.name = name.trim();
}

Would catch a whole lot of critical objects. Something similar on

“code” for orgunits would fix up a bundle of apparent mismatches with
the master facility list as well.

Sorry that is a Kenya-specific reference for those who were wondering :slight_smile:

The exact term may be somewhat Kenya specific, but it is generic enough that I think it is quite suitable for our general discussion and could also be used in the manual (I think it has been adopted by WHO for their work on Service Availability and Readiness Assessment (SARA), aka SAM:

http://www.who.int/healthinfo/systems/serviceavailabilitymapping/en/

A largely equivalent term would be facility registry. I think we should introduce some of this vocabulary into our implementation manual (if it’s not there already), to link with the overall Enterprise Architecture thinking (while keeping things concrete and understandable and mostly avoiding mystifying or technical/academic jargon such as the term “Enterprise Architecture” itself).

Evidently, in many countries DHIS2 will serve as the de facto registry for both facilities and aggregate data elements/indicators, but this is not the case everywhere, and metadata synchronization both between DHIS2 instances (as highlighted by the ongoing discussion on offline installs) and with other systems through third party registries will be crucial. You of course know this best of all Bob, but I still thought it worth highlighting on this list in light of my current work.

It worth a lot Knut. Your microscopic ideas must be implemented in existing DHIS2. There are so many such commonly used terms in various health systems like facility, beneficiary, Upazila, * etc should be included in application to make it look and feel different than other Enterprise Application. And i am pretty much sure there so many people on this list should be agreed on it.*

···

On Fri, Aug 19, 2011 at 7:28 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

On Fri, Aug 19, 2011 at 3:35 PM, Knut Staring knutst@gmail.com wrote:

On Fri, Aug 19, 2011 at 3:16 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

On 19 August 2011 14:14, Bob Jolliffe bobjolliffe@gmail.com wrote:

On 19 August 2011 13:49, Ola Hodne Titlestad olati@ifi.uio.no wrote:

2011/8/19 Bob Jolliffe bobjolliffe@gmail.com

Knut


Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp


Regards,
Brajesh

Trimming is certainly the least that we should do. I have recently
come up against a problem with some sort of tick marks (UNICODE 0x92)
which look like apostrophes (which are bad enough) but are not. These
seem to get inserted from copy/paste from Excel or Word who
conveniently changes normal apostrophes to these weird characters
which are not, but may appear to the user to be.

Trim is a good start, but we still need think about more strict
(regex) validation.

Agree. But I think there is a subtle difference. Regarding
leading/trailing whitespace I think its ok to just trim and save.
Regarding dodgy characters within the string I don't think we can just
silently deal with them. At the level of the setter in the model an
exception would need to be thrown and handled somewhere.

The general logic in our model mostly presupposes that all data is
good and this sort of validation is handled in user interface, which I
think trips us up from time to time all over the place. Lars has
indicated recently that we need to do a due diligence of our model in
a more general sense and I would see this sort of validation as part
of that exercise of building robustness into the model by making it
inherently coherent and reactive (throws exception) to attempts do
evil upon it.

Bob

···

On 19 August 2011 14:58, Jason Pickering <jason.p.pickering@gmail.com> wrote:

Regards,
jason

On Fri, Aug 19, 2011 at 3:35 PM, Knut Staring <knutst@gmail.com> wrote:

On Fri, Aug 19, 2011 at 3:16 PM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

On 19 August 2011 14:14, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
> On 19 August 2011 13:49, Ola Hodne Titlestad <olati@ifi.uio.no> wrote:
>> 2011/8/19 Bob Jolliffe <bobjolliffe@gmail.com>
>>>
>>> Yes I've seen this on a number of databases as well - causing trouble
>>> with mydatamart. Do you think we should "discourage" this workaround
>>> by trimming on save? I can understand the frustration that might have
>>> lead people to do this, but it is better to use Male_ or _Male or the
>>> like than just adding whitespace.
>>>
>>
>> Good idea to trim on save, I support that.
>
> Well there would be quite a few fields to trim and a few places to do
> it. It could be done at javascript level, but probably better in the
> model. Eg changing AbstractIdentifiableObject.java setter to:
>
> public void setName( String name )
> {
> this.name = name.trim();
> }
>
> Would catch a whole lot of critical objects. Something similar on
> "code" for orgunits would fix up a bundle of apparent mismatches with
> the master facility list as well.

Sorry that is a Kenya-specific reference for those who were wondering :slight_smile:

The exact term may be somewhat Kenya specific, but it is generic enough that I think it is quite suitable for our general discussion and could also be used in the manual (I think it has been adopted by WHO for their work on Service Availability and Readiness Assessment (SARA), aka SAM:

http://www.who.int/healthinfo/systems/serviceavailabilitymapping/en/

A largely equivalent term would be facility registry. I think we should introduce some of this vocabulary into our implementation manual (if it's not there already), to link with the overall Enterprise Architecture thinking (while keeping things concrete and understandable and mostly avoiding mystifying or technical/academic jargon such as the term "Enterprise Architecture" itself).

Evidently, in many countries DHIS2 will serve as the de facto registry for both facilities and aggregate data elements/indicators, but this is not the case everywhere, and metadata synchronization both between DHIS2 instances (as highlighted by the ongoing discussion on offline installs) and with other systems through third party registries will be crucial. You of course know this best of all Bob, but I still thought it worth highlighting on this list in light of my current work.

Knut

_______________________________________________
Mailing list: DHIS 2 developers in Launchpad
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : DHIS 2 developers in Launchpad
More help : ListHelp - Launchpad Help