Need a group for UNIQUE attributes for deduplication

With Tracker, there is a high probability of getting duplicates (these could be exact duplicates or, for example, misspellings of a name).

To deal with this, it would be good to be able to designate SOME of the attributes of each person (or rather trackedentityinstance) as the ones that really identify a person or thing, e.g. Firstname, Lastname, Age, Address. So we need a way to designate a subset of all the attributes as input for a deduplication process, which could start by just finding exact matches and subsequently be refined by introducing different kinds of fuzzy logic etc.
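To make the idea concrete, here is a minimal Python sketch of such a process, not a proposal for the actual implementation. It assumes each trackedentityinstance is available as a plain dict with an "id" entry; the attribute names, the normalization step, and the function names are all hypothetical. It first groups exact matches on the designated subset, then shows one possible fuzzy refinement based on the standard library's difflib.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical subset of attributes designated as identifying a person.
IDENTIFYING_ATTRIBUTES = ("firstname", "lastname", "age", "address")


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(str(value).lower().split())


def match_key(instance, attributes=IDENTIFYING_ATTRIBUTES):
    """Build the tuple of normalized identifying values for one instance."""
    return tuple(normalize(instance.get(name, "")) for name in attributes)


def exact_duplicates(instances, attributes=IDENTIFYING_ATTRIBUTES):
    """Return lists of ids whose instances agree on every identifying attribute."""
    groups = defaultdict(list)
    for instance in instances:
        groups[match_key(instance, attributes)].append(instance["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def similarity(a, b, attributes=IDENTIFYING_ATTRIBUTES):
    """Average per-attribute string similarity in [0, 1] (one possible fuzzy measure)."""
    ratios = [
        SequenceMatcher(None, x, y).ratio()
        for x, y in zip(match_key(a, attributes), match_key(b, attributes))
    ]
    return sum(ratios) / len(ratios)
```

With made-up records, two entries that differ only in case or spacing land in the same exact-match group, while a misspelled first name ("Ana" for "Anna") is missed by the exact pass but scores close to 1.0 in similarity(), which is the kind of pair a later human review could pick up.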

And then later, we could build a GUI for human review and merger of clear duplicates (which can also be defined). But I suppose we initially need an addition to the model. So this is like the UNIQUENESS property, but not for just ONE attribute, but rather for a group/collection of attributes.

So, it will be similar to a compound key in SQL: http://en.wikipedia.org/wiki/Compound_key
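As a rough illustration of the compound-key analogy, the Python sketch below keeps an in-memory index over a group of attributes and refuses a second registration only when the whole combination repeats. The class and method names are hypothetical and are not part of the existing model; a real implementation would presumably live in the database layer.

```python
class DuplicateKeyError(ValueError):
    """Raised when an instance repeats an existing combination of key attributes."""


class AttributeGroupIndex:
    """In-memory stand-in for a uniqueness constraint over a group of attributes."""

    def __init__(self, key_attributes):
        self.key_attributes = tuple(key_attributes)
        self._seen = {}  # compound key -> id of the instance that first used it

    def register(self, instance_id, attributes):
        """Record the instance, or raise if its compound key is already taken."""
        key = tuple(attributes.get(name) for name in self.key_attributes)
        if key in self._seen:
            raise DuplicateKeyError(
                f"{instance_id} repeats the key of {self._seen[key]}: {key}"
            )
        self._seen[key] = instance_id
        return key
```

Two instances that share only a first name are both accepted; the error is raised only when every attribute in the group matches, which is the difference from a uniqueness flag on a single attribute.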

Knut


Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

Sorry, I said “(which can also be defined)”, but I meant “which can also be gradually REFINED in future releases”

(In reply to the original message above, sent by Knut Staring knutst@gmail.com on Mon, Mar 16, 2015 at 2:13 PM.)