With Tracker, there is a high probability of getting duplicates (these could be exact duplicates, or misspellings of a name, for example).
To deal with this, it would be good to be able to designate SOME of the attributes of each person (or rather trackedentityinstance) as the ones that really identify a person or thing, e.g. Firstname, Lastname, Age, Address. So we need a way to designate a subset of all the attributes as input for a deduplication process, which could start by just finding exact matches, and subsequently be refined by introducing different kinds of fuzzy logic etc.
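To make the idea concrete, here is a rough Python sketch of what such a process might look like. It is only an illustration: the attribute names, the `match_key` helper, the sample records and the 0.9 threshold are all made up for the example, and the fuzzy step uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever fuzzy logic would eventually be chosen. None of this is existing DHIS2 code.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical attribute names; in practice the identifying subset would be
# configured on the model rather than hard-coded.
IDENTIFYING_ATTRIBUTES = ("firstname", "lastname", "age", "address")


def match_key(instance, attributes=IDENTIFYING_ATTRIBUTES):
    """Build a normalised tuple from the identifying attributes only."""
    return tuple(str(instance.get(a, "")).strip().lower() for a in attributes)


def exact_duplicates(instances, attributes=IDENTIFYING_ATTRIBUTES):
    """Group instance ids whose identifying attributes match exactly."""
    groups = defaultdict(list)
    for instance in instances:
        groups[match_key(instance, attributes)].append(instance["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def fuzzy_candidates(instances, attributes=IDENTIFYING_ATTRIBUTES, threshold=0.9):
    """Pair up instances whose keys are similar but not identical."""
    keyed = [(i["id"], " ".join(match_key(i, attributes))) for i in instances]
    pairs = []
    for n, (id_a, key_a) in enumerate(keyed):
        for id_b, key_b in keyed[n + 1:]:
            if key_a == key_b:
                continue  # already reported by exact_duplicates
            if SequenceMatcher(None, key_a, key_b).ratio() >= threshold:
                pairs.append((id_a, id_b))
    return pairs


# Made-up sample data; "phone" is deliberately NOT part of the identifying subset.
instances = [
    {"id": "tei1", "firstname": "Anna", "lastname": "Hansen", "age": 34,
     "address": "Storgata 1", "phone": "111"},
    {"id": "tei2", "firstname": "anna ", "lastname": "Hansen", "age": 34,
     "address": "Storgata 1", "phone": "222"},
    {"id": "tei3", "firstname": "Anna", "lastname": "Hansenn", "age": 34,
     "address": "Storgata 1", "phone": "333"},
    {"id": "tei4", "firstname": "Per", "lastname": "Olsen", "age": 51,
     "address": "Kirkeveien 9", "phone": "444"},
]

print(exact_duplicates(instances))
print(fuzzy_candidates(instances))
```

With this sample data, tei1 and tei2 come out as exact duplicates (they differ only in case, whitespace and the non-identifying phone number), while tei3 is only picked up by the fuzzy pass because of the misspelled last name. The pairwise comparison is quadratic, so a real implementation would probably need some kind of blocking or indexing.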
And then later, we could build a GUI for human review and merging of clear duplicates (which can also be defined). But I suppose we initially need an addition to the model. So this is like the UNIQUENESS property, though not for just ONE attribute but for a group/collection of attributes.
So, it will be similar to a compound key in SQL: http://en.wikipedia.org/wiki/Compound_key
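For anyone not familiar with the SQL side, the analogy can be sketched with Python's built-in `sqlite3` module. The `person` table, its columns and the `try_insert` helper are invented for the example and have nothing to do with the actual DHIS2 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        firstname TEXT NOT NULL,
        lastname  TEXT NOT NULL,
        age       INTEGER NOT NULL,
        address   TEXT NOT NULL,
        phone     TEXT,
        -- uniqueness applies to the GROUP of columns, not to each one alone
        UNIQUE (firstname, lastname, age, address)
    )
    """
)


def try_insert(row):
    """Insert one row; return False if the compound uniqueness rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO person (firstname, lastname, age, address, phone) "
                "VALUES (?, ?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(("Anna", "Hansen", 34, "Storgata 1", "111"))
same_group = try_insert(("Anna", "Hansen", 34, "Storgata 1", "222"))
other_address = try_insert(("Anna", "Hansen", 34, "Kirkeveien 9", "333"))
print(first, same_group, other_address)
```

A hard constraint like this only catches exact matches on the whole group, so it would correspond to the first, exact-match stage of deduplication rather than the fuzzy refinements.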
Knut
–
Knut Staring
Dept. of Informatics, University of Oslo
Norway: +4791880522
Skype: knutstar
http://dhis2.org
Sorry, I said “(which can also be defined)”, but I meant “which can also be gradually REFINED in future releases”
On Mon, Mar 16, 2015 at 2:13 PM, Knut Staring knutst@gmail.com wrote the message above.