uuids

bobj · 28 September 2009 12:36

Thinking some more about unique identifiers and names (and how they are not the same thing), and still in the realm of multiple dimensions, I recall having this discussion at the Delhi workshop.

Currently dataElements, Categories and a whole host of other objects have a uniqueness constraint on the name, so we are effectively using the name as an identifier. We also use the name for display purposes, which is quite a different use case. In the first case the important characteristics are uniqueness and, preferably, succinctness; in the second, the name needs to look meaningful and presentable in a form or report. One of the downsides of using names as identifiers, as we do, is that we can’t have two categories with the same name. I have seen recently that there are sometimes really compelling reasons why you might want to do that.

The classic case which seems to come up is the question of “Age”. Whereas you might want to have an “Age” category with options {“0-5”,“5-15”,“over 15”}, you might equally want to have an “Age” category with options {“under18”, “18 and over”}. What then seems to happen is that implementors tie themselves up in knots trying to find new and imaginative ways of naming the “Age” categories so that they are distinct from one another, when there is really no compelling reason to do this. With a structure like:

<category id="23" name="Age" uid="4454545656456477756" />

where id is just the internal key and uid is a unique identifier (which could be a uuid, but not necessarily), there is really no reason not to have any number of categories named “Age”, each with a different set of category options.
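To make this concrete, here is a minimal Java sketch. It assumes a simplified, hypothetical Category type (not the actual DHIS2 class) carrying the id, name and uid described here. The registry is keyed on uid, so two categories both named “Age” can coexist, while reusing a uid is rejected.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of <category id="..." name="Age" uid="..." />:
// id is the internal database key, uid is the unique identifier and
// name is for display only, so it is not required to be unique.
record Category(int id, String name, String uid, List<String> options) {}

final class CategoryRegistry {
    // Keyed by uid rather than by name.
    private final Map<String, Category> byUid = new LinkedHashMap<>();

    void add(Category category) {
        // Only the uid must be unique; a repeated name is accepted.
        if (byUid.putIfAbsent(category.uid(), category) != null) {
            throw new IllegalArgumentException("duplicate uid: " + category.uid());
        }
    }

    Category get(String uid) {
        return byUid.get(uid);
    }

    int size() {
        return byUid.size();
    }
}
```

Both “Age” categories can be registered side by side as long as their uids differ.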

There are also cases where you might want to have a different name for the same uniquely identified element. This is frequently the case with internationalization.

In my ideal scenario I wouldn’t use uuids, because they are too long and carry no meaning. I’d rather see the unique identifier be a code such as AGE_STI or AGE_ART_COHORT (or even AGE_1, AGE_2, AGE_3 …). To enforce global uniqueness, these codes would have to be demarcated in a namespace, such as the URI scheme I have mentioned below. Of course the downside here is that some effort now needs to be put into defining unique identifiers, but one could create simple guidelines.
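Such guidelines could be checked mechanically. The Java sketch below assumes one possible rule: upper-case letters and digits, words joined by single underscores, starting with a letter, at most 50 characters. Both the rule and the 50-character limit are illustrative assumptions, not an agreed convention; namespacing for global uniqueness would sit on top of it.

```java
import java.util.regex.Pattern;

// Hypothetical guideline for short identifier codes such as AGE_STI,
// AGE_ART_COHORT or AGE_1.
final class CodeGuidelines {
    // A letter first, then letters/digits, with single underscores between words.
    private static final Pattern CODE = Pattern.compile("[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*");
    // Assumed cap to keep codes short.
    private static final int MAX_LENGTH = 50;

    static boolean isValidCode(String code) {
        return code != null
                && code.length() <= MAX_LENGTH
                && CODE.matcher(code).matches(); // the whole string must match
    }
}
```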

Either way, for the moment I’m sticking with uuids, because we already have them elsewhere and because other systems we might want to talk to (OpenMRS) also use them. But I’d like folk to give it some thought, in particular to relaxing the uniqueness requirement on names wherever we also have a unique identifier. Where we have such an identifier we should use it to compare and disambiguate between entities. That in most cases the names will also be identical should be treated as incidental.
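A minimal Java sketch of comparing by identifier rather than by name, assuming a hypothetical IdentifiableCategory class: equals() and hashCode() look only at the uid, so a translated name does not change an entity’s identity and a shared name does not produce a false match.

```java
import java.util.Objects;

// Hypothetical entity whose identity is its uid; the name is display text.
final class IdentifiableCategory {
    private final String uid;
    private final String name;

    IdentifiableCategory(String uid, String name) {
        this.uid = Objects.requireNonNull(uid, "uid");
        this.name = name;
    }

    String getUid() { return uid; }

    String getName() { return name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdentifiableCategory)) return false;
        // Names are deliberately ignored when comparing.
        return uid.equals(((IdentifiableCategory) o).uid);
    }

    @Override
    public int hashCode() {
        return uid.hashCode(); // consistent with equals()
    }
}
```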

Regards
Bob

···

2009/9/28 Bob Jolliffe <bobjolliffe@gmail.com>

Hi Lars

Much as I hate uuids, I am now attaching them to DataElementCategories. This is the process I have followed:

Modified DataElementCategory.java in the API to add the uuid String member with a getter and setter.

Modified addDataElementCategory() of defaultDataElementCategoryService.java to generate a uuid, following the equivalent method in defaultDataElementService.

Added the uuid property to DataElementCategory.hbm.xml.

So far so good. I see that DataElementCategory uses the generic store, so there is nothing more to do there. I built it, fired it up, and everything works fine. The new field is created on the DataElementCategory table (nice!), and new categories are now created with uuids.
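The Java side of these steps can be sketched as follows. This is a simplified stand-in, not the committed DHIS2 code: the class is reduced to a name and a uuid, CategoryServiceSketch is hypothetical, a list stands in for the generic store, uuids come from java.util.UUID, and the hbm.xml mapping is not shown. Generating a uuid only when none is present is an assumption, so that imported categories could keep the identifier they arrive with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Simplified stand-in for DataElementCategory with the new uuid member.
class DataElementCategory {
    private final String name;
    private String uuid;

    DataElementCategory(String name) { this.name = name; }

    public String getName() { return name; }
    public String getUuid() { return uuid; }
    public void setUuid(String uuid) { this.uuid = uuid; }
}

// Hypothetical service; the list plays the role of the generic store.
class CategoryServiceSketch {
    private final List<DataElementCategory> store = new ArrayList<>();

    int addDataElementCategory(DataElementCategory category) {
        // Assign a random (version 4) uuid only if the caller did not supply one.
        if (category.getUuid() == null) {
            category.setUuid(UUID.randomUUID().toString());
        }
        store.add(category);
        return store.size(); // stands in for the generated database id
    }
}
```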

Obviously, what I now need is an upgrade script to attach uuids to existing categories. Where is the best place to do this? Presumably in one of the (12) startup routines. I don’t want to add more fat to the startup, but I guess this is unavoidable. Please suggest the best place for this and I’ll add it. Meanwhile I’ll commit the above.
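One way such an upgrade step could be structured, sketched in Java. The CategoryStore interface, LegacyCategory class and in-memory store are hypothetical stand-ins, not the real DHIS2 store API. Because the routine only touches categories without a uuid it is idempotent: after the first run it makes one pass over the categories at startup and changes nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins; not the real DHIS2 classes or store API.
final class LegacyCategory {
    final String name;
    String uuid;

    LegacyCategory(String name, String uuid) {
        this.name = name;
        this.uuid = uuid;
    }
}

interface CategoryStore {
    List<LegacyCategory> getAll();
    void update(LegacyCategory category);
}

final class InMemoryCategoryStore implements CategoryStore {
    private final List<LegacyCategory> rows = new ArrayList<>();

    void add(LegacyCategory category) { rows.add(category); }

    @Override
    public List<LegacyCategory> getAll() { return new ArrayList<>(rows); }

    @Override
    public void update(LegacyCategory category) {
        // Rows are held by reference, so there is nothing to write back here.
    }
}

// Upgrade step: assign a uuid to every category that does not have one yet.
final class CategoryUuidUpgrade {
    private final CategoryStore store;

    CategoryUuidUpgrade(CategoryStore store) { this.store = store; }

    // Returns how many categories were updated; a second run updates none.
    int execute() {
        int updated = 0;
        for (LegacyCategory category : store.getAll()) {
            if (category.uuid == null || category.uuid.isEmpty()) {
                category.uuid = UUID.randomUUID().toString();
                store.update(category);
                updated++;
            }
        }
        return updated;
    }
}
```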

I also need to add a child element for the uuid to the DXF representation of the category.
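A sketch of emitting the extra child element with the JDK’s StAX writer (javax.xml.stream). The element names category, id, uuid and name are assumptions for illustration, not taken from the DXF schema.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical DXF-like fragment; element names are illustrative only.
final class DxfCategoryWriter {
    static String write(int id, String name, String uuid) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("category");
            writeChild(xml, "id", Integer.toString(id));
            writeChild(xml, "uuid", uuid); // the new child element
            writeChild(xml, "name", name);
            xml.writeEndElement();
            xml.flush(); // push any buffered output into the StringWriter
            xml.close(); // does not close the underlying StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write category", e);
        }
        return out.toString();
    }

    private static void writeChild(XMLStreamWriter xml, String tag, String text)
            throws XMLStreamException {
        xml.writeStartElement(tag);
        xml.writeCharacters(text); // escapes characters such as & and <
        xml.writeEndElement();
    }
}
```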

And then there is all the work to do with comparisons (which is, I suppose, the point of having the uuids). We should probably modify the isEqual() method to take the uuid into account. Looking at the GenericNameStore, I think we should either create a GenericUUIDStore or add an extra method to the former to retrieve objects by UUID. Currently we enforce uniqueness on the name, which should actually make UUIDs redundant: we don’t need two unique fields to identify an object. If the name is unique anyway, we could compose a URI string such as http://dhis2.org/names/TZ/dataElementCategory/Sex from it.
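The “extra method” option might look like this in Java. GenericUuidStore, UuidIdentifiable, SimpleCategory and the HashMap-backed store are hypothetical; in a Hibernate-backed store, getByUuid would instead be a query on the uuid column.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical contract for objects that carry a uuid.
interface UuidIdentifiable {
    String getUuid();
}

// Hypothetical store interface with a lookup by uuid alongside save.
interface GenericUuidStore<T extends UuidIdentifiable> {
    void save(T object);
    Optional<T> getByUuid(String uuid);
}

// In-memory stand-in; a database-backed store would query the uuid column.
final class InMemoryUuidStore<T extends UuidIdentifiable> implements GenericUuidStore<T> {
    private final Map<String, T> byUuid = new HashMap<>();

    @Override
    public void save(T object) {
        byUuid.put(object.getUuid(), object);
    }

    @Override
    public Optional<T> getByUuid(String uuid) {
        return Optional.ofNullable(byUuid.get(uuid));
    }
}

final class SimpleCategory implements UuidIdentifiable {
    private final String uuid;
    private final String name;

    SimpleCategory(String uuid, String name) {
        this.uuid = uuid;
        this.name = name;
    }

    @Override
    public String getUuid() { return uuid; }

    String getName() { return name; }
}
```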

Much as I’d like to do that, I think we would have to enforce better naming conventions to make it work well. Currently we have some quite unwieldy category names which don’t make very nice URIs. Perhaps enforcing a camelCase or underscore convention through the user interface might work. Anyway, for the moment it’s UUIDs.
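A Java sketch of what an underscore convention could do to a name before it becomes a URI segment, assuming the http://dhis2.org/names/ base from the example above. NameUris is hypothetical. The mapping is lossy: different names can produce the same segment, so uniqueness would have to be checked on the segment rather than on the original name.

```java
import java.text.Normalizer;

// Hypothetical helper turning display names into URI path segments.
final class NameUris {
    // Base taken from the example URI; not an agreed namespace.
    private static final String BASE = "http://dhis2.org/names/";

    // Underscore convention: strip accents, collapse each run of characters
    // other than ASCII letters and digits into one underscore, and trim
    // leading or trailing underscores.
    static String toSegment(String name) {
        String stripped = Normalizer.normalize(name, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        return stripped.replaceAll("[^A-Za-z0-9]+", "_")
                .replaceAll("^_+|_+$", "");
    }

    // For example toUri("TZ", "dataElementCategory", "Sex").
    static String toUri(String country, String type, String name) {
        return BASE + country + "/" + type + "/" + toSegment(name);
    }
}
```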

Cheers
Bob

Jo_Storset · 28 September 2009 13:53

Basically I agree with you, with just one minor reservation.


On 28 Sep 2009, at 14:36, Bob Jolliffe wrote:

What seems to then happen is that implementors tie themselves up in knots trying to find new and imaginative ways of naming the "Age" categories so that they are unique from one another when there is really no compelling reason to do this. With a structure like:

<category id="23" name="Age" uid="4454545656456477756" />

where id is just the internal key and uid is a unique identifier (which could be a uuid but not necessarily) then there is really no reason not to have any number of categories with the name "Age" where each one has different sets of category options.

The problem with non-unique names is that you can easily end up in unintended situations where the name is displayed in a context where the difference between "versions" matters to the person using the system. I don't really know DHIS well enough, but I have seen this kind of problem in other places.
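The concern can be made concrete: once names are not unique, any screen that shows only the name needs to know which names are ambiguous. This Java sketch, using hypothetical NamedThing and NameCollisions types, finds the names shared by more than one distinct uid.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical minimal entity: uid identifies it, name is what users see.
record NamedThing(String uid, String name) {}

final class NameCollisions {
    // Returns the names that belong to more than one distinct uid; showing
    // only such a name would not tell the user which "version" is meant.
    static Set<String> ambiguousNames(List<NamedThing> things) {
        Map<String, Set<String>> uidsByName = things.stream()
                .collect(Collectors.groupingBy(NamedThing::name, TreeMap::new,
                        Collectors.mapping(NamedThing::uid, Collectors.toSet())));
        return uidsByName.entrySet().stream()
                .filter(e -> e.getValue().size() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```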

Jo

jason · 28 September 2009 16:30

If we are voting here, I would say that any naming convention should
be avoided at all costs, as they tend to be rather arbitrary and
easily manipulated.

I am also very much against the unique name restriction currently
imposed in the data model. We have situations where there are
identically named facilities within the same organizational unit
(think of McDonalds here). I think DHIS is even more restrictive than
this example. I grew up in Pickens County, Georgia. There is no reason
why there cannot be, or should not be, identically named organization
units. DHIS 1.4 gets around this through naming conventions, which
cause me no end of headaches when I have to produce a report. I
suppose the same could apply to other concepts, such as "Age". They
should be distinguishable through other means.

I am not sure a UUID is the best way to do this, as it does have
potential performance implications, but it is certainly preferable to
any naming convention enforced through the UI rather than the DB.
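On the size side of the performance question, a Java sketch using java.util.UUID: the canonical text form of a UUID is 36 characters, while the same 128-bit value fits in 16 bytes. UuidSizes is a hypothetical helper. Whether the difference matters in practice depends on the database and its indexes, which this does not measure.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper comparing the text and binary forms of a UUID.
final class UuidSizes {
    // Canonical text form: 32 hex digits plus 4 hyphens = 36 ASCII bytes.
    static int textBytes(UUID id) {
        return id.toString().getBytes(StandardCharsets.US_ASCII).length;
    }

    // The same 128-bit value packed into 16 bytes.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Inverse of toBytes; arguments are evaluated left to right.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```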

Regards,
Jason

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

bobj · 28 September 2009 16:45

Hi Jason

I don’t think we’re voting yet. For the moment we have uuids, and we also have a uniqueness constraint on names. Changing that in a hurry would have a lot of knock-on consequences (even some good ones, I hear you) which are nobody’s priority to address at the moment. But I’d just like to get an idea of what we would ideally want first, and to keep hammering my hobby horse that names and identifiers are different concepts, for good reason.

Generally, the choice of globally unique identifier comes down to either a form of uuid or a form of uri (which can be either a url or a urn; there are religious wars over that). Each of the three approaches has its pros and cons. In general they all suffer from the performance impact of long strings. That is why I favour using a short string or code internally and concatenating it into a longer namespaced uri when we have to interoperate between systems. But that solution is not perfect either.
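A minimal Java sketch of the “short code internally, namespaced uri at the boundary” idea. CodeNamespace is hypothetical, and the prefix used below is modelled on the http://dhis2.org/names/TZ/... example from Bob’s earlier message; it is not an agreed namespace.

```java
import java.util.Optional;

// Hypothetical namespace: short codes are stored internally and expanded
// to URIs only when exchanging data with other systems.
final class CodeNamespace {
    private final String prefix;

    CodeNamespace(String prefix) {
        this.prefix = prefix.endsWith("/") ? prefix : prefix + "/";
    }

    // Expand an internal code such as AGE_STI for export.
    String toUri(String code) {
        return prefix + code;
    }

    // Recover the internal code from an incoming URI, if it is in this namespace.
    Optional<String> toCode(String uri) {
        if (!uri.startsWith(prefix)) {
            return Optional.empty();
        }
        String code = uri.substring(prefix.length());
        return code.isEmpty() || code.contains("/") ? Optional.empty() : Optional.of(code);
    }
}
```

With the prefix http://dhis2.org/names/TZ/dataElementCategory, the code AGE_STI travels as http://dhis2.org/names/TZ/dataElementCategory/AGE_STI and is stored as AGE_STI again on arrival.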

Cheers
Bob

PS. I would have thought of Georgia as more of a KFC county than a McDonalds one. Similar problem, mind you.


jason · 28 September 2009 17:01

OK, good to hear, as long as it is not a naming convention. But I will
go ahead and vote early just in case.

There are multiple Pickens Counties (one each in Alabama, Georgia and
South Carolina), and you are right, we have multiple KFCs in each. If
I want to record the total number of chicken dinners sold in each, I
would need to invent some naming convention to get around the data
model. This is really a separate issue I guess, but my point is that
the data model should capture reality rather than impose an alternate
view of it. Just say no to naming conventions (and KFC!).
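The Pickens County example in code: a Java sketch with hypothetical OrgUnit and Sale records, where each county is a separate unit with its own uid and parent. Totalling by uid keeps the counties apart; totalling by name alone collapses identically named units into one row, and the parent offers one way to tell them apart on screen.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical types: the uid identifies the unit; name and parent describe it.
record OrgUnit(String uid, String name, String parent) {}

record Sale(OrgUnit unit, int chickenDinners) {}

final class DinnerTotals {
    // Sum sales per key; the key function decides what counts as "the same unit".
    static Map<String, Integer> totals(List<Sale> sales, Function<OrgUnit, String> key) {
        Map<String, Integer> result = new TreeMap<>();
        for (Sale sale : sales) {
            result.merge(key.apply(sale.unit()), sale.chickenDinners(), Integer::sum);
        }
        return result;
    }

    static Map<String, Integer> byUid(List<Sale> sales) {
        return totals(sales, OrgUnit::uid);
    }

    // Keying on name alone merges identically named units.
    static Map<String, Integer> byName(List<Sale> sales) {
        return totals(sales, OrgUnit::name);
    }

    // For display, the parent distinguishes units that share a name.
    static String label(OrgUnit unit) {
        return unit.name() + " (" + unit.parent() + ")";
    }
}
```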

Regards,
Jason
