Regular expressions in data validation rules

https://blueprints.launchpad.net/dhis2/+spec/regex-validation

I have updated the blueprint on regular expression use in data
validation rules. This would really make my life (and I suspect
others) lives a lot easier, as long as we are using naming
conventions, lets at least enforce them somehow.

For discussion.

Jason

Hi all,

I’ve no idea about using RegEx for validating data in DHIS2. Just a small comment, I am also using this many times so my feeling on this is not easy but not too difficult when applying RegEx in your coding, ie, javascript and java also.

With RegEx, we can easy controlling any thing that we want to force the user for entering data (text, number) or something else (a file name is an example).
Let’s try !

Thanks !

···

On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

https://blueprints.launchpad.net/dhis2/+spec/regex-validation

I have updated the blueprint on regular expression use in data

validation rules. This would really make my life (and I suspect

others) lives a lot easier, as long as we are using naming

conventions, lets at least enforce them somehow.

For discussion.

Jason


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Hieu.HISPVietnam
Good Health !

Hi Hieu,
Yes, I am actively fishing for a developer to implement this, as it
will really save me a huge amount of work in trying to clean up data.

I have no idea really how it would be implemented, other than that
java.util.regex should be able to be used, but let me give it a try at
a better specification. I do not think it should be so difficult
either.

I am thinking of something like this....

The user would create a regular expression for later assignment to a
database object. The user would select a database table (object) and
field for validation. For instance, lets say we want to validate that
there are no trailing spaces in an organization name.

So, we would create a rule called "Trailing spaces are not allowed"

We would create this rule, and assign a description and a regular
expression to it.

in this case, it would probably be something really simple like '\s+$'

Now, I have no idea how to do this in java, but I assume this would be
really simple, something like this query in Postgresql.

SELECT name from organisationunit where name ~*('\s+$')

Wow, I found 571 orgunits in my organisationunittable with trailing
spaces. Cool.

So, i think we need two objects.

1) A persistence object that stores the following files for the
RegexExpression

a) regexid
b) name
c) expression
d) description
e) resolution description (telling the user how to solve this problem)

2) A table to assign regular expressions to database objects.

a) regexid
b) table
c) field

We could maybe reuse this rule on the davavalue table, to determine if
any values have been stored with trailing spaces.

Yeah, its very easy I think. I would do it myself if I knew a lick of Java. :slight_smile:

Best regards,
Jason

···

On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy <hieu.hispvietnam@gmail.com> wrote:

Hi all,

I've no idea about using RegEx for validating data in DHIS2. Just a small
comment, I am also using this many times so my feeling on this is not easy
but not too difficult when applying RegEx in your coding, ie, javascript and
java also.
With RegEx, we can easy controlling any thing that we want to force the user
for entering data (text, number) or something else (a file name is an
example).
Let's try !

Thanks !

On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering > <jason.p.pickering@gmail.com> wrote:

https://blueprints.launchpad.net/dhis2/+spec/regex-validation

I have updated the blueprint on regular expression use in data
validation rules. This would really make my life (and I suspect
others) lives a lot easier, as long as we are using naming
conventions, lets at least enforce them somehow.

For discussion.

Jason

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

--
Hieu.HISPVietnam
Good Health !

Hi Jason,

Looks like single time task if I understood you correctly? If you want to clean data already on database. like data integrity checking. Why not to make it clean at the very beginning, when particular record being captured? For this one could use regexp in javascript on client side too. As for leading and trailing spaces String.trim(" string ") should do before passing to database.

regards,
murod

···

----- Original Message ----
From: Jason Pickering <jason.p.pickering@gmail.com>
To: Hieu Dang Duy <hieu.hispvietnam@gmail.com>
Cc: dhis2-devs <dhis2-devs@lists.launchpad.net>
Sent: Mon, February 8, 2010 1:05:27 PM
Subject: Re: [Dhis2-devs] Regular expressions in data validation rules

Hi Hieu,
Yes, I am actively fishing for a developer to implement this, as it
will really save me a huge amount of work in trying to clean up data.

I have no idea really how it would be implemented, other than that
java.util.regex should be able to be used, but let me give it a try at
a better specification. I do not think it should be so difficult
either.

I am thinking of something like this....

The user would create a regular expression for later assignment to a
database object. The user would select a database table (object) and
field for validation. For instance, lets say we want to validate that
there are no trailing spaces in an organization name.

So, we would create a rule called "Trailing spaces are not allowed"

We would create this rule, and assign a description and a regular
expression to it.

in this case, it would probably be something really simple like '\s+$'

Now, I have no idea how to do this in java, but I assume this would be
really simple, something like this query in Postgresql.

SELECT name from organisationunit where name ~*('\s+$')

Wow, I found 571 orgunits in my organisationunittable with trailing
spaces. Cool.

So, i think we need two objects.

1) A persistence object that stores the following files for the
RegexExpression

a) regexid
b) name
c) expression
d) description
e) resolution description (telling the user how to solve this problem)

2) A table to assign regular expressions to database objects.

a) regexid
b) table
c) field

We could maybe reuse this rule on the davavalue table, to determine if
any values have been stored with trailing spaces.

Yeah, its very easy I think. I would do it myself if I knew a lick of Java. :slight_smile:

Best regards,
Jason

On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy <hieu.hispvietnam@gmail.com> wrote:

Hi all,

I've no idea about using RegEx for validating data in DHIS2. Just a small
comment, I am also using this many times so my feeling on this is not easy
but not too difficult when applying RegEx in your coding, ie, javascript and
java also.
With RegEx, we can easy controlling any thing that we want to force the user
for entering data (text, number) or something else (a file name is an
example).
Let's try !

Thanks !

On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering > <jason.p.pickering@gmail.com> wrote:

https://blueprints.launchpad.net/dhis2/+spec/regex-validation

I have updated the blueprint on regular expression use in data
validation rules. This would really make my life (and I suspect
others) lives a lot easier, as long as we are using naming
conventions, lets at least enforce them somehow.

For discussion.

Jason

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

--
Hieu.HISPVietnam
Good Health !

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Hi Murod,

This, of course, is one particular trivial example and was provided
to illustrate a point.

I totally agree, this particular example could be solved through
JavaScript validation on the client, and it may already be there in
2.0. I have found this particular example by importing data from 1.4,
where organization units are allowed to have trailing spaces. I think
this is not really a one-off issue, as many people may need to import
data from external systems, which may or may not have this particular
validation enforced.

What I am trying to get at is that regular expressions could be used
to expand the scope of the current data integrity checks, by enforcing
certain patterns on the data (which in some cases could also be
enforced through JavaScript in through the UI). Of course, if we can
do it at the UI level great, but it may not work in all cases,
especially when receiving data from external system. This is why I
think that the data integrity checks come in place. For instance, as I
mentioned in the specs, I need to find all organizational units that
do not correspond to the naming conventions here in Zambia. I can do
this with this...

SELECT name from organisationunit where name !~ '^(ce|co|ea|ls|lu|no|nw|so|we) '

Well, I found 47, which do not correspond to the naming convention. I
have made my dislike of the supposed best practice naming conventions
in earlier threads, but with the implmenetation of regex for checking
of these conventions, at least we could enforce them, even if it is ex
post facto.

Again, these are all examples, and they are really impossible to
predict what they may be, thus the need for flexible rules, built by
administrators/users, and then applied during data integrity checks
(and/or during data entry).

Regards,
Jason

···

On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov <murodlatifov@yahoo.com> wrote:

Hi Jason,

Looks like single time task if I understood you correctly? If you want to clean data already on database. like data integrity checking. Why not to make it clean at the very beginning, when particular record being captured? For this one could use regexp in javascript on client side too. As for leading and trailing spaces String.trim(" string ") should do before passing to database.

regards,
murod

----- Original Message ----
From: Jason Pickering <jason.p.pickering@gmail.com>
To: Hieu Dang Duy <hieu.hispvietnam@gmail.com>
Cc: dhis2-devs <dhis2-devs@lists.launchpad.net>
Sent: Mon, February 8, 2010 1:05:27 PM
Subject: Re: [Dhis2-devs] Regular expressions in data validation rules

Hi Hieu,
Yes, I am actively fishing for a developer to implement this, as it
will really save me a huge amount of work in trying to clean up data.

I have no idea really how it would be implemented, other than that
java.util.regex should be able to be used, but let me give it a try at
a better specification. I do not think it should be so difficult
either.

I am thinking of something like this....

The user would create a regular expression for later assignment to a
database object. The user would select a database table (object) and
field for validation. For instance, lets say we want to validate that
there are no trailing spaces in an organization name.

So, we would create a rule called "Trailing spaces are not allowed"

We would create this rule, and assign a description and a regular
expression to it.

in this case, it would probably be something really simple like '\s+$'

Now, I have no idea how to do this in java, but I assume this would be
really simple, something like this query in Postgresql.

SELECT name from organisationunit where name ~*('\s+$')

Wow, I found 571 orgunits in my organisationunittable with trailing
spaces. Cool.

So, i think we need two objects.

1) A persistence object that stores the following files for the
RegexExpression

a) regexid
b) name
c) expression
d) description
e) resolution description (telling the user how to solve this problem)

2) A table to assign regular expressions to database objects.

a) regexid
b) table
c) field

We could maybe reuse this rule on the davavalue table, to determine if
any values have been stored with trailing spaces.

Yeah, its very easy I think. I would do it myself if I knew a lick of Java. :slight_smile:

Best regards,
Jason

On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy > <hieu.hispvietnam@gmail.com> wrote:

Hi all,

I've no idea about using RegEx for validating data in DHIS2. Just a small
comment, I am also using this many times so my feeling on this is not easy
but not too difficult when applying RegEx in your coding, ie, javascript and
java also.
With RegEx, we can easy controlling any thing that we want to force the user
for entering data (text, number) or something else (a file name is an
example).
Let's try !

Thanks !

On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering >> <jason.p.pickering@gmail.com> wrote:

https://blueprints.launchpad.net/dhis2/+spec/regex-validation

I have updated the blueprint on regular expression use in data
validation rules. This would really make my life (and I suspect
others) lives a lot easier, as long as we are using naming
conventions, lets at least enforce them somehow.

For discussion.

Jason

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

--
Hieu.HISPVietnam
Good Health !

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Hi Jason,

Ok, I understood your point. If you are about import validation, than its good to do that, as there is no proper XML validator for DHIS data exchange, I mean content not validated against any dictionary or repository, when source is other than DHIS itself. Kettle is one such tool for ETL with inbuilt regexp support. You can use it for single task type of issues with transformation, validation and such. http://kettle.pentaho.org/\. Support many different input formats as well as output.

Regards,
murod

···

----- Original Message ----
From: Jason Pickering <jason.p.pickering@gmail.com>
To: Murodullo Latifov <murodlatifov@yahoo.com>
Cc: Hieu Dang Duy <hieu.hispvietnam@gmail.com>; dhis2-devs <dhis2-devs@lists.launchpad.net>
Sent: Mon, February 8, 2010 2:27:13 PM
Subject: Re: [Dhis2-devs] Regular expressions in data validation rules

Hi Murod,

This, of course, is one particular trivial example and was provided
to illustrate a point.

I totally agree, this particular example could be solved through
JavaScript validation on the client, and it may already be there in
2.0. I have found this particular example by importing data from 1.4,
where organization units are allowed to have trailing spaces. I think
this is not really a one-off issue, as many people may need to import
data from external systems, which may or may not have this particular
validation enforced.

What I am trying to get at is that regular expressions could be used
to expand the scope of the current data integrity checks, by enforcing
certain patterns on the data (which in some cases could also be
enforced through JavaScript in through the UI). Of course, if we can
do it at the UI level great, but it may not work in all cases,
especially when receiving data from external system. This is why I
think that the data integrity checks come in place. For instance, as I
mentioned in the specs, I need to find all organizational units that
do not correspond to the naming conventions here in Zambia. I can do
this with this...

SELECT name from organisationunit where name !~ '^(ce|co|ea|ls|lu|no|nw|so|we) '

Well, I found 47, which do not correspond to the naming convention. I
have made my dislike of the supposed best practice naming conventions
in earlier threads, but with the implmenetation of regex for checking
of these conventions, at least we could enforce them, even if it is ex
post facto.

Again, these are all examples, and they are really impossible to
predict what they may be, thus the need for flexible rules, built by
administrators/users, and then applied during data integrity checks
(and/or during data entry).

Regards,
Jason

On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov <murodlatifov@yahoo.com> wrote:

Hi Jason,

Looks like single time task if I understood you correctly? If you want to clean data already on database. like data integrity checking. Why not to make it clean at the very beginning, when particular record being captured? For this one could use regexp in javascript on client side too. As for leading and trailing spaces String.trim(" string ") should do before passing to database.

regards,
murod

----- Original Message ----
From: Jason Pickering <jason.p.pickering@gmail.com>
To: Hieu Dang Duy <hieu.hispvietnam@gmail.com>
Cc: dhis2-devs <dhis2-devs@lists.launchpad.net>
Sent: Mon, February 8, 2010 1:05:27 PM
Subject: Re: [Dhis2-devs] Regular expressions in data validation rules

Hi Hieu,
Yes, I am actively fishing for a developer to implement this, as it
will really save me a huge amount of work in trying to clean up data.

I have no idea really how it would be implemented, other than that
java.util.regex should be able to be used, but let me give it a try at
a better specification. I do not think it should be so difficult
either.

I am thinking of something like this....

The user would create a regular expression for later assignment to a
database object. The user would select a database table (object) and
field for validation. For instance, lets say we want to validate that
there are no trailing spaces in an organization name.

So, we would create a rule called "Trailing spaces are not allowed"

We would create this rule, and assign a description and a regular
expression to it.

in this case, it would probably be something really simple like '\s+$'

Now, I have no idea how to do this in java, but I assume this would be
really simple, something like this query in Postgresql.

SELECT name from organisationunit where name ~*('\s+$')

Wow, I found 571 orgunits in my organisationunittable with trailing
spaces. Cool.

So, i think we need two objects.

1) A persistence object that stores the following files for the
RegexExpression

a) regexid
b) name
c) expression
d) description
e) resolution description (telling the user how to solve this problem)

2) A table to assign regular expressions to database objects.

a) regexid
b) table
c) field

We could maybe reuse this rule on the davavalue table, to determine if
any values have been stored with trailing spaces.

Yeah, its very easy I think. I would do it myself if I knew a lick of Java. :slight_smile:

Best regards,
Jason

On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy > <hieu.hispvietnam@gmail.com> wrote:

Hi all,

I've no idea about using RegEx for validating data in DHIS2. Just a small
comment, I am also using this many times so my feeling on this is not easy
but not too difficult when applying RegEx in your coding, ie, javascript and
java also.
With RegEx, we can easy controlling any thing that we want to force the user
for entering data (text, number) or something else (a file name is an
example).
Let's try !

Thanks !

On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering >> <jason.p.pickering@gmail.com> wrote:

https://blueprints.launchpad.net/dhis2/+spec/regex-validation

I have updated the blueprint on regular expression use in data
validation rules. This would really make my life (and I suspect
others) lives a lot easier, as long as we are using naming
conventions, lets at least enforce them somehow.

For discussion.

Jason

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

--
Hieu.HISPVietnam
Good Health !

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Hi

There are 4 places one could use these regex’s:

  1. in the browser - client side validation

  2. in the framework action/interceptors (http://struts.apache.org/2.1.8.1/docs/validation.html)

  3. in the object persist methods

  4. post fact validation checks.

There are lots of examples of validation with regex using javascript. Not much to say.

Regarding 2 it is a natural way to proceed but it won’t affect import which doesn’t use the web interface.

Regarding 3 we do need to be aware of those places where we bypass the object model. But where the object model is being used it is not difficult to validate with a regex on save. Of course we have to find the corresponding regex. That is really the first problem to solve. Where to find the regex within the model.

Leaving values out of the picture for a while it might make sense to start with names. We have many named objects and the way we name then is frequently very important as the names also act as primary identifiers. We need somehow to add a class-wide string regex field for descendents of NamedObjects (you might want two - one for name and one shortName, but maybe start with name). This way the regex should be available to clients of orgunit, dataelement, category etc

On importing from XML it is very natural and easy to do regular expression based validation using something like schematron which can validate against any xpath expression - but regex is only available in XPath2 which means using saxon and there are some concerns about introducing a saxon dependency. (We might re-look at that). Though there is also another reason to perhaps not use regex validation on dataValues. It will slow things enormously for large imports.

It is also possible to do regular expression matching at the schema level (using either RelaxNG or XSD) and validate via schema. This might be the most viable way to go though it would imply that the Zambia dxf schema would have slightly different constraints to say the Tajik one. And these schema variations would have to be auto-generated somehow based on the local database.

Regards
Bob

···

On 8 February 2010 08:57, Jason Pickering jason.p.pickering@gmail.com wrote:

Hi Murod,

This, of course, is one particular trivial example and was provided

to illustrate a point.

I totally agree, this particular example could be solved through

JavaScript validation on the client, and it may already be there in

2.0. I have found this particular example by importing data from 1.4,

where organization units are allowed to have trailing spaces. I think

this is not really a one-off issue, as many people may need to import

data from external systems, which may or may not have this particular

validation enforced.

What I am trying to get at is that regular expressions could be used

to expand the scope of the current data integrity checks, by enforcing

certain patterns on the data (which in some cases could also be

enforced through JavaScript in through the UI). Of course, if we can

do it at the UI level great, but it may not work in all cases,

especially when receiving data from external system. This is why I

think that the data integrity checks come in place. For instance, as I

mentioned in the specs, I need to find all organizational units that

do not correspond to the naming conventions here in Zambia. I can do

this with this…

SELECT name from organisationunit where name !~ '^(ce|co|ea|ls|lu|no|nw|so|we) ’

Well, I found 47, which do not correspond to the naming convention. I

have made my dislike of the supposed best practice naming conventions

in earlier threads, but with the implmenetation of regex for checking

of these conventions, at least we could enforce them, even if it is ex

post facto.

Again, these are all examples, and they are really impossible to

predict what they may be, thus the need for flexible rules, built by

administrators/users, and then applied during data integrity checks

(and/or during data entry).

Regards,

Jason

On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov > > murodlatifov@yahoo.com wrote:

Hi Jason,

Looks like single time task if I understood you correctly? If you want to clean data already on database. like data integrity checking. Why not to make it clean at the very beginning, when particular record being captured? For this one could use regexp in javascript on client side too. As for leading and trailing spaces String.trim(" string ") should do before passing to database.

regards,

murod

----- Original Message ----

From: Jason Pickering jason.p.pickering@gmail.com

To: Hieu Dang Duy hieu.hispvietnam@gmail.com

Cc: dhis2-devs dhis2-devs@lists.launchpad.net

Sent: Mon, February 8, 2010 1:05:27 PM

Subject: Re: [Dhis2-devs] Regular expressions in data validation rules

Hi Hieu,

Yes, I am actively fishing for a developer to implement this, as it

will really save me a huge amount of work in trying to clean up data.

I have no idea really how it would be implemented, other than that

java.util.regex should be able to be used, but let me give it a try at

a better specification. I do not think it should be so difficult

either.

I am thinking of something like this…

The user would create a regular expression for later assignment to a

database object. The user would select a database table (object) and

field for validation. For instance, lets say we want to validate that

there are no trailing spaces in an organization name.

So, we would create a rule called “Trailing spaces are not allowed”

We would create this rule, and assign a description and a regular

expression to it.

in this case, it would probably be something really simple like ‘\s+$’

Now, I have no idea how to do this in java, but I assume this would be

really simple, something like this query in Postgresql.

SELECT name from organisationunit where name ~*(‘\s+$’)

Wow, I found 571 orgunits in my organisationunittable with trailing

spaces. Cool.

So, i think we need two objects.

  1. A persistence object that stores the following files for the

RegexExpression

a) regexid

b) name

c) expression

d) description

e) resolution description (telling the user how to solve this problem)

  1. A table to assign regular expressions to database objects.

a) regexid

b) table

c) field

We could maybe reuse this rule on the davavalue table, to determine if

any values have been stored with trailing spaces.

Yeah, its very easy I think. I would do it myself if I knew a lick of Java. :slight_smile:

Best regards,

Jason

On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy > > > hieu.hispvietnam@gmail.com wrote:

Hi all,

I’ve no idea about using RegEx for validating data in DHIS2. Just a small

comment, I am also using this many times so my feeling on this is not easy

but not too difficult when applying RegEx in your coding, ie, javascript and

java also.

With RegEx, we can easy controlling any thing that we want to force the user

for entering data (text, number) or something else (a file name is an

example).

Let’s try !

Thanks !

On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering > > >> jason.p.pickering@gmail.com wrote:

https://blueprints.launchpad.net/dhis2/+spec/regex-validation

I have updated the blueprint on regular expression use in data

validation rules. This would really make my life (and I suspect

others) lives a lot easier, as long as we are using naming

conventions, lets at least enforce them somehow.

For discussion.

Jason


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Hieu.HISPVietnam

Good Health !


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Very good points. I was thinking initially at least, so start with 4
and 1, in that order.

There are already many checks already in place in the UI, but somehow
it feels that it should be possible to extend them and make them more
generic, to suit a particular implementations needs. Could the rules
defined in the data integrity checks be reused at the UI level (and
other levels?). It feels like it is possible, although there may be
complications due to different regex flavors. The fourth alternative
seems like a quick win.

Data integrity checks serve a useful purpose, by allowing people to
enter some data, even if it may not be 100% correct. This is a
property of HMIS systems that I think we all face, namely that some
information is better than no information and all. For instance, you
can enter values beyond the min/max values, but there are checks there
to warn you. The same could be said of the functionality of regular
expressions in the data validation process. Allow people to enter
data, even though it may be not entirely correct (e.g. does not follow
the countries naming conventions, includes decimal places where there
should not be any, etc). Each of these rules are often highly specific
to implementations. Placing regular expressions in the data integrity
as a start, would seem fairly simple to implement, and would offer up
some quick wins to allow better data quality.

I agree that intercepting problems at the import level is important,
but as Bob highlights, it is costly in terms of processing. At a
personal level, I tend to want to get the data in the DB first, and
then try and clean it up, rather than trying to analyze all the
possibly problems prior to a data import. I think there are good
arguments both ways, but in many cases, we have no control, except
when we do the import ourselves, of whether imported data has been
properly imported or no. 90% of the time here in Zambia, data imported
is pretty good, but it is that 10% that can often only be resolved by
a human most efficiently, at least when one thinks about the code
required to try and correct every single issue that may arise from a
particular naming convention, and whether someone follows it or not.

Regards,
Jason

···

On Mon, Feb 8, 2010 at 12:34 PM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

Hi

There are 4 places one could use these regex's:
1. in the browser - client side validation
2. in the framework action/interceptors
(http://struts.apache.org/2.1.8.1/docs/validation.html\)
3. in the object persist methods
4. post fact validation checks.

There are lots of examples of validation with regex using javascript. Not
much to say.

Regarding 2 it is a natural way to proceed but it won't affect import which
doesn't use the web interface.

Regarding 3 we do need to be aware of those places where we bypass the
object model. But where the object model is being used it is not difficult
to validate with a regex on save. Of course we have to find the
corresponding regex. That is really the first problem to solve. Where to
find the regex within the model.

Leaving values out of the picture for a while it might make sense to start
with names. We have many named objects and the way we name then is
frequently very important as the names also act as primary identifiers. We
need somehow to add a class-wide string regex field for descendents of
NamedObjects (you might want two - one for name and one shortName, but maybe
start with name). This way the regex should be available to clients of
orgunit, dataelement, category etc

On importing from XML it is very natural and easy to do regular expression
based validation using something like schematron which can validate against
any xpath expression - but regex is only available in XPath2 which means
using saxon and there are some concerns about introducing a saxon
dependency. (We might re-look at that). Though there is also another
reason to perhaps not use regex validation on dataValues. It will slow
things enormously for large imports.

It is also possible to do regular expression matching at the schema level
(using either RelaxNG or XSD) and validate via schema. This might be the
most viable way to go though it would imply that the Zambia dxf schema would
have slightly different constraints to say the Tajik one. And these schema
variations would have to be auto-generated somehow based on the local
database.

Regards
Bob

On 8 February 2010 08:57, Jason Pickering <jason.p.pickering@gmail.com> > wrote:

Hi Murod,

This, of course, is one particular trivial example and was provided
to illustrate a point.

I totally agree, this particular example could be solved through
JavaScript validation on the client, and it may already be there in
2.0. I have found this particular example by importing data from 1.4,
where organization units are allowed to have trailing spaces. I think
this is not really a one-off issue, as many people may need to import
data from external systems, which may or may not have this particular
validation enforced.

What I am trying to get at is that regular expressions could be used
to expand the scope of the current data integrity checks, by enforcing
certain patterns on the data (which in some cases could also be
enforced through JavaScript in through the UI). Of course, if we can
do it at the UI level great, but it may not work in all cases,
especially when receiving data from external system. This is why I
think that the data integrity checks come in place. For instance, as I
mentioned in the specs, I need to find all organizational units that
do not correspond to the naming conventions here in Zambia. I can do
this with this...

SELECT name from organisationunit where name !~
'^(ce|co|ea|ls|lu|no|nw|so|we) '

Well, I found 47, which do not correspond to the naming convention. I
have made my dislike of the supposed best practice naming conventions
in earlier threads, but with the implmenetation of regex for checking
of these conventions, at least we could enforce them, even if it is ex
post facto.

Again, these are all examples, and they are really impossible to
predict what they may be, thus the need for flexible rules, built by
administrators/users, and then applied during data integrity checks
(and/or during data entry).

Regards,
Jason

On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov >> <murodlatifov@yahoo.com> wrote:
> Hi Jason,
>
> Looks like single time task if I understood you correctly? If you want
> to clean data already on database. like data integrity checking. Why not to
> make it clean at the very beginning, when particular record being captured?
> For this one could use regexp in javascript on client side too. As for
> leading and trailing spaces String.trim(" string ") should do before
> passing to database.
>
> regards,
> murod
>
>
>
> ----- Original Message ----
> From: Jason Pickering <jason.p.pickering@gmail.com>
> To: Hieu Dang Duy <hieu.hispvietnam@gmail.com>
> Cc: dhis2-devs <dhis2-devs@lists.launchpad.net>
> Sent: Mon, February 8, 2010 1:05:27 PM
> Subject: Re: [Dhis2-devs] Regular expressions in data validation rules
>
> Hi Hieu,
> Yes, I am actively fishing for a developer to implement this, as it
> will really save me a huge amount of work in trying to clean up data.
>
> I have no idea really how it would be implemented, other than that
> java.util.regex should be able to be used, but let me give it a try at
> a better specification. I do not think it should be so difficult
> either.
>
> I am thinking of something like this....
>
> The user would create a regular expression for later assignment to a
> database object. The user would select a database table (object) and
> field for validation. For instance, lets say we want to validate that
> there are no trailing spaces in an organization name.
>
> So, we would create a rule called "Trailing spaces are not allowed"
>
> We would create this rule, and assign a description and a regular
> expression to it.
>
> in this case, it would probably be something really simple like '\s+$'
>
> Now, I have no idea how to do this in java, but I assume this would be
> really simple, something like this query in Postgresql.
>
> SELECT name from organisationunit where name ~*('\s+$')
>
> Wow, I found 571 orgunits in my organisationunittable with trailing
> spaces. Cool.
>
> So, i think we need two objects.
>
> 1) A persistence object that stores the following files for the
> RegexExpression
>
> a) regexid
> b) name
> c) expression
> d) description
> e) resolution description (telling the user how to solve this problem)
>
> 2) A table to assign regular expressions to database objects.
>
> a) regexid
> b) table
> c) field
>
> We could maybe reuse this rule on the davavalue table, to determine if
> any values have been stored with trailing spaces.
>
> Yeah, its very easy I think. I would do it myself if I knew a lick of
> Java. :slight_smile:
>
> Best regards,
> Jason
>
>
> On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy >> > <hieu.hispvietnam@gmail.com> wrote:
>> Hi all,
>>
>> I've no idea about using RegEx for validating data in DHIS2. Just a
>> small
>> comment, I am also using this many times so my feeling on this is not
>> easy
>> but not too difficult when applying RegEx in your coding, ie,
>> javascript and
>> java also.
>> With RegEx, we can easy controlling any thing that we want to force the
>> user
>> for entering data (text, number) or something else (a file name is an
>> example).
>> Let's try !
>>
>> Thanks !
>>
>> On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering >> >> <jason.p.pickering@gmail.com> wrote:
>>>
>>> https://blueprints.launchpad.net/dhis2/+spec/regex-validation
>>>
>>> I have updated the blueprint on regular expression use in data
>>> validation rules. This would really make my life (and I suspect
>>> others) lives a lot easier, as long as we are using naming
>>> conventions, lets at least enforce them somehow.
>>>
>>> For discussion.
>>>
>>> Jason
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to : dhis2-devs@lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help : https://help.launchpad.net/ListHelp
>>
>>
>>
>> --
>> Hieu.HISPVietnam
>> Good Health !
>>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>
>
>
>
>

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

I very much agree with both Jason and Bob that it is useful to be able to put checks (or not) at various levels, depending on use cases (and users). Ideally, a common set of rules could be applied, as Jason says.

Which rules to switch on at what levels should be up to an administrator, or in some cases even the end user. There is no one size fits all here, but hopefully we can describe scenarios and best practices in the documentation and training material, without tying everyone into the same mould.

Knut

···

On Mon, Feb 8, 2010 at 12:43 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

Very good points. I was thinking initially at least, so start with 4

and 1, in that order.

There are already many checks already in place in the UI, but somehow

it feels that it should be possible to extend them and make them more

generic, to suit a particular implementations needs. Could the rules

defined in the data integrity checks be reused at the UI level (and

other levels?). It feels like it is possible, although there may be

complications due to different regex flavors. The fourth alternative

seems like a quick win.

Data integrity checks serve a useful purpose, by allowing people to

enter some data, even if it may not be 100% correct. This is a

property of HMIS systems that I think we all face, namely that some

information is better than no information and all. For instance, you

can enter values beyond the min/max values, but there are checks there

to warn you. The same could be said of the functionality of regular

expressions in the data validation process. Allow people to enter

data, even though it may be not entirely correct (e.g. does not follow

the countries naming conventions, includes decimal places where there

should not be any, etc). Each of these rules are often highly specific

to implementations. Placing regular expressions in the data integrity

as a start, would seem fairly simple to implement, and would offer up

some quick wins to allow better data quality.

I agree that intercepting problems at the import level is important,

but as Bob highlights, it is costly in terms of processing. At a

personal level, I tend to want to get the data in the DB first, and

then try and clean it up, rather than trying to analyze all the

possibly problems prior to a data import. I think there are good

arguments both ways, but in many cases, we have no control, except

when we do the import ourselves, of whether imported data has been

properly imported or no. 90% of the time here in Zambia, data imported

is pretty good, but it is that 10% that can often only be resolved by

a human most efficiently, at least when one thinks about the code

required to try and correct every single issue that may arise from a

particular naming convention, and whether someone follows it or not.

Regards,

Jason

On Mon, Feb 8, 2010 at 12:34 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi

There are 4 places one could use these regex’s:

  1. in the browser - client side validation
  1. in the framework action/interceptors

(http://struts.apache.org/2.1.8.1/docs/validation.html)

  1. in the object persist methods
  1. post fact validation checks.

There are lots of examples of validation with regex using javascript. Not

much to say.

Regarding 2 it is a natural way to proceed but it won’t affect import which

doesn’t use the web interface.

Regarding 3 we do need to be aware of those places where we bypass the

object model. But where the object model is being used it is not difficult

to validate with a regex on save. Of course we have to find the

corresponding regex. That is really the first problem to solve. Where to

find the regex within the model.

Leaving values out of the picture for a while it might make sense to start

with names. We have many named objects and the way we name then is

frequently very important as the names also act as primary identifiers. We

need somehow to add a class-wide string regex field for descendents of

NamedObjects (you might want two - one for name and one shortName, but maybe

start with name). This way the regex should be available to clients of

orgunit, dataelement, category etc

On importing from XML it is very natural and easy to do regular expression

based validation using something like schematron which can validate against

any xpath expression - but regex is only available in XPath2 which means

using saxon and there are some concerns about introducing a saxon

dependency. (We might re-look at that). Though there is also another

reason to perhaps not use regex validation on dataValues. It will slow

things enormously for large imports.

It is also possible to do regular expression matching at the schema level

(using either RelaxNG or XSD) and validate via schema. This might be the

most viable way to go though it would imply that the Zambia dxf schema would

have slightly different constraints to say the Tajik one. And these schema

variations would have to be auto-generated somehow based on the local

database.

Regards

Bob

On 8 February 2010 08:57, Jason Pickering jason.p.pickering@gmail.com > > > wrote:

Hi Murod,

This, of course, is one particular trivial example and was provided

to illustrate a point.

I totally agree, this particular example could be solved through

JavaScript validation on the client, and it may already be there in

2.0. I have found this particular example by importing data from 1.4,

where organization units are allowed to have trailing spaces. I think

this is not really a one-off issue, as many people may need to import

data from external systems, which may or may not have this particular

validation enforced.

What I am trying to get at is that regular expressions could be used

to expand the scope of the current data integrity checks, by enforcing

certain patterns on the data (which in some cases could also be

enforced through JavaScript in through the UI). Of course, if we can

do it at the UI level great, but it may not work in all cases,

especially when receiving data from external system. This is why I

think that the data integrity checks come in place. For instance, as I

mentioned in the specs, I need to find all organizational units that

do not correspond to the naming conventions here in Zambia. I can do

this with this…

SELECT name from organisationunit where name !~

'^(ce|co|ea|ls|lu|no|nw|so|we) ’

Well, I found 47, which do not correspond to the naming convention. I

have made my dislike of the supposed best practice naming conventions

in earlier threads, but with the implmenetation of regex for checking

of these conventions, at least we could enforce them, even if it is ex

post facto.

Again, these are all examples, and they are really impossible to

predict what they may be, thus the need for flexible rules, built by

administrators/users, and then applied during data integrity checks

(and/or during data entry).

Regards,

Jason

On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov > > >> murodlatifov@yahoo.com wrote:

Hi Jason,

Looks like single time task if I understood you correctly? If you want

to clean data already on database. like data integrity checking. Why not to

make it clean at the very beginning, when particular record being captured?

For this one could use regexp in javascript on client side too. As for

leading and trailing spaces String.trim(" string ") should do before

passing to database.

regards,

murod

----- Original Message ----

From: Jason Pickering jason.p.pickering@gmail.com

To: Hieu Dang Duy hieu.hispvietnam@gmail.com

Cc: dhis2-devs dhis2-devs@lists.launchpad.net

Sent: Mon, February 8, 2010 1:05:27 PM

Subject: Re: [Dhis2-devs] Regular expressions in data validation rules

Hi Hieu,

Yes, I am actively fishing for a developer to implement this, as it

will really save me a huge amount of work in trying to clean up data.

I have no idea really how it would be implemented, other than that

java.util.regex should be able to be used, but let me give it a try at

a better specification. I do not think it should be so difficult

either.

I am thinking of something like this…

The user would create a regular expression for later assignment to a

database object. The user would select a database table (object) and

field for validation. For instance, lets say we want to validate that

there are no trailing spaces in an organization name.

So, we would create a rule called “Trailing spaces are not allowed”

We would create this rule, and assign a description and a regular

expression to it.

in this case, it would probably be something really simple like ‘\s+$’

Now, I have no idea how to do this in java, but I assume this would be

really simple, something like this query in Postgresql.

SELECT name from organisationunit where name ~*(‘\s+$’)

Wow, I found 571 orgunits in my organisationunittable with trailing

spaces. Cool.

So, i think we need two objects.

  1. A persistence object that stores the following files for the

RegexExpression

a) regexid

b) name

c) expression

d) description

e) resolution description (telling the user how to solve this problem)

  1. A table to assign regular expressions to database objects.

a) regexid

b) table

c) field

We could maybe reuse this rule on the davavalue table, to determine if

any values have been stored with trailing spaces.

Yeah, its very easy I think. I would do it myself if I knew a lick of

Java. :slight_smile:

Best regards,

Jason

On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy > > >> > hieu.hispvietnam@gmail.com wrote:

Hi all,

I’ve no idea about using RegEx for validating data in DHIS2. Just a

small

comment, I am also using this many times so my feeling on this is not

easy

but not too difficult when applying RegEx in your coding, ie,

javascript and

java also.

With RegEx, we can easy controlling any thing that we want to force the

user

for entering data (text, number) or something else (a file name is an

example).

Let’s try !

Thanks !

On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering > > >> >> jason.p.pickering@gmail.com wrote:

https://blueprints.launchpad.net/dhis2/+spec/regex-validation

I have updated the blueprint on regular expression use in data

validation rules. This would really make my life (and I suspect

others) lives a lot easier, as long as we are using naming

conventions, lets at least enforce them somehow.

For discussion.

Jason


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Hieu.HISPVietnam

Good Health !


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Cheers,
Knut Staring

Just forwarding a side discussion with Hieu to the list for discussion
and consideration.

Input welcome.

Regards,
Jason

···

---------- Forwarded message ----------
From: Jason Pickering <jason.p.pickering@gmail.com>
Date: Mon, Feb 8, 2010 at 9:25 PM
Subject: Re: [Dhis2-devs] Regular expressions in data validation rules
To: Hieu Dang Duy <hieu.hispvietnam@gmail.com>

Fantastic. Please review
https://blueprints.launchpad.net/dhis2/+spec/regex-validation
carefully. I have indicated to persistence objects which I think would
be necessary.

The first, would be a table that would store the regex itself. The can
be a one-to-many relationship between regexes and other objects. So,
the first table would simple store the regex itself, and a few other
properties.

So, to create that example query (Detect trailing spaces in names) we
would have a persistence object like this..
a)regexid (taken from hibernate_sequence)
b)name (Detect trailing spaces in names)
c)expression ('\s+$')
d)description('Trailing spaces should not be allowed.')
e)resolution('The following objects have trailing spaces. Please
delete them from the appropriate field')

OK, so that would basically store the regex itself.

Now, another table would store the particulars of what we want to
apply the regex to.

a) regexid (same as the hibernate sequence)
b) table (organisationunit)
c) field (name)
d) negate (false)
e) caseinsentive (defaults to false)

So, we would need two dialogs. One to allow a user to define the
regex, the other to assign a particular regex to a particular field.

Now, this is where I need help. I know the SQL that will return the
record set we want.

SELECT organisationunitid, name from organisationunit where name ~('\s+$')

If we were to select the negate field as TRUE, then the query would be
SELECT organisationunitid, name from organisationunit where name !~('\s+$')

If we select case insensitve the query would be (which of course in
this case is meaningless)
SELECT organisationunitid, name from organisationunit where name !~*('\s+$')

This is Postgresql dialect.The key question for me is how do we make
it database independent? Do we return a recordset from the
organisationunit table, apply a regex with java.util.regex or can we
just push it back to the DB? Both Postres, MySQL and H2 seem to
support this (MySQL uses RLIKE or REGEXP and H2 uses LIKE while
postgres uses ~ ) So, we could create a query and then just execute it
directly on the DB. Otherwise, I am not sure how to do it directly
with Hibernate. Perhaps it is easy, but I reviewed a few pages from
Google, and there did not seem to be a clear answer.

In the end however, the result set would then need to be returned to
the user through during a data integrity check, with the list of
fields that violate the rule, similar to what happens now.

I think this is really incredibly important, and the example I have
given thus far is really just a trivial example. I am sure there are
many more situations where this would come in handy.

Anyway, let me know what you think.

Best regards,
Jason

On Mon, Feb 8, 2010 at 8:39 PM, Hieu Dang Duy <hieu.hispvietnam@gmail.com> wrote:

Dear Jason,

I would like to work with u on this blueprint. As your previous mail, you
make me open my eyes really at using RegEx in sql query. Absolutely, I've
never seen this query like that before. I had try running your example on
orgunit table. Amazing, it ran fine.

Plz show off to me which things what I have to do for u?
Hopefully that, I will try to help you test this if I can.

On Mon, Feb 8, 2010 at 11:25 PM, Jason Pickering > <jason.p.pickering@gmail.com> wrote:

Hi there Hieu,

I was hoping maybe to work with you on this. I have no idea how to
code Java really, but I understand the code fairly well if you can
point me in the right direction. I can certainly help you with the
specs. I know exactly what needs to be done I think (see the
blueprint) and can help with the testing and regex definitions.

I think we should start with the data integrity checks. This seems
like the quickest win.

Do you have any time?

Best regards,
Jason

On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy >> <hieu.hispvietnam@gmail.com> wrote:
> Hi all,
>
> I've no idea about using RegEx for validating data in DHIS2. Just a
> small
> comment, I am also using this many times so my feeling on this is not
> easy
> but not too difficult when applying RegEx in your coding, ie, javascript
> and
> java also.
> With RegEx, we can easy controlling any thing that we want to force the
> user
> for entering data (text, number) or something else (a file name is an
> example).
> Let's try !
>
> Thanks !
>
> On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering >> > <jason.p.pickering@gmail.com> wrote:
>>
>> https://blueprints.launchpad.net/dhis2/+spec/regex-validation
>>
>> I have updated the blueprint on regular expression use in data
>> validation rules. This would really make my life (and I suspect
>> others) lives a lot easier, as long as we are using naming
>> conventions, lets at least enforce them somehow.
>>
>> For discussion.
>>
>> Jason
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to : dhis2-devs@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help : https://help.launchpad.net/ListHelp
>
>
>
> --
> Hieu.HISPVietnam
> Good Health !
>

--
Hieu.HISPVietnam
Good Health !

Hi Jason

Looks good. I think we might want to push ahead and make the tables
anyway. Then you can already start taking advantage of them with your
queries outside of the system. We can then start implementing the
other use cases as we go along - and hopefully not hit too many
problems with dialect. Having a few regexes there might be a starting
point for investigating the portability issues. I am pretty sure I
will take advantage of such a table at some point to do schema based
validation on import for example.

Cheers
Bob

···

On 9 February 2010 05:20, Jason Pickering <jason.p.pickering@gmail.com> wrote:

Just forwarding a side discussion with Hieu to the list for discussion
and consideration.

Input welcome.

Regards,
Jason

---------- Forwarded message ----------
From: Jason Pickering <jason.p.pickering@gmail.com>
Date: Mon, Feb 8, 2010 at 9:25 PM
Subject: Re: [Dhis2-devs] Regular expressions in data validation rules
To: Hieu Dang Duy <hieu.hispvietnam@gmail.com>

Fantastic. Please review
https://blueprints.launchpad.net/dhis2/+spec/regex-validation
carefully. I have indicated to persistence objects which I think would
be necessary.

The first, would be a table that would store the regex itself. The can
be a one-to-many relationship between regexes and other objects. So,
the first table would simple store the regex itself, and a few other
properties.

So, to create that example query (Detect trailing spaces in names) we
would have a persistence object like this..
a)regexid (taken from hibernate_sequence)
b)name (Detect trailing spaces in names)
c)expression ('\s+$')
d)description('Trailing spaces should not be allowed.')
e)resolution('The following objects have trailing spaces. Please
delete them from the appropriate field')

OK, so that would basically store the regex itself.

Now, another table would store the particulars of what we want to
apply the regex to.

a) regexid (same as the hibernate sequence)
b) table (organisationunit)
c) field (name)
d) negate (false)
e) caseinsentive (defaults to false)

So, we would need two dialogs. One to allow a user to define the
regex, the other to assign a particular regex to a particular field.

Now, this is where I need help. I know the SQL that will return the
record set we want.

SELECT organisationunitid, name from organisationunit where name ~('\s+$')

If we were to select the negate field as TRUE, then the query would be
SELECT organisationunitid, name from organisationunit where name !~('\s+$')

If we select case insensitve the query would be (which of course in
this case is meaningless)
SELECT organisationunitid, name from organisationunit where name !~*('\s+$')

This is Postgresql dialect.The key question for me is how do we make
it database independent? Do we return a recordset from the
organisationunit table, apply a regex with java.util.regex or can we
just push it back to the DB? Both Postres, MySQL and H2 seem to
support this (MySQL uses RLIKE or REGEXP and H2 uses LIKE while
postgres uses ~ ) So, we could create a query and then just execute it
directly on the DB. Otherwise, I am not sure how to do it directly
with Hibernate. Perhaps it is easy, but I reviewed a few pages from
Google, and there did not seem to be a clear answer.

In the end however, the result set would then need to be returned to
the user through during a data integrity check, with the list of
fields that violate the rule, similar to what happens now.

I think this is really incredibly important, and the example I have
given thus far is really just a trivial example. I am sure there are
many more situations where this would come in handy.

Anyway, let me know what you think.

Best regards,
Jason

On Mon, Feb 8, 2010 at 8:39 PM, Hieu Dang Duy > <hieu.hispvietnam@gmail.com> wrote:

Dear Jason,

I would like to work with u on this blueprint. As your previous mail, you
make me open my eyes really at using RegEx in sql query. Absolutely, I've
never seen this query like that before. I had try running your example on
orgunit table. Amazing, it ran fine.

Plz show off to me which things what I have to do for u?
Hopefully that, I will try to help you test this if I can.

On Mon, Feb 8, 2010 at 11:25 PM, Jason Pickering >> <jason.p.pickering@gmail.com> wrote:

Hi there Hieu,

I was hoping maybe to work with you on this. I have no idea how to
code Java really, but I understand the code fairly well if you can
point me in the right direction. I can certainly help you with the
specs. I know exactly what needs to be done I think (see the
blueprint) and can help with the testing and regex definitions.

I think we should start with the data integrity checks. This seems
like the quickest win.

Do you have any time?

Best regards,
Jason

On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy >>> <hieu.hispvietnam@gmail.com> wrote:
> Hi all,
>
> I've no idea about using RegEx for validating data in DHIS2. Just a
> small
> comment, I am also using this many times so my feeling on this is not
> easy
> but not too difficult when applying RegEx in your coding, ie, javascript
> and
> java also.
> With RegEx, we can easy controlling any thing that we want to force the
> user
> for entering data (text, number) or something else (a file name is an
> example).
> Let's try !
>
> Thanks !
>
> On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering >>> > <jason.p.pickering@gmail.com> wrote:
>>
>> https://blueprints.launchpad.net/dhis2/+spec/regex-validation
>>
>> I have updated the blueprint on regular expression use in data
>> validation rules. This would really make my life (and I suspect
>> others) lives a lot easier, as long as we are using naming
>> conventions, lets at least enforce them somehow.
>>
>> For discussion.
>>
>> Jason
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to : dhis2-devs@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help : https://help.launchpad.net/ListHelp
>
>
>
> --
> Hieu.HISPVietnam
> Good Health !
>

--
Hieu.HISPVietnam
Good Health !

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Digging up this old thread here – I assume there is no support for reg ex validations for aggregate data yet, correct? At least the available validation rule functions documented in the docs don’t indicate that.

Hi @dhuser starting in 2.35 we support logic expressions in indicators and validation rules. The issues is we do not yet have aggregation levels in indicators. Check out the screenshot from indicators in 2.37 below

hi @Scott thanks! Does the Validation Rule engine support checking that a DE value (valueType=TEXT) is of a certain format, e.g. needs to start with XY followed by a 3-digit number where RegEx expressions would be useful? I understand that the VR engine needs numbers for left/right side, so maybe something like:

left side => if( match( '^XY-[0-9]{3}$', #{DEUID} ), 1, 0)
operator => equal or greater than
right side => 1

Valid: XY123 / not valid: 1X2Y3.

that’s great info thanks @Scott ! Is there a relevant jira issue for this (couldn’t find it)?

Hi @dhuser my mistake, but maybe good news for you.

As of 2.37.0 and 2.36.4, text value (and date and boolean) are supported in validation rules and predictors inputs. (Indicators are kind of locked into numeric values.).

1 Like

That’s great @Scott – I might have misunderstood but since it is marked as a Solution to this CoP thread, can you shed some light how the functions are used to validate text values [e.g. via a Regular Expression] in Validation Rules?

The docs do not mention anything yet: docs > metadata > validation rules and I don’t see yet how if( ) can be used for this.

Sure, you can use an if statements with text just like in the example below.

I am pretty sure that what you are asking for is not supported @dhuser . @Scott example shows how a numeric value can be generated if the value of a textual data element matches a reference, but you are asking for a way to match a pattern to a particular data value. I think it would be a nice feature, but almost certain its not supported at this point in time.

Understood, thanks! I created a feature request in DHIS2-12023.

1 Like