Very good points. I was thinking initially at least, so start with 4
and 1, in that order.
There are already many checks already in place in the UI, but somehow
it feels that it should be possible to extend them and make them more
generic, to suit a particular implementations needs. Could the rules
defined in the data integrity checks be reused at the UI level (and
other levels?). It feels like it is possible, although there may be
complications due to different regex flavors. The fourth alternative
seems like a quick win.
Data integrity checks serve a useful purpose, by allowing people to
enter some data, even if it may not be 100% correct. This is a
property of HMIS systems that I think we all face, namely that some
information is better than no information and all. For instance, you
can enter values beyond the min/max values, but there are checks there
to warn you. The same could be said of the functionality of regular
expressions in the data validation process. Allow people to enter
data, even though it may be not entirely correct (e.g. does not follow
the countries naming conventions, includes decimal places where there
should not be any, etc). Each of these rules are often highly specific
to implementations. Placing regular expressions in the data integrity
as a start, would seem fairly simple to implement, and would offer up
some quick wins to allow better data quality.
I agree that intercepting problems at the import level is important,
but as Bob highlights, it is costly in terms of processing. At a
personal level, I tend to want to get the data in the DB first, and
then try and clean it up, rather than trying to analyze all the
possibly problems prior to a data import. I think there are good
arguments both ways, but in many cases, we have no control, except
when we do the import ourselves, of whether imported data has been
properly imported or no. 90% of the time here in Zambia, data imported
is pretty good, but it is that 10% that can often only be resolved by
a human most efficiently, at least when one thinks about the code
required to try and correct every single issue that may arise from a
particular naming convention, and whether someone follows it or not.
Regards,
Jason
On Mon, Feb 8, 2010 at 12:34 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:
Hi
There are 4 places one could use these regex’s:
- in the browser - client side validation
- in the framework action/interceptors
(http://struts.apache.org/2.1.8.1/docs/validation.html)
- in the object persist methods
- post fact validation checks.
There are lots of examples of validation with regex using javascript. Not
much to say.
Regarding 2 it is a natural way to proceed but it won’t affect import which
doesn’t use the web interface.
Regarding 3 we do need to be aware of those places where we bypass the
object model. But where the object model is being used it is not difficult
to validate with a regex on save. Of course we have to find the
corresponding regex. That is really the first problem to solve. Where to
find the regex within the model.
Leaving values out of the picture for a while it might make sense to start
with names. We have many named objects and the way we name then is
frequently very important as the names also act as primary identifiers. We
need somehow to add a class-wide string regex field for descendents of
NamedObjects (you might want two - one for name and one shortName, but maybe
start with name). This way the regex should be available to clients of
orgunit, dataelement, category etc
On importing from XML it is very natural and easy to do regular expression
based validation using something like schematron which can validate against
any xpath expression - but regex is only available in XPath2 which means
using saxon and there are some concerns about introducing a saxon
dependency. (We might re-look at that). Though there is also another
reason to perhaps not use regex validation on dataValues. It will slow
things enormously for large imports.
It is also possible to do regular expression matching at the schema level
(using either RelaxNG or XSD) and validate via schema. This might be the
most viable way to go though it would imply that the Zambia dxf schema would
have slightly different constraints to say the Tajik one. And these schema
variations would have to be auto-generated somehow based on the local
database.
Regards
Bob
On 8 February 2010 08:57, Jason Pickering jason.p.pickering@gmail.com > > > wrote:
Hi Murod,
This, of course, is one particular trivial example and was provided
to illustrate a point.
I totally agree, this particular example could be solved through
JavaScript validation on the client, and it may already be there in
2.0. I have found this particular example by importing data from 1.4,
where organization units are allowed to have trailing spaces. I think
this is not really a one-off issue, as many people may need to import
data from external systems, which may or may not have this particular
validation enforced.
What I am trying to get at is that regular expressions could be used
to expand the scope of the current data integrity checks, by enforcing
certain patterns on the data (which in some cases could also be
enforced through JavaScript in through the UI). Of course, if we can
do it at the UI level great, but it may not work in all cases,
especially when receiving data from external system. This is why I
think that the data integrity checks come in place. For instance, as I
mentioned in the specs, I need to find all organizational units that
do not correspond to the naming conventions here in Zambia. I can do
this with this…
SELECT name from organisationunit where name !~
'^(ce|co|ea|ls|lu|no|nw|so|we) ’
Well, I found 47, which do not correspond to the naming convention. I
have made my dislike of the supposed best practice naming conventions
in earlier threads, but with the implmenetation of regex for checking
of these conventions, at least we could enforce them, even if it is ex
post facto.
Again, these are all examples, and they are really impossible to
predict what they may be, thus the need for flexible rules, built by
administrators/users, and then applied during data integrity checks
(and/or during data entry).
Regards,
Jason
On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov > > >> murodlatifov@yahoo.com wrote:
Hi Jason,
Looks like single time task if I understood you correctly? If you want
to clean data already on database. like data integrity checking. Why not to
make it clean at the very beginning, when particular record being captured?
For this one could use regexp in javascript on client side too. As for
leading and trailing spaces String.trim(" string ") should do before
passing to database.
regards,
murod
----- Original Message ----
From: Jason Pickering jason.p.pickering@gmail.com
To: Hieu Dang Duy hieu.hispvietnam@gmail.com
Cc: dhis2-devs dhis2-devs@lists.launchpad.net
Sent: Mon, February 8, 2010 1:05:27 PM
Subject: Re: [Dhis2-devs] Regular expressions in data validation rules
Hi Hieu,
Yes, I am actively fishing for a developer to implement this, as it
will really save me a huge amount of work in trying to clean up data.
I have no idea really how it would be implemented, other than that
java.util.regex should be able to be used, but let me give it a try at
a better specification. I do not think it should be so difficult
either.
I am thinking of something like this…
The user would create a regular expression for later assignment to a
database object. The user would select a database table (object) and
field for validation. For instance, lets say we want to validate that
there are no trailing spaces in an organization name.
So, we would create a rule called “Trailing spaces are not allowed”
We would create this rule, and assign a description and a regular
expression to it.
in this case, it would probably be something really simple like ‘\s+$’
Now, I have no idea how to do this in java, but I assume this would be
really simple, something like this query in Postgresql.
SELECT name from organisationunit where name ~*(‘\s+$’)
Wow, I found 571 orgunits in my organisationunittable with trailing
spaces. Cool.
So, i think we need two objects.
- A persistence object that stores the following files for the
RegexExpression
a) regexid
b) name
c) expression
d) description
e) resolution description (telling the user how to solve this problem)
- A table to assign regular expressions to database objects.
a) regexid
b) table
c) field
We could maybe reuse this rule on the davavalue table, to determine if
any values have been stored with trailing spaces.
Yeah, its very easy I think. I would do it myself if I knew a lick of
Java. 
Best regards,
Jason
On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy > > >> > hieu.hispvietnam@gmail.com wrote:
Hi all,
I’ve no idea about using RegEx for validating data in DHIS2. Just a
small
comment, I am also using this many times so my feeling on this is not
easy
but not too difficult when applying RegEx in your coding, ie,
javascript and
java also.
With RegEx, we can easy controlling any thing that we want to force the
user
for entering data (text, number) or something else (a file name is an
example).
Let’s try !
Thanks !
On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering > > >> >> jason.p.pickering@gmail.com wrote:
https://blueprints.launchpad.net/dhis2/+spec/regex-validation
I have updated the blueprint on regular expression use in data
validation rules. This would really make my life (and I suspect
others) lives a lot easier, as long as we are using naming
conventions, lets at least enforce them somehow.
For discussion.
Jason
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp
–
Hieu.HISPVietnam
Good Health !
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp