Can we apply a regex string in a Hibernate mapping?

This is the PostgreSQL dialect. The key question for me is how do we make it database independent? Do we return a recordset from the organisationunit table and apply a regex with java.util.regex, or can we just push it back to the DB? Postgres, MySQL and H2 all seem to support this (MySQL uses RLIKE or REGEXP and H2 uses LIKE, while Postgres uses ~), so we could create a query and then just execute it directly on the DB. Otherwise, I am not sure how to do it directly with Hibernate. Perhaps it is easy, but I reviewed a few pages from Google, and there did not seem to be a clear answer.
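
To make the dialect issue a bit more concrete, here is a rough, untested sketch of how the operator could be switched per database (the class name, dialect strings and the trailing-whitespace expression are just examples):

// Sketch only: choose the regex operator for the database we are running
// against and build the query from it. Dialect names are only examples.
public class RegexQueryBuilder
{
    public static String buildQuery( String dialect )
    {
        String operator;

        if ( "postgresql".equals( dialect ) )
        {
            operator = "~"; // Postgres: name ~ '\s+$'
        }
        else
        {
            // MySQL: name REGEXP '\s+$' (RLIKE is a synonym); H2 would need checking
            operator = "REGEXP";
        }

        return "SELECT name FROM organisationunit WHERE name " + operator + " '\\s+$'";
    }
}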

Excuse me, everyone!

I want to ask you all about the question in the subject line. At the moment I really don't know how to do this, but in my opinion, could we use a regex by combining it with the FILTER in the Hibernate mapping?
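
Something like the following is what I have in mind (just a rough, untested sketch; the filter name and condition are made up, and in our code it would go in the hbm.xml mapping file as <filter-def>/<filter> elements rather than annotations):

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Filter;
import org.hibernate.annotations.FilterDef;
import org.hibernate.annotations.ParamDef;

// Sketch only: a Hibernate filter whose condition is a regex match.
// The "~" operator ties the condition to Postgres syntax.
@Entity
@FilterDef( name = "nameRegex", parameters = @ParamDef( name = "regex", type = "string" ) )
@Filter( name = "nameRegex", condition = "name ~ :regex" )
public class OrganisationUnit
{
    @Id
    private int id;

    private String name;
}

It would be enabled per session with session.enableFilter( "nameRegex" ).setParameter( "regex", "\\s+$" ), but since the condition is a raw SQL fragment, the regex syntax would still be database specific.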

If we can, I would like to suggest that we should not hard-code all the regexes for each persistent object (DE, Orgunit, Cate, etc.). If you put all the regexes into the Hibernate mapping file you will accidentally make our system rigid and inflexible: if you want to modify a regex you MUST go to the Hibernate file, fix it and rebuild the code. On the other hand, this solution would make processing faster.

By the way, if we do it this way, I am not sure, but I guess your idea about applying a regex rule to each persistent object would no longer be available on its own. Thus, I think we should combine the two approaches.

We could separate the regex rules into two kinds: one is a default regex, and the other is a customization (used by the user). Why call the second one a customization? Because the user can choose whether or not to use that regex at any time.

In the end, however, the result set would then need to be returned to the user during a data integrity check, with the list of fields that violate the rule, similar to what happens now.

Yes, sure! We have to show the user what is happening and which mistakes they have made when they are importing data into DHIS2.

I think this is really incredibly important, and the example I have given thus far is really just a trivial example. I am sure there are many more situations where this would come in handy.

For that reason, please give some more thought to my first suggestion above.

Thank you!



Hieu.HISPVietnam
Good Health !



I want to ask you all about the question in the subject line. At the moment I really don't know
how to do this, but in my opinion, could we use a regex by combining it with
the FILTER in the Hibernate mapping?

OK, here is the thing. I have no clue how Hibernate works, but if
FILTER is equivalent to a WHERE in SQL, then it may or may not work. I
would assume that it would not work, as regular expressions are not
necessarily standard SQL; they are implemented slightly differently in
each DB system. Note that the query

SELECT * FROM organisationunit WHERE name = '\s+$'

is not the same as

SELECT * FROM organisationunit WHERE name ~ '\s+$'

In Postgres, the ~ symbol means to evaluate the expression as a
regular expression, and not as a literal value: the first query only
matches a name that is literally the string '\s+$', while the second
matches any name ending in whitespace.

If we can, I would like to suggest that we should not hard-code all the regexes for each
persistent object (DE, Orgunit, Cate, etc.). If you put all the regexes into the
Hibernate mapping file you will accidentally make our system rigid and inflexible:
if you want to modify a regex you MUST go to the Hibernate file, fix it and
rebuild the code. On the other hand, this solution would make processing faster.

By the way, if we do it this way, I am not sure, but I guess your idea about applying
a regex rule to each persistent object would no longer be available on its own.
Thus, I think we should combine the two approaches.

Well, this was the whole reason why I suggested, if possible, to
persist the expressions as objects, and then use a regular expression
evaluator (either in the DB or with java.util.regex) to determine what
the matches are. As you may have seen from Bob's mail, it should be
possible to reuse regular expressions during data import, to ensure
that data matches regular expression restrictions for a given database
field. It is really just a mechanism to attach more data dynamically
to the schema, rather than hard-coding it.
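
As a rough sketch of what I mean (the class and field names here are invented, nothing like this exists in DHIS2 yet): the expression is just a string on a persisted object, and java.util.regex does the matching wherever it is needed.

import java.util.regex.Pattern;

// Sketch only: a persisted rule object holding a regex for a given field.
public class FieldRegexRule
{
    private String fieldName;  // e.g. "organisationunit.name"

    private String expression; // e.g. "\\s+$" to catch trailing whitespace

    public FieldRegexRule( String fieldName, String expression )
    {
        this.fieldName = fieldName;
        this.expression = expression;
    }

    public String getFieldName()
    {
        return fieldName;
    }

    // Whether a match means "valid" or "violation" depends on how the rule
    // is written; for \s+$ a match means the value has trailing whitespace.
    public boolean matches( String value )
    {
        return value != null && Pattern.compile( expression ).matcher( value ).find();
    }
}

The same object could then be used at import time, in the data integrity check, or anywhere else the rule is needed.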

We could separate the regex rules into two kinds: one is a default regex, and
the other is a customization (used by the user). Why call the second one a
customization? Because the user can choose whether or not to use that regex
at any time.

This is why I suggested that for the time being, we put it in the data
integrity checks. Administrators could then run the field-level
validation checks on a regular basis and make corrections from there.

Yes, sure! We have to show the user what is happening and which mistakes they
have made when they are importing data into DHIS2.

Agreed, but I think this is almost a second step. Ideally, we would
want to stop "bad" data from entering the system in the first
place, but who knows what people will do. Some people (not naming any
names) have inserted several hundreds of thousands of rows of data
with ETL processes. Now, ideally, as Murod pointed out, the ETL job
would have regular expression rules built into it. But where would
these rules come from? Well, if they were in a table, it would be
pretty easy to dynamically create a set of rules for any import
process, be it an external ETL package or the DXF import.
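
Just to illustrate the "rules in a table" idea (the table and column names are invented, nothing like this exists yet), any import process, internal or external, could load the rules with a plain query:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch only: load regex rules from a hypothetical "fieldregexrule" table
// so that DXF import, external ETL jobs etc. can all apply the same rules.
public class RegexRuleLoader
{
    public static Map<String, Pattern> loadRules( Connection conn ) throws SQLException
    {
        Map<String, Pattern> rules = new HashMap<String, Pattern>();

        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery( "SELECT fieldname, expression FROM fieldregexrule" );

        while ( rs.next() )
        {
            rules.put( rs.getString( "fieldname" ), Pattern.compile( rs.getString( "expression" ) ) );
        }

        rs.close();
        stmt.close();

        return rules;
    }
}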

Again, I think the key question, and this is only something the devs
can answer, is:
1) Should a recordset be returned to java.util.regex for processing
(if that is even feasible), or
2) Should native DB queries be used? If we are supporting MySQL,
PostgreSQL and H2 at the moment, then implementing regular expression
queries natively should be a piece of cake. However, this restricts
the DBs that this particular function could be applied to (a rough sketch below).
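
For option 2, here is a rough, untested sketch of what pushing the check to the database through Hibernate might look like (the class and method names are made up, and the operator handling is simplified):

import java.util.List;

import org.hibernate.Session;

// Sketch only: run the regex match as a native query; only the operator
// differs between Postgres and MySQL here, and H2 would need to be checked.
public class RegexIntegrityCheck
{
    @SuppressWarnings( "unchecked" )
    public static List<String> findMatchingNames( Session session, boolean postgres, String regex )
    {
        String operator = postgres ? "~" : "REGEXP";

        String sql = "SELECT name FROM organisationunit WHERE name " + operator + " :regex";

        return session.createSQLQuery( sql ).setString( "regex", regex ).list();
    }
}

The list of matches could then feed straight into the data integrity report mentioned above.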

I think, as well, to highlight Bob's point in a previous mail: if
the regexes are persisted, then they could be reused at any point (UI-level
validation, import validation, data integrity checks, or even
with external ETL tools). Currently, the data validation rules are
hard-coded (unlike 1.4, where they consist of SQL queries). Ideally, we
should make this dynamic too, to allow admins to apply their own data
integrity checks, but this may deserve a separate thread.

Regards,
Jason