Creation of CategoryOptionCombinations

Dear devs,

I am experiencing problems when handling category combinations. Our prototype
with 5 dimensions went through the process of generating
categoryOptionCombinations (~20,000 records) quite well. 7 dimensions (~400,000)
worked as well, although it took a very long time.

Now we have defined the next data model with 10 dimensions (expecting ~5 million
categoryOptionCombinations) and the process dies without further notice. Last
words in catalina.out:
* INFO 2016-06-07 13:29:33,783 Building object-bridge maps (preheatCache: true, 3 classes). (DefaultObjectBridge.java [http-bio-8180-exec-15])
* INFO 2016-06-07 13:29:36,779 Building object-bridge maps took 2.99 seconds. (DefaultObjectBridge.java [http-bio-8180-exec-15])
* INFO 2016-06-07 13:29:36,896 'admin' update org.hisp.dhis.dataelement.DataElementCategoryCombo, name: Membership, uid: SCgLXYHqVzz (AuditLogUtil.java [http-bio-8180-exec-15])

Ten dimensions with moderately sized option sets are not unusual, and are
rather slim for multi-dimensional data models in data warehouses, so I'd
expect DHIS2 to be able to handle this easily.
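
To make the growth concrete: the number of categoryOptionCombinations is the product of the option counts of all categories in the combination. The sketch below uses hypothetical option counts (the thread does not list the real ones), chosen only so that the totals land near the figures quoted above.

```python
from math import prod

# Hypothetical option counts per category; the real ones are not given in the thread.
option_counts = [8, 5, 10, 5, 10, 4, 5, 2, 3, 2]


def combo_count(counts):
    """Number of option combinations = product of the per-category option counts."""
    return prod(counts)


for dims in (5, 7, 10):
    print(dims, "categories ->", combo_count(option_counts[:dims]), "combinations")
```

Each additional category multiplies the total, so even small option sets add up quickly.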

This could of course be a memory problem (I tried up to 14g for Tomcat on a
4-core Ubuntu 14.04 server, DHIS 2.23). Before I start experimenting with other
parameters, I am hoping to get some hints on known limitations or workarounds
from you (not allowed: reducing the number of options or categories, SQL hacks
:) ). Is there any info on whether optimizations of this process are being
planned in the kernel?

Some observations on the process:

* during generation (either when saving the categoryCombination or in the data
maintenance menu):
- long names - cOCs are generated with names that become extremely long, as
they are mere concatenations of the involved categoryOptions. Could there be
an option to use just the codes as a basis, or to leave out the names
completely? This could be one reason for the memory problem and the
performance issues.
- long log entries - every single entry is logged in catalina.out with several
lines of text, causing catalina.out to become extremely big.
- during execution a lot of Java memory is used and no DB memory, which looks
to me as if all the logic is happening in the Java machine. It might be more
useful to move more logic into SQL on the DB (e.g. use DB cross joins for
combining options), as the DB will be more efficient.
- because of the log entries I assume that every single combination is
persisted into the DB with a single SQL statement, causing millions of single
SQL requests. Prefer batch SQL over single-record processing.

* during import/export of categoryOptionCombinations:
- prefer batch SQL instead of single record processing
- huge log entries in catalina.out due to several lines of text per combination
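
The cross-join and batching ideas above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not how DHIS2 does it; the table names (`cat_option`, `combo`), the function `generate_combos` and the sample options are all made up.

```python
import sqlite3


def generate_combos(categories):
    """Build every option combination with one set-based INSERT ... SELECT.

    `categories` maps a category name to its list of option names.
    Table and column names are invented for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cat_option (category TEXT, name TEXT)")
    # Batch insert: one executemany call instead of one statement per option.
    conn.executemany(
        "INSERT INTO cat_option VALUES (?, ?)",
        [(cat, opt) for cat, opts in categories.items() for opt in opts],
    )
    aliases = [f"o{i}" for i in range(len(categories))]
    columns = ", ".join(f"c{i} TEXT" for i in range(len(categories)))
    conn.execute(f"CREATE TABLE combo ({columns})")
    # One alias of cat_option per category, cross-joined inside the database.
    select = ", ".join(f"{a}.name" for a in aliases)
    joins = " CROSS JOIN ".join(f"cat_option AS {a}" for a in aliases)
    where = " AND ".join(f"{a}.category = ?" for a in aliases)
    conn.execute(
        f"INSERT INTO combo SELECT {select} FROM {joins} WHERE {where}",
        list(categories),
    )
    return conn


# Hypothetical categories and options.
demo = generate_combos({
    "AgeGroup": ["<15", "15+"],
    "Gender": ["F", "M"],
    "InsuranceScheme": ["A", "B", "C"],
})
print(demo.execute("SELECT COUNT(*) FROM combo").fetchone()[0])
```

The combinations never pass through application memory here; the database produces and stores them in a single statement.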

I'd be very happy about comments.

Thanks in advance,

Uwe

Hi Uwe

Did you try out the new importer? It is available as /api/23/metadata in 2.23.

Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Morten Olav Hansen

Senior Engineer, DHIS 2

University of Oslo

http://www.dhis2.org

Hi Morten,

No, I didn't. What would be the procedure for that? Importing Categories,
Options and CategoryCombinations via the API and having DHIS2 generate the
CategoryOptionCombinations? Would that make any difference at all, or does the
importer use different libraries for generating the COCs?

By the way, is the 23 in the API link valid for future DHIS2 versions? I noticed
it in a few API descriptions recently ...

Regards, Uwe

Hi Uwe

The improvements are mainly for speed and validation. Yes, we are now (in 2.24) introducing a versioned web API, so that importer endpoint will be available until 2.26 (we will support 3 versions). In 2.24, the same endpoint is available at /api/24/metadata.

If you are using cURL or another utility, the import part would be the same, but the UI in 2.23 cannot be used, as it's hardcoded to the legacy importer.
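
As a rough sketch of what calling the versioned endpoint from a script might look like: the snippet below only prepares the request and does not send it. The server URL, the credentials, the use of basic authentication and the shape of the JSON payload are assumptions for illustration; only the path /api/23/metadata comes from this thread.

```python
import base64
import json
import urllib.request


def build_metadata_import(base_url, version, payload, user, password):
    """Prepare (but do not send) a POST to the versioned metadata endpoint."""
    url = f"{base_url.rstrip('/')}/api/{version}/metadata"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # assumed: basic auth
        },
        method="POST",
    )


# Hypothetical server, credentials and payload.
payload = {"categoryOptions": [{"name": "Female", "shortName": "Female"}]}
req = build_metadata_import("https://dhis.example.org", 23, payload, "admin", "secret")
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; switching to 2.24 would only change the `version` argument.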

Hi Uwe,

I agree ten dimensions is not much per se, but you might say 10 categories for disaggregation per data element is a lot. Would it be possible to redesign the model a bit, and rely more on data element group sets + groups where you classify your data elements? This as opposed to having everything as categories / options.

5 million option combos I think will in any case take some time to generate and maintain. If you are willing to share some more info on your use-case perhaps someone can offer some views.

regards,

Lars

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Lars Helge Øverland

Lead developer, DHIS 2

University of Oslo

Skype: larshelgeoverland

lars@dhis2.org

http://www.dhis2.org

Hi Lars,

I'm not using them for data-element disaggregation. I also understand that this
would be a bit awkward, as it would technically result in millions of
dataElements, if I understood the concept correctly.

I am using them as categoryCombination for a dataSet. In my current
understanding this is the one structure in DHIS2 that comes closest to classical
cubes with facts (aka dataElements) and dimensions (aka categories). The use case
would be aggregated patient data with dataElements like {Headcount, Weight,
Size, FeesPaid} in the dataset with categories like {AgeGroup, Gender, JobGroup,
InsuranceScheme}. The data are not entered via data entry form (which would also
be awkward), but imported via api from a group-by sql on a source system with
relational DB. This enables reports/pivots/graphs like "Number of patients by
AgeGroup and Gender" or "Fees Collected by JobGroup and InsuranceScheme" or any
other combination of the categories. Works nicely for five dimensions so far.
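
A minimal sketch of the kind of group-by query meant here, using Python with an in-memory SQLite database as a stand-in for the relational source system. The table `patient`, its columns and the sample rows are invented for illustration; the step that maps the result rows onto an API import is left out.

```python
import sqlite3

# Stand-in for the source system; table, columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (age_group TEXT, gender TEXT, fees_paid REAL)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [("0-14", "F", 10.0), ("0-14", "F", 5.0), ("0-14", "M", 7.5), ("15+", "M", 20.0)],
)


def aggregate(conn):
    """One row per (AgeGroup, Gender) combination with Headcount and FeesPaid."""
    return conn.execute(
        "SELECT age_group, gender, COUNT(*) AS headcount, SUM(fees_paid) AS fees_paid "
        "FROM patient GROUP BY age_group, gender ORDER BY age_group, gender"
    ).fetchall()


for row in aggregate(conn):
    print(row)
```

Each result row corresponds to one combination of category options, with one value per dataElement.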

Of course I could think about creating dataElements per AgeGroup or even
AgeGroup/Gender combination (which makes a lot of sense in the context of manual
data entry from manual summary reports), but for the sake of flexibility I
prefer datamodels with few dataElements and lots of dimensions.

Hope that clarifies the scenario a bit ...

Thanks,

Uwe

Hi Morten,

Sorry, I made a small typo: we are on 2.24. Is the GUI in 2.24 hooked up to
the new importer?

Just to make sure we are talking about the same thing: the problem does not
appear during import, but when generating all possible combinations (when
saving the CategoryCombination or when manually invoking the update of
categoryOptionCombinations).

> If you are using cURL, or another utility.. the import part would be the
> same, but the UI in 2.23 can not be used, as it's hardcoded to legacy
> importer.

So I can still use /api/metadata without a version to call the current
API version?

Thanks for your replies at this time of the day :)

Regards, Uwe

> Just to make sure we are talking about the same thing: the problem does not
> appear during import, but when generating all possible combinations (when
> saving the CategoryCombination or when manually invoking the update of
> categoryOptionCombinations)

Ah, sorry.. I was thinking it was the import that was slow.. so that part
is ok?

> so I can still use /api/metadata without version to call the current
> api-version?

That will give you the legacy importer, so going forward you would need to
use /api/{version}/{endpoint}. We will have more info about it in the release
notes.

And no, the UI is not switched to the new importer yet (in 2.24), and it is
not 100% certain that it will be.
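
For anyone scripting against the API, the two URL shapes mentioned above could be built with a small helper like the one below. This is only a sketch: the function name `dhis2_endpoint` and the host `dhis.example.org` are made up for illustration, and only the path patterns `/api/{endpoint}` and `/api/{version}/{endpoint}` come from this thread.

```python
from typing import Optional


def dhis2_endpoint(base_url: str, endpoint: str, version: Optional[int] = None) -> str:
    """Build an API URL (hypothetical helper, not part of DHIS2).

    Without a version the result is /api/{endpoint}; with a version
    it is /api/{version}/{endpoint}.
    """
    base = base_url.rstrip("/")
    path = endpoint.strip("/")
    if version is None:
        return f"{base}/api/{path}"
    return f"{base}/api/{version}/{path}"


# dhis.example.org is a placeholder host.
print(dhis2_endpoint("https://dhis.example.org", "metadata"))
print(dhis2_endpoint("https://dhis.example.org", "metadata", 24))
```

Keeping the version in a single argument (or config value) would at least mean that only one place has to change in external scripts after an upgrade.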

> Thanks for your replies at this time of the day :slight_smile:
>
> Regards, Uwe

---

> Morten Olav Hansen <morten@dhis2.org> wrote on 7 June 2016 at 19:28:
>
>
> Hi Uwe
>
> The improvements are mainly for speed and validation. Yes, we are now (in
> 2.24) introducing a versioned web API, so that endpoint importer will be
> available until 2.26 (we will support 3 versions). In 2.24, the same
> endpoint is available at /api/24/metadata.
>
> If you are using cURL or another utility, the import part would be the
> same, but the UI in 2.23 cannot be used, as it's hardcoded to the legacy
> importer.
>
> --
> Morten Olav Hansen
> Senior Engineer, DHIS 2
> University of Oslo
> http://www.dhis2.org
>
> On Tue, Jun 7, 2016 at 11:25 PM, Uwe Wahser <uwe@wahser.de> wrote:
>
> > Hi Morten,
> >
> > no, i didn't. What would be the procedure for that? Importing Categories,
> > Options and CategoryCombinations via api and having DHIS2 generate the
> > CategoryOptionCombinations? Would that bring about any change at all or
> > does the importer use different libs for generating the COCs?
> >
> > btw. is the 23 in the api link valid for future dhis2 versions? I noticed
> > it in a few api descriptions recently ...
> >
> > Regards, Uwe
> >
> > > Morten Olav Hansen <morten@dhis2.org> wrote on 7 June 2016 at 18:50:
> > >
> > > Hi Uwe
> > >
> > > Did you try out the new importer? It is available as /api/23/metadata in 2.23.
> > >
> > > On Tuesday, 7 June 2016, Uwe Wahser <uwe@wahser.de> wrote:
> > >
> > > > Dear devs,
> > > >
> > > > I am experiencing problems when handling category combinations. Our
> > > > protoype
> > > > with 5 dimensions went through the process of generating
> > > > categoryOptionCombinations (~20.000 records) quite well. 7
dimensions
> > > > (~400.000)
> > > > worked as well, although it took a very long time.
> > > >
> > > > Now we defined the next datamodel with 10 dimensions (expecting
~5Mio
> > > > categoryOptionCombinations) and the process dies without further
> > notice.
> > > > Last
> > > > words in catalina.out:
> > > > * INFO 2016-06-07 13:29:33,783 Building object-bridge maps
> > (preheatCache:
> > > > true,
> > > > 3 classes). (DefaultObjectBridge.java [http-bio-8180-exec-15])
> > > > * INFO 2016-06-07 13:29:36,779 Building object-bridge maps took
2.99
> > > > seconds.
> > > > (DefaultObjectBridge.java [http-bio-8180-exec-15])
> > > > * INFO 2016-06-07 13:29:36,896 'admin' update
> > > > org.hisp.dhis.dataelement.DataElementCategoryCombo, name:
Membership,
> > uid:
> > > > SCgLXYHqVzz (AuditLogUtil.java [http-bio-8180-exec-15])
> > > >
> > > > Ten dimensions with not extraordinarily big option sets is
actually not
> > > > unusual
> > > > and rather slim for multi-dimensional data-models in data
warehouses,
> > so
> > > > I'd
> > > > expect DHIS2 to be able to handle this easily.
> > > >
> > > > Could of course be a memory problem (tried up to 14g for tomcat on
a
> > 4-core
> > > > Ubuntu 14.04 server, DHIS 2.23) Before I'll start experimenting
with
> > other
> > > > parameters, I am hoping to get some hints on known limitations or
> > > > workarounds
> > > > from you (not allowed: reducing the number of options or
categories,
> > > > sql-hacks
> > > > :slight_smile: ). Is there any info on whether optimizations on this process
are
> > being
> > > > planned in the kernel?
> > > >
> > > > Some observations on the process:
> > > >
> > > > * during generation (either when saving the categoryCombination or
in
> > the
> > > > data
> > > > maintenance menu):
> > > > - long names - cOCs are generated with generated names that are
getting
> > > > extremely long as they are mere concats of the involved
> > categoryOptions.
> > > > Could
> > > > there be an option to just use the codes as basis or to leave away
the
> > > > names
> > > > completely? Could be one reason for a memory problem and
performance
> > > > issues.
> > > > - long log entries - every single entry is logged in catalina.out
with
> > > > several
> > > > lines of text, causing catalina to become extremely big.
> > > > - during execution lots of Java-memory are being used and no
DB-memory,
> > > > which
> > > > looks to me as if all the logic is happening in the java machine.
It
> > might
> > > > be
> > > > more usefull to transfer more logic into SQLs to the DB (e.g. use
DB
> > > > cross-joins
> > > > for combining options) as the DB will be more efficient.
> > > > - because of the log entries I assume that every single
combination is
> > > > being
> > > > persisted into the DB with a single SQL statement, causing
millions of
> > > > single
> > > > SQL requests. Prefer batch SQL instead of single record processing.
> > > >
> > > > * during import/export of categoryOptionCombinations:
> > > > - prefer batch SQL instead of single record processing
> > > > - huge log entries in catalina.out due to several lines of text per
> > > > combination
> > > >
> > > > I'd be very happy about comments.
> > > >
> > > > Thanks in advance,
> > > >
> > > > Uwe
> > > >
> > > > _______________________________________________
> > > > Mailing list: DHIS 2 Users in Launchpad
> > > > Post to : dhis2-users@lists.launchpad.net <javascript:;>
> > > > Unsubscribe : DHIS 2 Users in Launchpad
> > > > More help : ListHelp - Launchpad Help
> > > >
> > >
> > >
> > > --
> > > --
> > > Morten Olav Hansen
> > > Senior Engineer, DHIS 2
> > > University of Oslo
> > > http://www.dhis2.org
> >

Hi Uwe,

Are you importing this as aggregate data or as events?

Regards,

Jason

Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+46764147049

> Ah, sorry.. I was thinking it was the import that was slow.. so that part
> is ok?

The legacy importer is slow and creates huge logs, but I will not look into
that further, given that a new version is coming up. Thank you.

> That will give you the legacy importer, so going forward you would need to
> use /api/{version}/{endpoint}, we will have more
> info about it in the release notes.

Hmm, so I'll have to change the version numbers in the external API calls
after every upgrade? Or will there eventually be a format that always calls the
endpoint version that corresponds to the currently installed kernel version?

Regards, Uwe


Hi Jason,

importing aggregate data into data-sets (see my reply to Lars yesterday evening:
https://lists.launchpad.net/dhis2-users/msg10452.html)

Again: the problem is not the import, but the combination of category options.
Maybe it would already help a lot if those bombastic strings for the names
weren't created for categoryOptionCombinations.

Thanks for good ideas,

Uwe
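
To illustrate why both the number of combinations and the length of their names grow so quickly, here is a small sketch of a Cartesian product over category options. The categories, options and the comma separator are invented for the example. The thread only states that the generated names are concatenations of the involved categoryOptions, and this is not DHIS2's actual generation code.

```python
from itertools import islice, product
from math import prod

# Hypothetical categories and options, for illustration only.
categories = {
    "Sex": ["Female", "Male"],
    "Age group": ["0-4", "5-14", "15-49", "50+"],
    "Residence": ["Urban", "Rural"],
}


def combination_count(cats):
    """Number of option combinations: the product of the option counts."""
    return prod(len(options) for options in cats.values())


def combination_names(cats, separator=", "):
    """Lazily yield one concatenated name per combination."""
    for combo in product(*cats.values()):
        yield separator.join(combo)


print(combination_count(categories))  # 2 * 4 * 2 = 16
for name in islice(combination_names(categories), 3):
    print(name)
```

Every additional category multiplies the count by its number of options and adds one more option name to every generated string, so both the row count and the name length keep growing with each dimension.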


It just seems that if you have five million cat combos, you would need orders of magnitude more data to support them. If the data were imported as events instead of aggregate, you would not need to explicitly create all of those dimensions, but could still create aggregate figures from them.

It just feels like there is no way all of those cat combos are going to be filled, unless you really have a TON of data.

Regards,

Jason
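
The sparsity argument can be put into a back-of-the-envelope calculation. The sketch below assumes that each stored data value fills at most one combination (for a given data element, period and org unit); the function name and the sample figure of 500,000 values are made up for illustration.

```python
def max_fill_ratio(data_values: int, combinations: int) -> float:
    """Upper bound on the share of combinations that can hold any data.

    Assumes every data value lands in at most one combination.
    """
    if combinations <= 0:
        raise ValueError("combinations must be positive")
    return min(1.0, data_values / combinations)


# Hypothetical: 500,000 values spread over 5,000,000 combinations.
print(max_fill_ratio(500_000, 5_000_000))  # at most 10% can be non-empty
```

Under that assumption, even in the best case where every value hits a different combination, most of the five million combinations would stay empty.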

  • prefer batch SQL instead of single record processing
  • huge log entries in catalina.out due to several lines of text

per

combination

I’d be very happy about comments.

Thanks in advance,

Uwe


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net <javascript:;>

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Morten Olav Hansen

Senior Engineer, DHIS 2

University of Oslo

http://www.dhis2.org


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Jason P. Pickering

email: jason.p.pickering@gmail.com

tel:+46764147049

Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+46764147049

Hi Jason,

just to clarify: it's 1 CategoryCombo with ten Categories resulting in 50 million
CategoryOptionCombos (I mistyped this before). Theoretically this must be
multiplied by the number of data elements in the data set, the number of org units
and the number of periods (daily over 50 years) to get the number of expected
dataValues.
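
The multiplication above can be sketched as follows. This is only an illustration of the theoretical upper bound: the figure of 50 million combos comes from this mail, while the start date and the counts of data elements and org units are hypothetical placeholders.

```python
from datetime import date


def max_data_values(option_combos, data_elements, org_units, periods):
    """Theoretical upper bound: every combination filled in every period."""
    return option_combos * data_elements * org_units * periods


# Daily periods over 50 years (the start date is an arbitrary assumption).
daily_periods = (date(2016, 1, 1) - date(1966, 1, 1)).days

# 20 data elements and 100 org units are hypothetical placeholders.
upper_bound = max_data_values(50_000_000, 20, 100, daily_periods)
print(f"{daily_periods} daily periods, upper bound {upper_bound:.2e} dataValues")
```

The point of the sketch is only the order of magnitude; the real number of dataValues depends on how many combinations actually occur.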

In reality this number of dataValues will not be reached, as there are functional
dependencies between options that leave lots of combinations empty. I cannot
actually predict how many combinations (i.e. records) will come out of the
GROUP BY SQL on the source system. In our current prototype with 5 Categories in
the CatCombo we are getting 4 million values in total, of which only 10,000 have
to be updated every day, which is a very reasonable number. I am hoping for
similar numbers with the extended 10-dimension version because of those
functional dependencies.

The idea of using the tracker is interesting, although I'd have to get used to
uploading aggregated data at a granular level and rethink the whole model. I
think I'd rather try to reduce the number of categories first (I am currently
down to 10 million COCs and it seems to work).

How do you rate the chances of removing some of the heavy parts from DHIS2 core
when generating categoryOptionCombinations? I am thinking especially of the
extraordinarily long names and the huge log entries for every new
categoryOptionCombination (currently over 3000 characters of log output for
each). This would already take a lot of data volume out of the generation
process.
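
As a rough back-of-the-envelope sketch of that log volume: the 3000 characters per entry and the 10 and 50 million COCs are the figures from this mail; one byte per character is an assumption.

```python
def log_volume_gb(combos, chars_per_entry=3000, bytes_per_char=1):
    """Estimated log output in gigabytes (10^9 bytes) for generating `combos` COCs."""
    return combos * chars_per_entry * bytes_per_char / 1e9


for combos in (10_000_000, 50_000_000):
    print(f"{combos:>11,} COCs -> about {log_volume_gb(combos):,.0f} GB of log output")
```

Even at the reduced 10 million COCs the estimate is in the tens of gigabytes, which fits the observation that catalina.out grows extremely large.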

Regards, Uwe

···

On 8 June 2016 at 15:44, Jason Pickering <jason.p.pickering@gmail.com> wrote:

It just seems like if you have five million cat combos, you would need many
more orders of magnitude of data to support them. If the data was imported
as events, instead of aggregate, you would not need to explicitly create
all of those dimensions, but could still create aggregate figures from
them.

It just feels like there is no way all of those cat combos are going to be
filled, unless you really have a TON of data.

Regards,
Jason

On Wed, Jun 8, 2016 at 2:36 PM, Uwe Wahser <uwe@wahser.de> wrote:

> Hi Jason,
>
> importing aggregate data into data-sets (see my reply to Lars yesterday
> evening:
> https://lists.launchpad.net/dhis2-users/msg10452.html)
>
> Again: the problem is not the import, but the combination of category
> options.
> Maybe it would already help a lot if those bombastic strings for the names
> weren't created for categoryOptionCombinations.
>
> Thanks for good ideas,
>
> Uwe
>
> ---
> > On 8 June 2016 at 09:09, Jason Pickering <jason.p.pickering@gmail.com> wrote:
> >
> >
> > Hi Uwe,
> >
> > Are you importing this as aggregate data or as events?
> >
> > Regards,
> > Jason
> >
> >
> > On Wed, Jun 8, 2016 at 2:27 AM, Morten Olav Hansen <morten@dhis2.org> wrote:
> >
> > >> Just to make sure we are talking about the same thing: the problem does
> > >> not appear during import, but when generating all possible combinations
> > >> (when saving the CategoryCombination or when manually invoking the
> > >> update of categoryOptionCombinations)
> > >>
> > >
> > > Ah, sorry, I was thinking it was the import that was slow. So that part
> > > is ok?
> > >
> > >
> > >> so I can still use /api/metadata without version to call the current
> > >> api-version?
> > >>
> > >
> > > That will give you the legacy importer, so going forward you would need
> > > to use /api/{version}/{endpoint}. We will have more info about it in the
> > > release notes.
> > >
> > > And no, the UI is not switched to the new importer yet (in 2.24), and I
> > > am not 100% sure it will be...
> > >
> > >>
> > >> Thanks for your replies at this time of the day :slight_smile:
> > >>
> > >> Regards, Uwe
> > >>
> > >> ---
> > >>
> > >>
> > >> > On 7 June 2016 at 19:28, Morten Olav Hansen <morten@dhis2.org> wrote:
> > >> >
> > >> >
> > >> > Hi Uwe
> > >> >
> > >> > The improvements are mainly for speed and validation. Yes, we are now
> > >> > (in 2.24) introducing a versioned web-api, so that endpoint importer
> > >> > will be available until 2.26 (we will support 3 versions). In 2.24,
> > >> > the same endpoint is available at /api/24/metadata.
> > >> >
> > >> > If you are using cURL or another utility, the import part would be
> > >> > the same, but the UI in 2.23 cannot be used, as it's hardcoded to the
> > >> > legacy importer.
> > >> >
> > >> > --
> > >> > Morten Olav Hansen
> > >> > Senior Engineer, DHIS 2
> > >> > University of Oslo
> > >> > http://www.dhis2.org
> > >> >
> > >> > On Tue, Jun 7, 2016 at 11:25 PM, Uwe Wahser <uwe@wahser.de> wrote:
> > >> >
> > >> > > Hi Morten,
> > >> > >
> > >> > > no, I didn't. What would be the procedure for that? Importing
> > >> > > Categories, Options and CategoryCombinations via the api and having
> > >> > > DHIS2 generate the CategoryOptionCombinations? Would that bring
> > >> > > about any change at all, or does the importer use different libs
> > >> > > for generating the COCs?
> > >> > >
> > >> > > btw. is the 23 in the api link valid for future dhis2 versions? I
> > >> > > noticed it in a few api descriptions recently ...
> > >> > >
> > >> > > Regards, Uwe
> > >> > >
> > >> > > > On 7 June 2016 at 18:50, Morten Olav Hansen <morten@dhis2.org> wrote:
> > >> > > >
> > >> > > >
> > >> > > > Hi Uwe
> > >> > > >
> > >> > > > Did you try out the new importer? Available as /api/23/metadata in 2.23
> > >> > > >
> > >> > > > On Tuesday, 7 June 2016, Uwe Wahser <uwe@wahser.de> wrote:
> > >> > > >
> > >> > > > > Dear devs,
> > >> > > > >
> > >> > > > > I am experiencing problems when handling category
> combinations.
> > >> Our
> > >> > > > > protoype
> > >> > > > > with 5 dimensions went through the process of generating
> > >> > > > > categoryOptionCombinations (~20.000 records) quite well. 7
> > >> dimensions
> > >> > > > > (~400.000)
> > >> > > > > worked as well, although it took a very long time.
> > >> > > > >
> > >> > > > > Now we defined the next datamodel with 10 dimensions
> (expecting
> > >> ~5Mio
> > >> > > > > categoryOptionCombinations) and the process dies without
> further
> > >> > > notice.
> > >> > > > > Last
> > >> > > > > words in catalina.out:
> > >> > > > > * INFO 2016-06-07 13:29:33,783 Building object-bridge maps
> > >> > > (preheatCache:
> > >> > > > > true,
> > >> > > > > 3 classes). (DefaultObjectBridge.java [http-bio-8180-exec-15])
> > >> > > > > * INFO 2016-06-07 13:29:36,779 Building object-bridge maps
> took
> > >> 2.99
> > >> > > > > seconds.
> > >> > > > > (DefaultObjectBridge.java [http-bio-8180-exec-15])
> > >> > > > > * INFO 2016-06-07 13:29:36,896 'admin' update
> > >> > > > > org.hisp.dhis.dataelement.DataElementCategoryCombo, name:
> > >> Membership,
> > >> > > uid:
> > >> > > > > SCgLXYHqVzz (AuditLogUtil.java [http-bio-8180-exec-15])
> > >> > > > >
> > >> > > > > Ten dimensions with not extraordinarily big option sets is
> > >> actually not
> > >> > > > > unusual
> > >> > > > > and rather slim for multi-dimensional data-models in data
> > >> warehouses,
> > >> > > so
> > >> > > > > I'd
> > >> > > > > expect DHIS2 to be able to handle this easily.
> > >> > > > >
> > >> > > > > Could of course be a memory problem (tried up to 14g for
> tomcat
> > >> on a
> > >> > > 4-core
> > >> > > > > Ubuntu 14.04 server, DHIS 2.23) Before I'll start
> experimenting
> > >> with
> > >> > > other
> > >> > > > > parameters, I am hoping to get some hints on known
> limitations or
> > >> > > > > workarounds
> > >> > > > > from you (not allowed: reducing the number of options or
> > >> categories,
> > >> > > > > sql-hacks
> > >> > > > > :slight_smile: ). Is there any info on whether optimizations on this
> process
> > >> are
> > >> > > being
> > >> > > > > planned in the kernel?
> > >> > > > >
> > >> > > > > Some observations on the process:
> > >> > > > >
> > >> > > > > * during generation (either when saving the
> categoryCombination
> > >> or in
> > >> > > the
> > >> > > > > data
> > >> > > > > maintenance menu):
> > >> > > > > - long names - cOCs are generated with generated names that
> are
> > >> getting
> > >> > > > > extremely long as they are mere concats of the involved
> > >> > > categoryOptions.
> > >> > > > > Could
> > >> > > > > there be an option to just use the codes as basis or to leave
> > >> away the
> > >> > > > > names
> > >> > > > > completely? Could be one reason for a memory problem and
> > >> performance
> > >> > > > > issues.
> > >> > > > > - long log entries - every single entry is logged in
> catalina.out
> > >> with
> > >> > > > > several
> > >> > > > > lines of text, causing catalina to become extremely big.
> > >> > > > > - during execution lots of Java-memory are being used and no
> > >> DB-memory,
> > >> > > > > which
> > >> > > > > looks to me as if all the logic is happening in the java
> machine.
> > >> It
> > >> > > might
> > >> > > > > be
> > >> > > > > more usefull to transfer more logic into SQLs to the DB (e.g.
> use
> > >> DB
> > >> > > > > cross-joins
> > >> > > > > for combining options) as the DB will be more efficient.
> > >> > > > > - because of the log entries I assume that every single
> > >> combination is
> > >> > > > > being
> > >> > > > > persisted into the DB with a single SQL statement, causing
> > >> millions of
> > >> > > > > single
> > >> > > > > SQL requests. Prefer batch SQL instead of single record
> > >> processing.
> > >> > > > >
> > >> > > > > * during import/export of categoryOptionCombinations:
> > >> > > > > - prefer batch SQL instead of single record processing
> > >> > > > > - huge log entries in catalina.out due to several lines of
> text
> > >> per
> > >> > > > > combination
> > >> > > > >
> > >> > > > > I'd be very happy about comments.
> > >> > > > >
> > >> > > > > Thanks in advance,
> > >> > > > >
> > >> > > > > Uwe
> > >> > > > >
> > >> > > > > _______________________________________________
> > >> > > > > Mailing list: DHIS 2 Users in Launchpad
> > >> > > > > Post to : dhis2-users@lists.launchpad.net
> > >> > > > > Unsubscribe : DHIS 2 Users in Launchpad
> > >> > > > > More help : ListHelp - Launchpad Help
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > --
> > >> > > > Morten Olav Hansen
> > >> > > > Senior Engineer, DHIS 2
> > >> > > > University of Oslo
> > >> > > > http://www.dhis2.org
> > >> > >
> > >>
> > >
> > >
> > > _______________________________________________
> > > Mailing list: DHIS 2 developers in Launchpad
> > > Post to : dhis2-devs@lists.launchpad.net
> > > Unsubscribe : DHIS 2 developers in Launchpad
> > > More help : ListHelp - Launchpad Help
> > >
> > >
> >
> >
> > --
> > Jason P. Pickering
> > email: jason.p.pickering@gmail.com
> > tel:+46764147049
>

--
Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+46764147049

I am not talking about tracker, but rather anonymous events. So, again, I have no idea what your data looks like, but I will take a stab.

Age: As an integer or, if you have it, the date of birth
Gender: As an option set (Male/Female)
JobGroup: As an option set
Insurance scheme: As an option set
Weight: As an integer, I guess…
Size: ??
FeesPaid: As numeric

The advantage of representing this as events is that Age, Gender, Job Group and Insurance scheme can be used to aggregate “FeesPaid” in the event reports without explicitly defining the dimensions. Thus you only create the dimensions (and database index size) you actually need, and don’t end up with many empty cat option combos; instead you can simply count the events across those dimensions in the event reports.
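
A minimal sketch of that idea in plain Python, not DHIS2 code: the field names follow the list above, while the sample events and the `aggregate` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical anonymous events; all values are invented for this example.
events = [
    {"gender": "Female", "job_group": "A", "scheme": "Basic", "fees_paid": 120.0},
    {"gender": "Male", "job_group": "B", "scheme": "Basic", "fees_paid": 80.0},
    {"gender": "Female", "job_group": "A", "scheme": "Premium", "fees_paid": 200.0},
    {"gender": "Female", "job_group": "B", "scheme": "Basic", "fees_paid": 50.0},
]


def aggregate(events, dimensions, value_field="fees_paid"):
    """Sum value_field over the dimension combinations that actually occur."""
    totals = defaultdict(float)
    for event in events:
        key = tuple(event[dim] for dim in dimensions)
        totals[key] += event[value_field]
    return dict(totals)


# Dimensions are chosen at query time; nothing is predefined.
by_gender_scheme = aggregate(events, ["gender", "scheme"])
for key, total in sorted(by_gender_scheme.items()):
    print(key, total)
```

Only combinations that occur in the data show up in the result, so no rows exist for empty combinations such as (Male, Premium) here.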

Again, I have no idea what your data looks like; it just seems that you may be choosing a difficult way to represent the data, especially if you are going to end up with a lot of cat option combos which don’t have any data.

Regards,

jason

Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+46764147049

Hi Uwe,

cat option combo names: Currently the max length is 255 characters, which I guess can be a problem when you have a high number of dimensions. I have now changed it to text type (unlimited length) in trunk and 2.23. Please try again. Note that to produce 50m objects you will need significant memory for the JVM.
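
To illustrate why 255 characters can be too short, here is a small sketch. It assumes, as Uwe described, that the generated name is a concatenation of the involved category option names; the ", " separator and the sample option names are assumptions, not taken from the DHIS 2 source.

```python
def combo_name(option_names, separator=", "):
    """Build a combo name by concatenating option names (assumed scheme)."""
    return separator.join(option_names)


# Ten hypothetical option names, one per category.
options = [f"Category {i} option with a descriptive name" for i in range(1, 11)]
name = combo_name(options)
print(len(name), "characters; exceeds 255:", len(name) > 255)
```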

I am not sure if removing the name will make it less heavy or help, and people have been asking for stable names for option combos for a long time, so I think we will leave them where they are.

I do understand the problem with a high number of disaggs, and 10 dimensions is not unreasonable. I see you are producing the data from a SQL group by on your “case” / transactional data, and that it is therefore aggregate data. That said, the fact table (“data value” solution) in DHIS 2 is not really meant to cater for an extremely high number of disaggregations of the same data elements, as we predefine possible disaggregations through the category option combos. So going with the event model (as Jason suggests) could be more appropriate for this type of data, which is aggregated but still very fine-grained. That means you can use the Event reports / Event visualizer apps to analyze your data (and use the /api/analytics/event resource from an API perspective).

As a work-around you could in fact define two category combinations, as data values are linked to both data elements and data sets (via a category option combination and an attribute option combination, respectively). So you take 5 of your dimensions and create one (data element) category combo, then take the other 5 of your dimensions and create one (data set) category combo. You now need to link your data values to two different option combos, which is a bit inconvenient, but it will probably solve your immediate need, as the number of option combos goes from (opt = number of category options per category)

opt ^ 10

to

(opt ^ 5) * 2

That said, this is not a scalable solution, and using the event model might be more appropriate.
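
The difference between the two expressions above can be checked with a short sketch. The figure of 5 options per category is a hypothetical value chosen for illustration; `option_combos` also accepts unequal option counts.

```python
from math import prod


def option_combos(option_counts):
    """Number of option combos generated for one category combo."""
    return prod(option_counts)


# Hypothetical: ten categories with 5 options each.
counts = [5] * 10

single = option_combos(counts)                                    # opt ^ 10
split = option_combos(counts[:5]) + option_combos(counts[5:])     # (opt ^ 5) * 2
print(f"one combo: {single:,}  two combos: {split:,}")
```

Note that this only reduces the number of predefined option combos; the number of possible pairs a data value can reference stays the same.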

regards,

Lars
