import thread

bobj · 6 February 2012 20:32

One of the difficulties of providing good user feedback on import is
that we spawn an extra thread to do the actual importing and we
don't have very sophisticated inter-thread communication with that
worker beyond the "status message" which is a transient thing. And
better logging is not a substitute for user feedback.

Is there a really compelling reason to spawn this extra thread? Doing
the import synchronously (in the same thread as the action) would make
it much simpler to provide progressive and useful feedback to the
user. There is a general principle in UI design that you want to keep
the UI responsive during long-running operations but I am not sure
that should necessarily be the case here. It's more important to have
better feedback and you actually want the user to wait until the
process is complete. Of course this can be done between two threads
but it seems kind of unnecessary - and anyway the assumption would
still be that the user does not navigate away from the page while the
import is continuing. Does anyone have an opinion?
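Here is a minimal sketch of what the synchronous alternative could look like. It assumes a hypothetical `SynchronousImport.importValues` method that runs in the action's own thread and passes progress messages to a callback. The class, the fields and the reporting interval are illustrative only, not DHIS2 code:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: the import runs synchronously in the caller's (action's)
// thread and reports progress through a callback, so no inter-thread
// communication is needed. Names and fields are illustrative only.
class SynchronousImport {

    /** One data value to import (illustrative fields). */
    record DataValue(String dataElement, String orgUnit, String period, String value) {}

    /**
     * Imports the values in the calling thread, sends a progress message every
     * {@code reportEvery} values plus a final summary, and returns the count.
     */
    static int importValues(List<DataValue> values, int reportEvery, Consumer<String> progress) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        int imported = 0;
        for (DataValue dv : values) {
            // A real implementation would validate and save dv here.
            imported++;
            if (imported % reportEvery == 0) {
                progress.accept(imported + " values imported");
            }
        }
        progress.accept("Import complete: " + imported + " values");
        return imported;
    }
}
```

The callback runs in the same thread as the import, so no shared status object is needed. How the messages would reach the browser (for example, a streamed response) is left open here.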

The other alternative would be to progressively build up (and store?)
a report of happenings during the import process and allow the user to
browse back through previous imports. This would be nice, but it is more
complex than just running a synchronous thread.
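The report alternative could be sketched roughly as follows. It assumes an in-memory history keyed by a generated import id. A real implementation would presumably store the reports in the database, and all class names here are hypothetical:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: a report is built up while an import runs and kept in a
// history so the user can browse back through previous imports. The in-memory
// map stands in for whatever persistent storage would really be used.
class ImportReportHistory {

    static final class ImportReport {
        final String id = UUID.randomUUID().toString();
        final Instant started = Instant.now();
        final List<String> events = new ArrayList<>();

        /** Records one happening during the import (value rejected, batch done, ...). */
        void add(String event) {
            events.add(event);
        }
    }

    // LinkedHashMap keeps insertion order, so imports are listed oldest first.
    private final Map<String, ImportReport> reports = new LinkedHashMap<>();

    /** Creates a report for a new import and registers it in the history. */
    ImportReport start() {
        ImportReport report = new ImportReport();
        reports.put(report.id, report);
        return report;
    }

    /** Returns the report of an earlier import, or null if the id is unknown. */
    ImportReport find(String id) {
        return reports.get(id);
    }

    /** Ids of all recorded imports, oldest first. */
    List<String> ids() {
        return List.copyOf(reports.keySet());
    }
}
```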

Bob

jason · 7 February 2012 04:45

Hi Bob,

As usual I have an opinion.

First and foremost, the import process, as I think everyone knows, is rather fragile. I am glad to see a few of the issues being fixed (thrown, uncaught exceptions!). In distributed systems, of course, we do not have full control over the metadata, as has been outlined in the blueprint for 2.7: https://blueprints.launchpad.net/dhis2/+spec/separation-of-meta-data-and-data-values. This is even more the case in data warehousing scenarios, where we may have many slightly different versions of DHIS2 out there, with similar but perhaps slightly different metadata. The situation in Nigeria is a good example of this, where multiple parties are running DHIS2. The metadata is quite similar in many of these systems, but not 100% the same. One would need to be very careful about importing data in this situation, and clear user feedback would be very important for understanding exactly what is going to happen before and during an import.

Not really knowing the background of why separate threads were spawned from the beginning, it is hard for me to comment here. However, I really like the option in DHIS 1.4 of viewing a report after an import. In 1.4, of course, the operation is synchronous and can take an exceedingly long time, so there can be a reason for the user not to wait for the process to finish. However, given the fragility of the process in DHIS2, I usually sit and monitor the log in real time to see what is happening. This may not be appropriate or useful for most users, but regardless of how it is done, I think having the option to view a report would be a very useful piece of functionality. Because so little information is provided to the user, and because the process is known to be fragile, I normally end up doing the data imports myself, although eventually this operation should be delegated to the actual owners of the system rather than an external consultant. In that scenario, the users could at least attempt the imports themselves and then, when something goes awry, provide the detailed log to “tech support”, which might be an administrator, a consultant, or the mailing list.

One possible, easy solution would be a dedicated log, which we could configure using log4j, similar to the audit log. At the very least, this would separate the import process from the main log and might make things a bit easier to diagnose. Of course, having some sort of log-reading module like the one in OpenMRS (as we discussed in our chat the other day) would make retrieving such a log a lot easier.
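As a rough illustration of a dedicated import log, the sketch below uses `java.util.logging` only so that it runs without dependencies. With log4j, the equivalent would be a separate logger with its own appender and additivity turned off in the configuration. The logger name used in the test ("dhis.import.test") is hypothetical:

```java
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

// Sketch of a dedicated import log. DHIS2 itself uses log4j; java.util.logging
// is a stand-in here so the example is self-contained.
class DedicatedImportLog {

    /** Returns a logger whose records go only to {@code sink}, not to the main log. */
    static Logger create(String name, OutputStream sink) {
        Logger logger = Logger.getLogger(name);
        // Do not forward records to the root logger, i.e. keep them out of the main log.
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.INFO);
        logger.addHandler(new StreamHandler(sink, new SimpleFormatter()));
        return logger;
    }
}
```

In a real deployment, the sink would more likely be a file (for example, via `java.util.logging.FileHandler`) that an admin page could read back.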

One final question for me is what state the database is left in after an aborted import. I assume there is no BEGIN/COMMIT or SAVEPOINT block on the database that is started at the beginning of the import process? If we get halfway through the import and something fails, is everything rolled back to the state the database was in to begin with, or is the partial import committed?
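To make the two possible answers concrete, here is a minimal in-memory illustration. A list stands in for the database. With JDBC, the all-or-nothing variant would correspond to `Connection.setAutoCommit(false)` followed by `commit()` or `rollback()`; the class and method names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustration of two failure semantics for an import; the "db" list is only
// a stand-in for real database tables.
class FailureSemantics {

    /** All-or-nothing: values are staged and written only if every one succeeds. */
    static void importAtomically(List<String> db, List<String> values, Predicate<String> fails) {
        List<String> staged = new ArrayList<>();
        for (String v : values) {
            if (fails.test(v)) {
                // Equivalent of rollback(): the staged values are simply discarded.
                throw new IllegalStateException("Import aborted at " + v + "; nothing committed");
            }
            staged.add(v);
        }
        db.addAll(staged); // equivalent of commit()
    }

    /** Partial commit: each value is written immediately, so earlier values survive a failure. */
    static void importPartially(List<String> db, List<String> values, Predicate<String> fails) {
        for (String v : values) {
            if (fails.test(v)) {
                throw new IllegalStateException("Import aborted at " + v + "; earlier values remain");
            }
            db.add(v);
        }
    }
}
```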

Thanks for looking into this.

Best regards,

Jason

bobj · 7 February 2012 09:53

Hi Jason

Thanks for the detailed response. I'm trying to look right now at
incremental improvement of the current process by providing better
feedback to users - running a single thread instead of the background
cave process can provide a quick win here.

More generally, I would like to move completely to a dxf2 way of
importing data where the importing of metadata is separate as per the
blueprint. Currently there are two issues to solve:
(i) dimensions of data - in the short term we can continue to use
categoryoptioncombo between DHIS instances or take the plunge with
concepts
(ii) harmonisation of uids - this is a process issue. For the data
exchange to work, the sending system (typically lower in the reporting
hierarchy) will have to import metadata from the higher (national?)
system in order to populate its uids.
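Step (ii) might look something like the following sketch. It assumes, hypothetically, that local and national metadata objects can be matched on a shared code. Any local object without a match would need manual attention before data can be exchanged:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of uid harmonisation: the lower-level system adopts the
// uids of the higher-level system. Matching on "code" is an assumption.
class UidHarmonisation {

    record MetadataObject(String code, String uid) {}

    record Result(Map<String, String> localToNational, List<String> unmatchedCodes) {}

    static Result harmonise(List<MetadataObject> local, List<MetadataObject> national) {
        Map<String, String> nationalUidByCode = new HashMap<>();
        for (MetadataObject o : national) {
            nationalUidByCode.put(o.code(), o.uid());
        }
        Map<String, String> mapping = new HashMap<>();
        List<String> unmatched = new ArrayList<>();
        for (MetadataObject o : local) {
            String nationalUid = nationalUidByCode.get(o.code());
            if (nationalUid == null) {
                unmatched.add(o.code()); // cannot be harmonised automatically
            } else {
                mapping.put(o.uid(), nationalUid); // local uid would be replaced by this one
            }
        }
        return new Result(mapping, unmatched);
    }
}
```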

I've also given some consideration to a dedicated log ... haven't
decided yet. Having an import log file would be fine but I am
wobbling over two questions: (i) in general we would prefer to use the
database for logging and (ii) it's not clear whether we want to
persist these import logs or whether the requirement is just for the
results of the most recent import. Possibly the solution here is to
configure some sort of custom jdbc appender.
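A sketch of the custom appender idea follows. It is written as a `java.util.logging.Handler` so that it runs stand-alone. The in-memory row list stands in for a database table, and the `keepOnlyLatest` flag (a hypothetical option) reflects question (ii): keeping every import log versus only the most recent one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

// Sketch of a log handler that stores import log records as rows, standing in
// for a custom JDBC appender. Not DHIS2 code; names are hypothetical.
class ImportLogTableHandler extends Handler {

    record Row(String importId, long millis, String level, String message) {}

    private final List<Row> rows = new ArrayList<>();
    private final boolean keepOnlyLatest;
    private String currentImportId;

    ImportLogTableHandler(boolean keepOnlyLatest) {
        this.keepOnlyLatest = keepOnlyLatest;
    }

    /** Marks the start of a new import; older rows are dropped if only the latest is kept. */
    synchronized void startImport(String importId) {
        if (keepOnlyLatest) {
            rows.clear();
        }
        currentImportId = importId;
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        // A JDBC version would execute an INSERT here instead.
        rows.add(new Row(currentImportId, record.getMillis(),
                record.getLevel().getName(), record.getMessage()));
    }

    /** Copy of the stored rows, e.g. for display in the UI. */
    synchronized List<Row> rows() {
        return new ArrayList<>(rows);
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
}
```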

Thinking again more generally, there are also scenarios of
applications potentially pushing data (rather than pulling a file
through the UI), as well as any number of other endpoint types if we
ever use an integration framework such as Apache Camel.

In answer to your other question, the semantics around failure are
generally not well defined and need to be addressed.

Bob

Juma_Lungo · 7 February 2012 11:24

Hi

Is it possible to use an IF statement when defining indicators? For example, given the data elements

D1 - Data element 1
D2 - Data element 2
D3 - Data element 3

then, in the numerator of the indicator definition, can we write:

IF ((D1/D2) < 0.5) {
    D3 * 2
} else {
    D3 * 6
}

If yes, what is the correct syntax?

Lungo

olatitle · 7 February 2012 14:32

Hi Lungo,

It would be good if you could also provide us with the real use case (indicator and data element names), as that would make it easier (at least for me) to see how DHIS can support it.

Ola

Ola Hodne Titlestad (Mr)
HISP
Department of Informatics
University of Oslo

Mobile: +47 48069736
Home address: Vetlandsvn. 95B, 0685 Oslo, Norway.

Lars · 12 February 2012 11:12

Hi,

I think the important thing here is to improve the user feedback
mechanism, not the extra thread itself. First, in my opinion it is
easier to provide user feedback this way - the current arrangement is
to start a separate thread and then request/ping that thread
continuously and return/display a status response through an ajax user
interface. I don't see how running a synchronous process and then
blocking the user interface will make it simpler. Second, long-running
threads are at the mercy of the container timeout - as far as I can
remember Jetty uses a default 30 second request timeout meaning large
import processes are in danger of timing out.
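The current arrangement Lars describes can be sketched like this. The names are hypothetical and the loop stands in for the real import. The worker thread overwrites a single status message, and each ajax poll reads whatever the latest value happens to be:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a background import with a polled status message (hypothetical names).
class BackgroundImport {

    private final AtomicReference<String> status = new AtomicReference<>("Not started");
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Starts the import in a worker thread and returns immediately. */
    Future<?> start(int totalValues) {
        return executor.submit(() -> {
            for (int i = 1; i <= totalValues; i++) {
                // ... import one value here ...
                if (i % 1000 == 0) {
                    status.set(i + " values imported"); // overwrites the previous message
                }
            }
            status.set("Import complete: " + totalValues + " values");
        });
    }

    /** What an ajax status request would return: only the most recent message. */
    String currentStatus() {
        return status.get();
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Only the most recent message survives, which is the transient "status message" limitation Bob raised at the start of the thread.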

But yes, a total refactor of the import process is already planned,
and we also need to improve user feedback. It might be that we should do it
more like 1.4: always do a preview of the import, then present a
report with anomalies before importing into the db. Let me think a bit
more about it.

Lars

bobj · 14 February 2012 15:24

Hi,

I think the important thing here is to improve the user feedback
mechanism, not the extra thread itself. First, in my opinion it is
easier to provide user feedback this way - the current arrangement is
to start a separate thread and then request/ping that thread
continuously and return/display a status response through an ajax user
interface. I don't see how running a synchronous process and then
blocking the user interface will make it simpler.

It makes it considerably simpler in that you don't have to manage
communication between two threads.

Second, long-running
threads are at the mercy of the container timeout - as far as I can
remember Jetty uses a default 30 second request timeout meaning large
import processes are in danger of timing out.

Depends really on the use case for import-export. There should be
very few cases where routine import/export takes anywhere close
to 30 seconds. I recall the 300 000 data values we imported from the
dummy iHRIS data in Kenya took around 8 seconds. Even so, I think the
30 second timeout is related to inactivity on the thread
(maxIdleTime). So long as user feedback messages are trickling back
more often than every 30 seconds (which they should be), it shouldn't
ever time out.

Either way, I can see there are some advantages and disadvantages with
both approaches. But running a single synchronous import thread would
just be much simpler. The alternative requires the creation of an
inter-thread communication mechanism which is more sophisticated than
the cave process status message. Of course that's doable (Jo and I
had a mechanism working about 2 years ago), but it really does seem
unnecessarily complicated for what we need to do. And with relatively
large imports, by the time the last byte has arrived over the real
internet (rather than localhost on a developer's laptop), the data import
process would be close to complete.

But yes, a total refactor of the import process is already planned,
and we also need to improve user feedback. It might be that we should do it
more like 1.4: always do a preview of the import, then present a
report with anomalies before importing into the db. Let me think a bit
more about it.

Agreed with regard to metadata import. We could simply always enforce
a preview for it.

I think with data import it might be sufficient to import and return a
report of any failed values. Partly because we want to support
pushing of datavaluesets as well as pulling through web UI. If we did
use a separate thread for this then I guess that thread would persist
an identifying import uid of some sort which could be used later to
retrieve a report. Though I still can't help thinking it's much easier
to return this synchronously ...

I'll think some more about it when I've finished with some mydatamart
fixing. I have some code which imports dxf1.0 datavalues without
importing metadata. I want to implement that simply (with user
feedback) without rewriting the whole cave stuff ...

Bob

jason · 14 February 2012 15:46

Depends really on the use case for import-export. There should be
very few cases where routine import/export takes anywhere close
to 30 seconds. I recall the 300 000 data values we imported from the
dummy iHRIS data in Kenya took around 8 seconds. Even so, I think the
30 second timeout is related to inactivity on the thread
(maxIdleTime). So long as user feedback messages are trickling back
more often than every 30 seconds (which they should be), it shouldn’t
ever time out.

I am importing data now, around 800,000 data values, and it takes considerably longer than 30 seconds (more than 30 minutes, actually), and this is on a very fast machine (i7, 8 GB RAM, SSD). Granted, this is a 1.4 import process, but I have seen similar things for DXF imports of this size, so I wish I knew how you got 300k values imported in 8 seconds.

INFO 17:02:53,105 Imported CompleteDataSetRegistrations (DefaultImportObjectManager.java [Thread-11])
INFO 17:38:04,768 867341 values were imported. (DefaultImportObjectManager.java [Thread-11])
INFO 17:38:05,398 Internal process completed successfully (AbstractStatementInternalProcess.java [Thread-11])

Not sure this really matters, but obviously I do not really care whether it is synchronous or not. I am going to go and have a cup of coffee and wait until it is finished, but I would obviously like to know whether it is going to succeed or not before starting such a long process.

Regards,

Jason

bobj · 14 February 2012 15:54

Depends really on the use case for import-export. There should be
very few cases where routine import/export takes anywhere close
to 30 seconds. I recall the 300 000 data values we imported from the
dummy iHRIS data in Kenya took around 8 seconds. Even so, I think the
30 second timeout is related to inactivity on the thread
(maxIdleTime). So long as user feedback messages are trickling back
more often than every 30 seconds (which they should be), it shouldn't
ever time out.

I am importing data now, around 800,000 data values, and it takes
considerably longer than 30 seconds (more than 30 minutes, actually), and this
is on a very fast machine (i7, 8 GB RAM, SSD). Granted, this is a 1.4 import
process, but I have seen similar things for DXF imports of this size, so I
wish I knew how you got 300k values imported in 8 seconds.

INFO 17:02:53,105 Imported CompleteDataSetRegistrations (DefaultImportObjectManager.java [Thread-11])
INFO 17:38:04,768 867341 values were imported. (DefaultImportObjectManager.java [Thread-11])
INFO 17:38:05,398 Internal process completed successfully (AbstractStatementInternalProcess.java [Thread-11])

The problem here is that nothing happens on the thread between 17:02
and 17:38. What I have done is to trickle back feedback saying that 1000,
2000, 3000, 4000, etc. data values have been imported.
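Here is a back-of-envelope check using the timestamps in Jason's log. It assumes that the 867341 values were imported between the two log lines, which is only an approximation. On that basis, the rate works out to roughly 411 values per second. A message every 1000 values would therefore arrive about every 2.4 seconds, well inside a 30-second idle timeout, if that is indeed how the timeout behaves. The class name is hypothetical:

```java
import java.time.Duration;
import java.time.LocalTime;

// Approximate throughput and progress-message interval from two log timestamps.
class FeedbackInterval {

    /** Average import rate in values per second between two timestamps. */
    static double valuesPerSecond(long values, LocalTime start, LocalTime end) {
        double seconds = Duration.between(start, end).toMillis() / 1000.0;
        return values / seconds;
    }

    /** Seconds between progress messages when reporting every {@code batch} values. */
    static double secondsBetweenMessages(int batch, double valuesPerSecond) {
        return batch / valuesPerSecond;
    }

    public static void main(String[] args) {
        // Timestamps taken from the log lines quoted above.
        LocalTime start = LocalTime.parse("17:02:53.105");
        LocalTime end = LocalTime.parse("17:38:04.768");
        double rate = valuesPerSecond(867341, start, end);
        double gap = secondsBetweenMessages(1000, rate);
        System.out.printf("%.0f values/s, one message every %.1f s%n", rate, gap);
    }
}
```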

Not sure this really matters, but obviously I do not really care whether
it is synchronous or not. I am going to go and have a cup of coffee and wait
until it is finished, but I would obviously like to know whether it is going
to succeed or not before starting such a long process.

Ideally it should always succeed, though some values might be
rejected for reasons which are made explicit (non-existent orgunits,
etc.).
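Rejecting individual values with an explicit reason, rather than aborting the whole import, could look roughly like the sketch below. The record types and the two checks are illustrative assumptions, not the actual DHIS2 import code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: import what can be imported and report each rejected value with a reason.
class RejectingImport {

    record DataValue(String dataElement, String orgUnit, String value) {}

    record Rejection(DataValue value, String reason) {}

    record Summary(int imported, List<Rejection> rejected) {}

    static Summary importValues(List<DataValue> values, Set<String> knownDataElements,
                                Set<String> knownOrgUnits) {
        int imported = 0;
        List<Rejection> rejected = new ArrayList<>();
        for (DataValue dv : values) {
            if (!knownOrgUnits.contains(dv.orgUnit())) {
                rejected.add(new Rejection(dv, "Non-existent orgunit: " + dv.orgUnit()));
            } else if (!knownDataElements.contains(dv.dataElement())) {
                rejected.add(new Rejection(dv, "Non-existent data element: " + dv.dataElement()));
            } else {
                imported++; // a real import would save the value here
            }
        }
        return new Summary(imported, rejected);
    }
}
```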
