Import of Events is extremely slow - any plans to improve it?

On Jun 28, 2017 19:15, “Calle Hedberg” calle.hedberg@gmail.com wrote:

Hi

I started importing around 700,000 events 2 days ago (about 5 million individual values), and the import is still running 48 hours later. It is also slowing down: it seemed to be importing around 7-8 values per second at first, and is now down to 2-3 per second. It looks like the import might take around 200 hours, which makes no sense from a usability point of view. Importing OrgUnits used to be equally slow, but a recent revamp cut a 10,000 OrgUnit import and update from hours to something like 5 seconds.

It seems obvious that event imports have been designed to import values one by one, which is like asking people to fetch water with a leaking teaspoon.

I’ve searched JIRA but cannot find any issue related to this, hence this post: are there any plans underway to make event data import usable for larger record sets?

Regards

Calle


Calle Hedberg

46D Alma Road, 7700 Rosebank, SOUTH AFRICA

Tel/fax (home): +27-21-685-6472

Cell: +27-82-853-5352

Iridium SatPhone: +8816-315-19119

Email: calle.hedberg@gmail.com

Skype: calle_hedberg


On 29 June 2017 at 02:25, Ime Asangansi imeasangansi@gmail.com wrote:

Hi Calle,

Thanks for your question.

Just curious (I don’t have an answer), as we are planning to do this soon.

Are you doing this through the UI or the API?

If the API, which endpoint enables this?

Thanks

Ime

Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

On Fri, Jun 30, 2017 at 12:58 PM, Calle Hedberg calle.hedberg@gmail.com wrote:

Ime,

I was using the UI; I have not looked at the API endpoint.

For now I will use SQL scripts, as I’ve done before. I’m just trying to find out whether anybody is working on improving what is, for now, an import function that in reality only caters for small data sets. There is no question about the core developers’ ability to speed this up (I saw that with the import of OU metadata), but I’m not sure whether anybody has done anything yet.

If there are no answers from the core developers, then I will write a JIRA issue for it.

Regards

Calle

On 30 June 2017 at 18:05, David Huser dhuser@baosystems.com wrote:

Hi Calle

When some conditions are met, the event import can be really speedy (imports of ~100,000 events in under 30 minutes):

  • Recent version (2.25+)
  • PostgreSQL properly tuned (this is important)
  • Enough RAM (8 GB+)

I usually use JSON files and POST them to /api/events.

David

David Huser
DHIS2 Support Specialist

dhuser@baosystems.com | https://baosystems.com | Skype: dafhus | 2900 K Street, Suite 406, Washington D.C. 20007
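David's reply mentions posting JSON files to /api/events. A minimal Python sketch of that workflow follows, using only the standard library. It assumes the endpoint accepts a JSON body of the form {"events": [...]} with HTTP Basic authentication; the helper names (chunked, build_payload, post_events), the server URL, the credentials and the default batch size of 1000 are all hypothetical, and nobody in this thread recommends a particular batch size. Splitting a large file into batches is meant to keep each request, and the import summary returned for it, to a manageable size.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Iterator, List

Event = Dict[str, Any]


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_payload(events: List[Event]) -> bytes:
    """Wrap a batch in the assumed {"events": [...]} envelope, encoded as UTF-8 JSON."""
    return json.dumps({"events": events}).encode("utf-8")


def post_events(base_url: str, username: str, password: str,
                events: List[Event], batch_size: int = 1000) -> None:
    """POST events to <base_url>/api/events, one batch per request (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    auth_header = "Basic " + base64.b64encode(credentials).decode("ascii")
    url = base_url.rstrip("/") + "/api/events"
    for batch in chunked(events, batch_size):
        request = urllib.request.Request(
            url,
            data=build_payload(batch),
            headers={"Content-Type": "application/json", "Authorization": auth_header},
            method="POST",
        )
        # urlopen raises urllib.error.HTTPError for error responses (e.g. 4xx/5xx),
        # so a rejected batch stops the loop instead of passing silently.
        with urllib.request.urlopen(request) as response:
            summary = response.read().decode("utf-8")
            print(f"{len(batch)} events -> HTTP {response.status}: {summary[:200]}")
```

As an illustration, post_events("https://dhis2.example.org", "USERNAME", "PASSWORD", events, batch_size=500), with each element of events shaped like {"program": "PROGRAM_UID", "orgUnit": "ORGUNIT_UID", "eventDate": "2017-06-28", "dataValues": [{"dataElement": "DE_UID", "value": "5"}]}, would send the events in batches of 500. All IDs there are placeholders, and whether batch size affects import speed on a given server is untested here.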

On Sat, Jul 1, 2017 at 2:20 PM, Calle Hedberg calle.hedberg@gmail.com wrote:

David,

I’m importing into 2.26, PostgreSQL is tuned as far as possible (the test was on my laptop), and I have 12 GB RAM. I was using CSV, but there should not be any significant difference between JSON and CSV.

I will import the 2.5 million events directly and see how long that takes, for comparison.

Regards

Calle
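Calle's point that CSV and JSON input should behave similarly can be made concrete by converting one into the other. The sketch below assumes a simplified CSV layout with one row per data value and the columns event, program, orgUnit, eventDate, dataElement and value (the actual DHIS2 event CSV format has more columns, so check the documentation for your version). It groups rows that share an event id into a single event object in the {"events": [...]} JSON shape assumed for /api/events. The function name csv_rows_to_events and all IDs in the sample are hypothetical placeholders.

```python
import csv
import io
import json
from typing import Any, Dict, List


def csv_rows_to_events(csv_text: str) -> List[Dict[str, Any]]:
    """Group one-row-per-data-value CSV into one event dict per `event` id.

    Assumed (simplified, hypothetical) header:
    event,program,orgUnit,eventDate,dataElement,value
    """
    events: Dict[str, Dict[str, Any]] = {}  # dicts keep insertion order (Python 3.7+)
    for row in csv.DictReader(io.StringIO(csv_text)):
        event_id = row["event"]
        if event_id not in events:
            # First row for this event: copy the event-level columns once.
            events[event_id] = {
                "event": event_id,
                "program": row["program"],
                "orgUnit": row["orgUnit"],
                "eventDate": row["eventDate"],
                "dataValues": [],
            }
        # Every row contributes exactly one data value to its event.
        events[event_id]["dataValues"].append(
            {"dataElement": row["dataElement"], "value": row["value"]}
        )
    return list(events.values())


if __name__ == "__main__":
    # Placeholder IDs, not real DHIS2 UIDs.
    sample = (
        "event,program,orgUnit,eventDate,dataElement,value\n"
        "evt1,progA,ouA,2017-06-01,deX,5\n"
        "evt1,progA,ouA,2017-06-01,deY,7\n"
        "evt2,progA,ouB,2017-06-02,deX,3\n"
    )
    print(json.dumps({"events": csv_rows_to_events(sample)}, indent=2))
```

With this layout the three sample rows become two events, the first carrying two data values. The grouping step is the same whichever format the file arrives in, which is consistent with the expectation that CSV versus JSON is not the main cost.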

Hi Calle

Depending on the configuration of the events (how many org units, data elements, etc.), yes, event import can still feel a bit slow. We made some updates in 2.25/2.26, but as you are already using 2.26 they did not help in your case.

Yes, when we did the metadata importer we managed to speed it up a lot, but that required a complete rewrite, and to get the same gains in event import we might have to do the same (which means it will have to wait a while).

Feel free to write an issue for it.


Morten Olav Hansen

Senior Engineer, DHIS 2

University of Oslo

http://www.dhis2.org
