I started importing around 700,000 events two days ago (about 5 million individual values) - and the import is still running, 48 hours later. The import is also slowing down: it started at around 7-8 values per second and is now down to 2-3 per second. At that rate the whole import would take around 200 hours, which makes no sense from a usability point of view. Importing OrgUnits used to be equally slow, but a recent revamp changed a 10,000-OrgUnit import and update from hours to something like 5 seconds.
It seems obvious that event imports have been designed to import values one by one, which is equivalent to asking people to fetch water with a leaking teaspoon.
I’ve searched JIRA but cannot find any issue related to this, hence this post: are there any plans underway to make import of event data usable for larger record sets?
I was using the UI - I have not looked at the API endpoint.
For now I will use SQL scripts as I’ve done before - I’m just trying to find out whether anybody is working on improving what is, in reality, an import function that only caters for small data sets. There is no question about the core developers’ ability to speed this up - I saw that with the import of OU metadata - I’m just not sure whether anybody has started on it.
If there are no answers from the core developers, I will write a JIRA issue for it.
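If I do get around to testing the API route, something along these lines is what I have in mind - pushing events to the /api/events endpoint in batches instead of one by one. Just a sketch: the base URL, credentials and batch size are placeholders, and I haven’t verified the exact shape of the import summary in the response.

```python
import requests

BASE_URL = "https://example.org/dhis"  # placeholder - your instance
AUTH = ("admin", "district")           # placeholder credentials
BATCH_SIZE = 1000                      # untested guess, worth experimenting with

def import_events_in_batches(events):
    """POST events to /api/events in batches rather than one at a time."""
    for start in range(0, len(events), BATCH_SIZE):
        batch = events[start:start + BATCH_SIZE]
        resp = requests.post(
            f"{BASE_URL}/api/events",
            json={"events": batch},
            auth=AUTH,
        )
        resp.raise_for_status()
        # Response shape not verified - inspect the import summary defensively.
        summary = resp.json().get("response", {})
        print(f"batch starting at {start}: imported={summary.get('imported')}")

# events would come from the JSON export, e.g.:
# import json
# events = json.load(open("events.json"))["events"]
# import_events_in_batches(events)
```

If the slowdown is per-request overhead, batching like this should help a lot; if the cost is per-event processing on the server, it won’t.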
I’m importing into 2.26, and PostgreSQL is tuned as far as possible (the test was on my laptop, with 12 GB RAM). I was using CSV, but there should not be any significant difference between JSON and CSV.
I will import the 2.5 million events directly and see how long that takes, for comparison.
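For reference, the direct route I mean is roughly this - staging the CSV with a single COPY and then transforming it into the event tables in SQL. The connection details and staging columns are placeholders, and the final INSERT into the DHIS2 tables (programstageinstance / trackedentitydatavalue in 2.26, if I remember the schema right) is deliberately left as a comment, since the column names need checking against the actual database first.

```python
import psycopg2

# Placeholder connection details - adjust for your instance
conn = psycopg2.connect(dbname="dhis2", user="dhis", password="...", host="localhost")

def bulk_load(csv_path):
    """Stage the CSV with a single COPY, then transform in SQL."""
    with conn, conn.cursor() as cur:
        # Staging table mirroring the CSV columns; COPY moves the whole file
        # in one round trip instead of one INSERT per value.
        cur.execute("""
            CREATE TEMP TABLE event_staging (
                event_uid   text,
                orgunit_uid text,
                program_uid text,
                eventdate   date,
                dataelement text,
                value       text
            )
        """)
        with open(csv_path) as f:
            cur.copy_expert(
                "COPY event_staging FROM STDIN WITH (FORMAT csv, HEADER true)", f
            )
        # From here: INSERT ... SELECT from event_staging into the DHIS2 event
        # tables - verify table and column names against the schema before use.

bulk_load("events.csv")
```

The whole point is the single COPY round trip - that is what makes bulk loads fast compared to row-by-row inserts.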
Depending on the configuration of the events (how many org units, data elements, etc.), yes, event import can still feel a bit slow. We did some updates in 2.25/2.26, but as you are already on 2.26 this didn’t help your case.
Yes - when we did the metadata importer we managed to speed it up a lot, but that required a complete rewrite, and to get the same gains in event import we might have to do the same (which means it will have to wait a while).