Importing DHS survey data in DHIS

Hi all,
I wanted to hear if anyone has experience with the DHS API (http://api.dhsprogram.com/#/index.html) and with using it to import survey results into DHIS?

Olav

Not here unfortunately… just doing CSV imports from DHS Excel files. It would be useful for our data warehouse.

Randy


Hi Olav & Randy,

I am currently banging on kettle (aka Pentaho DI) to extract data from a source system (an SQL ERP in our case) into DHIS2 dataSets in JSON format. In our current test scenario (2 dataElements in a dataSet with a categoryCombination of 5 categories) we are processing ca. 4 million dataValues every night in a pseudo-delta mode (reading all data from the source, comparing with what is already in DHIS2, then only pushing records that create, update or delete dataValues through the API: ca. 150k per night in 1 hour; the initial load took 7 hrs). We still have to prove that this is feasible when setting up the first real-life dataSet, where there will be more categories and more dataElements, thus exploding the number of dataValues.
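To make the pseudo-delta idea concrete, here is a minimal Python sketch (illustrative only, not the actual Kettle job; the value key and the three-way split are the essential part):

```python
# Minimal sketch of the pseudo-delta comparison. A value is identified by
# (dataElement, period, orgUnit, categoryOptionCombo).

def key(v):
    return (v["dataElement"], v["period"], v["orgUnit"], v["categoryOptionCombo"])

def partition(source_values, dhis2_values):
    """Split the full source extract into create/update/delete sets."""
    existing = {key(v): v["value"] for v in dhis2_values}
    incoming = {key(v): v for v in source_values}
    creates = [v for k, v in incoming.items() if k not in existing]
    updates = [v for k, v in incoming.items()
               if k in existing and existing[k] != v["value"]]
    deletes = [k for k in existing if k not in incoming]
    return creates, updates, deletes

# creates and updates can be POSTed in bulk to /api/dataValueSets;
# deletes need their own requests (or an import with a DELETE strategy).
```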

Getting there was a bit painful, but now it seems to work. I chose kettle instead of Talend ETL (both open source) as it seemed easier to get used to. However, from a data warehouse perspective I'd prefer DHIS2 to offer some sort of integrated ETL landscape in the long run, which would also allow aggregating data from tracker into dataSets, tracker to tracker, dataSets to dataSets, etc.

Our current version of the kettle transformations and jobs was designed to be generic (not for a specific dataSet, but you have to design your own extractor, which could be a simple CSV reader or maybe a DHS API call). If you are interested, I will share them. Just be aware that they are currently in a very early, rough state and not documented. You'd have to be willing to dig into kettle and be pain-resistant to a certain degree :-)

I'd be interested to hear about others' experiences ...

Have a nice Sunday,

Uwe


Hi Randy and Uwe,
thanks, it is interesting to hear your experiences. Uwe, what you are working on sounds quite a bit more complicated, and not least involves far more data. I imagine that with household surveys it would be a matter of < 100 indicators for < 200 orgunits for 2-3 periods, i.e. a fraction of what you are dealing with!

Olav


Dear Uwe,

Have you tried sending data via the api/dataValueSets endpoint? It may be faster. Just stage your data and push it once.

http://dhis2.github.io/dhis2-docs/master/en/developer/html/ch01s13.html#d5e1372

Also worth noting is how you send it: I have seen curl take ages submitting individual values via the API. You need to send everything as one file in a single request, or implement concurrency.
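As a rough illustration, a staged push in a single request could look like this (the server URL and UIDs are example values in the style of the demo docs, not anything real):

```python
# One staged dataValueSet pushed in a single request, instead of one call per value.
import requests

payload = {
    "dataSet": "pBOMPrpg1QX",    # example UIDs, replace with your own
    "period": "201601",
    "orgUnit": "DiszpKrYNg8",
    "dataValues": [
        {"dataElement": "f7n9E0hX8qk", "value": "12"},
        {"dataElement": "Ix2HsbDMLea", "value": "14"},
    ],
}
r = requests.post("https://play.dhis2.org/demo/api/dataValueSets",
                  json=payload, auth=("admin", "district"))
print(r.status_code, r.text)  # import summary with counts
```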

Alex


Alex Tumwesigye
Technical Advisor - DHIS2 (Consultant), Ministry of Health/AFENET, Kampala, Uganda
IT Consultant - BarefootPower Uganda Ltd, SmartSolar, Kenya
+256 774 149 775 / +256 759 800 161

One of the interesting ideas from Uwe’s approach is that DHS has apparently standardized definitions for all indicators - presumably there is a code that we can use in DHIS-2 so that interoperability will be simplified. Uwe might want to extend the data element attributes to capture more of the metadata that is available in DHS to define the indicators. Also, I wonder if you plan to bring in the raw data (numerators & denominators) as data elements and build the calculations into DHIS-2, or bring in the calculated indicator values as data elements.
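For reference, the DHS API does expose those standardized definitions with stable IndicatorIds; a quick sketch of pulling them (the endpoint and field names follow the DHS API samples, so treat the details as assumptions):

```python
# Pull DHS indicator definitions so their IndicatorIds can be stored on
# DHIS2 data elements (e.g. as a code or a custom attribute).
import requests

r = requests.get(
    "https://api.dhsprogram.com/rest/dhs/indicators",
    params={"returnFields": "IndicatorId,Label,Definition", "f": "json"},
)
for ind in r.json()["Data"][:5]:
    print(ind["IndicatorId"], "-", ind["Label"])
```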

One of the challenges that we face in our Data Warehouse is that it contains indicators calculated based on both routine and population survey data. We have to be very careful of the indicator names so that people know which come from which source. For example: from DHS we have “Contraceptive prevalence rate - modern methods” while we estimate that from the routine HMIS data but call it “Contraceptive utilisation rate from health facilities - modern methods”.

Randy



Randy Wilson
Team Leader: Knowledge Management, Data Use and Research
Rwanda Health System Strengthening Activity, Management Sciences for Health, Kigali, Rwanda
Direct: +250 788308835 | E-mail: rwilson@msh.org | Skype: wilsonrandy_us
www.msh.org

Hi Olav,
I have not worked with the DHS API per se, but I have imported lots of data using the same approach they outline here (http://api.dhsprogram.com/#/samples-r.cfm).

I have written up a walkthrough of getting data out of one DHIS instance and into another, and I think the basic principles would be the same (http://rpubs.com/jason_p_pickering/139589).

Metadata needs to be mapped (or created), and the data needs to be reshaped and correctly formatted.

It should not be too difficult. I used R, but there are other examples with Python and JavaScript on their examples page.
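In Python, the whole round trip is roughly the sketch below (the endpoint and field names follow the DHS API samples; the de_map/ou_map lookup tables, UIDs and server URL are hypothetical placeholders you would build yourself):

```python
# Sketch: pull survey estimates from the DHS API and reshape them into a
# DHIS2 dataValueSet. de_map / ou_map translate DHS codes to DHIS2 UIDs.
import requests

de_map = {"FP_CUSA_W_MOD": "deUid123456"}    # hypothetical mappings
ou_map = {"RWDHS2015418010": "ouUid123456"}

resp = requests.get("https://api.dhsprogram.com/rest/dhs/data",
                    params={"countryIds": "RW", "surveyYear": "2015",
                            "breakdown": "subnational", "f": "json"})
rows = resp.json()["Data"]

data_values = [
    {"dataElement": de_map[r["IndicatorId"]],
     "orgUnit": ou_map[r["RegionId"]],
     "period": str(r["SurveyYear"]),
     "value": str(r["Value"])}
    for r in rows
    if r["IndicatorId"] in de_map and r.get("RegionId") in ou_map
]

requests.post("https://dhis2.example.org/api/dataValueSets",
              json={"dataValues": data_values}, auth=("admin", "district"))
```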

Regards,

Jason


Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+46764147049

Hi Alex,

thanks for the suggestions. That's actually the API I am using: per dataSet I post one request for deletions, one for creations and one for updates, in parallel. Kettle has a transformation for converting tabular data into one JSON record, and another for POSTing that JSON chunk to the API in one request. I also ran into your curl observation in the beginning, when sending single values: there wasn't a DELETE option for the batch yet, and I had to delete on a single-record basis.

Actually I was surprised that the performance of the API is quite acceptable: on our server it's roughly 375k records per hour for creating/updating/deleting (no network delays, since kettle runs on the same server as DHIS2 and thus POSTs to localhost). But I am thinking of breaking the load into parallel packages as you suggested, e.g. per dataElement, mainly to avoid memory dumps from kettle - the JSON converter is quite hungry. Is DHIS2 able to detect memory shortages from parallel API imports without dumping?

Does anyone have experience with more performant options, like posting CSV to dataValueSets or using the new ADX API? Actually I'd prefer DHIS2 to offer an API where I can POST a CSV-like structure per dataSet, like [ou,pe,Category1,Category2,etc,DataElement1,DataElement2,etc]. I suppose that would reduce the volume of data transferred significantly; I'm not sure about the performance.
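For reference, dataValueSets already accepts a long-format CSV (one row per value), though not the pivoted layout above. A minimal sketch, with example UIDs:

```python
# POST a long-format CSV to /api/dataValueSets (header per the DHIS2 docs;
# UIDs are demo-style examples, not real ones).
import requests

csv_body = (
    "dataelement,period,orgunit,categoryoptioncombo,attributeoptioncombo,value\n"
    "f7n9E0hX8qk,201601,DiszpKrYNg8,HllvX50cXC0,HllvX50cXC0,12\n"
)
r = requests.post("https://dhis2.example.org/api/dataValueSets",
                  data=csv_body,
                  headers={"Content-Type": "application/csv"},
                  auth=("admin", "district"))
print(r.text)
```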

Regards,

Uwe


Hi Jason,

thanks for sharing the links. From a quick glance, you are also experimenting with the ADX API - did you observe any significant performance differences between the ADX and dataValueSets APIs?

Regards,

Uwe


That was a very trivial lab test, so not really conclusive at all. I would just give it a try and see; if you see differences, please let the devs know.

Given the scale of what you are attempting, have you considered writing directly to the database with SQL? Not that I am recommending that route, as there are many pitfalls, but it might be an option if implemented properly, especially considering your reported architecture.

Regards

Jason


Hi Uwe,

ADX will not be faster than DXF: with ADX, the stream is first converted into DXF and then passed on to the regular importer.

Lars


Lars Helge Øverland
Lead developer, DHIS 2
University of Oslo
Skype: larshelgeoverland
http://www.dhis2.org

Hi Randy,

currently I am just loading the bare dataSet. But you are right: a normal nightly load run should first update the metadata and then the dataValues; otherwise values would be rejected if they were coded to new orgUnits or category options that are not yet in DHIS2. We are not there yet, but that would be one of the next activities.

Also, as you have stated, the current version expects the categoryOptions to match those in DHIS2; mappings have to be done in the custom extractor. As you say, it is easier if no mapping is needed, but from my previous DWH experience I know that mapping is normally desirable, since analysis data can usually be grouped into broader categories than those of the operational systems, thus reducing the number of combinations in the cubes.

Our main benefit for the moment is that the ETL process compares the dataValues with what is already present in DHIS2 and then decides whether to update an existing value, create a new one, or delete a value that no longer comes from the source (data are extracted in full, but uploaded as a pseudo-delta). The transformation from tabular data to the DHIS2 API format is also done, including the mapping to DHIS2 IDs for category option combinations.
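That mapping is essentially a lookup table built once per run; a minimal sketch, assuming name-based matching against a hypothetical instance:

```python
# Build a name -> UID lookup for categoryOptionCombos, so tabular rows can be
# translated to DHIS2 IDs before the compare step.
import requests

r = requests.get("https://dhis2.example.org/api/categoryOptionCombos",
                 params={"fields": "id,name", "paging": "false"},
                 auth=("admin", "district"))
coc_uid = {c["name"]: c["id"] for c in r.json()["categoryOptionCombos"]}

# e.g. coc_uid["Female, <5y"] -> "aBcDeFgHiJk" (hypothetical)
```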

Regards,

Uwe


OK, that sounds like ADX might eventually even be a bit slower, if the transformation overhead outweighs the potentially reduced data volume. I might just stick with the JSON.

@Jason: I also briefly thought about writing directly to the database with SQL, but I fear internal changes to the data model, which I'd have to fully understand in the first place. Of course the APIs also change more than I expected, but at least that is announced :-D

Uwe

···

---

Am 02.02.2016 um 19:49 schrieb Lars Helge Øverland:

Hi Uwe,

ADX will not be faster than DXF, as for ADX, the stream is first converted into DXF and then passed on to the regular importer.

Lars

On Tue, Feb 2, 2016 at 5:33 PM, Jason Pickering > <jason.p.pickering@gmail.com <mailto:jason.p.pickering@gmail.com>> wrote:

    This was a very trivial lab test,so not really conclusive at all. I would just give it a try and see. If you see differences, please
    let the devs know.

    Given the scale of what you are attempting, have you considered
    using direct SQL injection? Not that I am recommending that route
    as there are many pitfalls, but it might be an option if
    implemented properly, especially considering your reported
    architecture.

    Regards
    Jason

    On Tue, Feb 2, 2016, 17:04 Uwe Wahser <uwe@wahser.de > <mailto:uwe@wahser.de>> wrote:

        Hi Jason,

        thanks for sharing the links. As I can see on a quick glance,
        you are also
        experimenting with the ADX-api - did you observe any
        significant performance
        differences between ADX and dataValueSets apis?

        Regards,

        Uwe


Lars is right that ADX won't be faster than DXF, both because it internally converts to DXF on import and because it abstracts away the categoryOptionCombo. The first isn't really very costly, but the other is.

This means the two systems only have to match categories and categoryOptions, which is a much easier mapping to maintain.

But if you need raw speed, it is going to be faster to produce DXF-style categoryOptionCombos, as that is closest to the way the data gets stored. I am going to speed up the ADX import code, but it will still always be slower :-)
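To make the difference concrete, here is a rough sketch of the same data value in the two formats (apart from the standard ADX and DXF namespaces, every identifier and code is a placeholder). ADX carries each disaggregation as its own category attribute, which the importer must resolve into a categoryOptionCombo; DXF ships the pre-resolved categoryOptionCombo UID directly:

    <!-- ADX: disaggregations as separate category attributes -->
    <adx xmlns="urn:ihe:qrph:adx:2015" exported="2016-02-03T00:00:00">
      <group orgUnit="OU_RW01" period="2016-01-01/P1M" dataSet="DS_MAL">
        <dataValue dataElement="DE_MAL_CASES" sex="FEMALE" ageGroup="UNDER5" value="32"/>
      </group>
    </adx>

    <!-- DXF: the categoryOptionCombo is already resolved -->
    <dataValueSet xmlns="http://dhis2.org/schema/dxf/2.0" period="201601" orgUnit="DiszpKrYNg8">
      <dataValue dataElement="f7n9E0hX8qk" categoryOptionCombo="HllvX50cXC0" value="32"/>
    </dataValueSet>

Matching on categories and category options (ADX) is easier to maintain across systems; shipping categoryOptionCombos (DXF) is faster to import. That is the trade-off Bob describes.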

···

On 2 February 2016 at 20:07, uwe wahser <uwe@wahser.de> wrote:


Hi Uwe,

make sure that you have tuned Postgres properly through postgresql.conf; the last 5 settings below in particular are crucial for getting good write performance.

http://dhis2.github.io/dhis2-docs/master/en/implementer/html/ch08s03.html#d5e464

checkpoint_segments = 32

PostgreSQL writes new transactions to log files called WAL segments, which are 16MB in size. When a certain number of segments have been written, a checkpoint occurs. Setting this number to a larger value will thus improve performance for write-heavy systems such as DHIS 2.

checkpoint_completion_target = 0.8

Determines the percentage of segment completion before a checkpoint occurs. Setting this to a high value will thus spread the writes out and lower the average write overhead.

wal_buffers = 16MB

Sets the memory used for buffering during the WAL write process. Increasing this value might improve throughput in write-heavy systems.

synchronous_commit = off

Specifies whether transaction commits will wait for WAL records to be written to disk before returning to the client. Setting this to off will improve performance considerably. It implies a slight delay between the transaction being reported successful to the client and it actually being safe on disk, but the database state cannot be corrupted, and this is a good trade-off for performance-intensive and write-heavy systems like DHIS 2.

wal_writer_delay = 10000ms

Specifies the delay between WAL write operations. Setting this to a high value will improve performance on write-heavy systems since potentially many write operations can be executed within a single flush to disk.
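Pulled together, the suggested postgresql.conf excerpt reads:

    # postgresql.conf - write-performance settings discussed above
    checkpoint_segments = 32            # PostgreSQL 9.4 and earlier; replaced by max_wal_size in 9.5+
    checkpoint_completion_target = 0.8
    wal_buffers = 16MB
    synchronous_commit = off            # trades a short durability window for much faster commits
    wal_writer_delay = 10000ms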

···

On Tue, Feb 2, 2016 at 9:18 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Hi Lars,

thanks for the hint. Currently I am just running on the defaults. I'll do a bit of monitoring first before adjusting those values, and I'll post the difference.

Regards,

Uwe

···

On 2 February 2016 at 23:43, Lars Helge Øverland <larshelge@gmail.com> wrote:


Sure. I usually get 2,000-5,000 values per second on import, so there should be room for some improvement.

regards,

Lars

···

On Wed, Feb 3, 2016 at 8:52 AM, Uwe Wahser uwe@wahser.de wrote:


Thanks Jason. I realise I should learn R.

I was thinking that it should be fairly simple to make a DHIS app that would let you interact with the DHS API to

  1. select a country

  2. select a survey/year for that country

  3. select the indicators available for that survey

If importing sub-national data, you would have to have some basic orgunit matching as well (though the number is limited for household surveys), but that’s it.

But it does not look like anyone has made that app yet unfortunately.

Olav
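For what it's worth, the three selection steps Olav lists map onto the public DHS API roughly as below. A minimal Python sketch, where the endpoint paths, the "Data" envelope, and the example IDs are assumptions taken from the DHS API documentation rather than tested code:

    import requests

    BASE = "https://api.dhsprogram.com/rest/dhs"

    # 1. select a country
    countries = requests.get(BASE + "/countries").json()["Data"]

    # 2. select a survey/year for that country (country code is illustrative)
    surveys = requests.get(BASE + "/surveys", params={"countryIds": "RW"}).json()["Data"]

    # 3. select the indicators available for that survey (survey id is illustrative)
    indicators = requests.get(BASE + "/indicators", params={"surveyIds": "RW2014DHS"}).json()["Data"]

An app built this way would then only need the orgunit matching step Olav mentions before writing the values into DHIS 2.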

···

On Tue, Feb 2, 2016 at 3:31 PM, Alex Tumwesigye atumwesigye@gmail.com wrote:

Dear Uwe,

Have you tried sending data via the api/dataValueSets endpoint? It may be faster. Just stage your data and push it once.

http://dhis2.github.io/dhis2-docs/master/en/developer/html/ch01s13.html#d5e1372

Also note how you send it: I have seen curl taking ages to submit individual values via the API. You need to send it as one file via one request, or implement concurrency.

Alex
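As an illustration of the stage-and-push-once pattern Alex describes, a minimal Python sketch (server URL and credentials are placeholders):

    import requests

    # Stage all values in a single JSON payload (the dataValueSets format shown
    # earlier), then push it in one request instead of one request per value.
    with open("datavalues.json") as f:
        payload = f.read()

    resp = requests.post(
        "https://dhis2.example.org/api/dataValueSets",
        data=payload,
        headers={"Content-Type": "application/json"},
        auth=("admin", "district"),
    )
    print(resp.status_code, resp.text)  # import summary with imported/updated/ignored counts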



On Tue, Feb 2, 2016 at 5:13 PM, Olav Poppe olav.poppe@me.com wrote:

Hi Randy and Uwe,
thanks, interesting to hear your experiences. Uwe, what you are working on sounds quite a bit more complicated, and not least with far more data. I imagine that with household surveys, it would be a matter of < 100 indicators for < 200 orgunits for 2-3 periods, i.e. a fraction of what you are dealing with!

Olav


Hi - we wrapped up a pilot a few months ago with the Johns Hopkins University National Evaluation Platform:

http://www.jhsph.edu/research/centers-and-institutes/institute-for-international-programs/current-projects/national-evaluation-platform/

to do something similar. In their case they wanted to bring the DHS data into Stata to clean/prep it first, then export both metadata and the data itself for import into DHIS 2. We wrote a DHIS 2 user app plus a parallel Java app to handle the import, and it's being piloted in 4 countries now to combine the survey data with their routine health information where possible. We didn't use the DHS APIs, and I'd have to check on what cleanup/prep JHU was doing. Our work ends up being more of a generic Stata import app, although I'm cc'ing in Lorill Crees, who has been very involved with it and can comment more.

JHU has tasked us with open-sourcing the work, so we should be able to share code. We don't have it online yet, but I can check whether we can send you a copy.

Our next round of work is on R integration so end users can conduct simple analyses. We’ve done our technical spikes and are doing proof of concept work now.

Aaron

···


On 2 Feb 2016 at 16:21, Jason Pickering <jason.p.pickering@gmail.com> wrote:

Hi Olav,
I have not worked with the DHS API per se, but I have imported lots of data using the same approach that they outline here (http://api.dhsprogram.com/#/samples-r.cfm).

I have written up a walkthrough of getting data out of one DHIS instance and into another one, and I think the basic principles would be the same (http://rpubs.com/jason_p_pickering/139589).

Metadata needs to be mapped (or created), and the data needs to be reshaped and correctly formatted.

It should not be too difficult. I used R, but there are other examples with Python and JavaScript on their examples page.

Regards,

Jason
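To make the map-and-reshape step concrete, here is a minimal Python sketch of the pipeline Jason describes, in the spirit of his R walkthrough: pull indicator values from the DHS API, map them to DHIS2 UIDs, and build a dataValueSets payload. The DHS field names, the lookup tables, and all UIDs are illustrative assumptions:

    import requests

    # 1. Fetch survey indicator values from the DHS API
    rows = requests.get(
        "https://api.dhsprogram.com/rest/dhs/data",
        params={"countryIds": "RW", "indicatorIds": "FE_FRTR_W_TFR"},
    ).json()["Data"]

    # 2. Hand-maintained mappings from DHS metadata to DHIS2 UIDs
    DE_MAP = {"FE_FRTR_W_TFR": "deUID000001"}  # DHS indicator id -> DHIS2 data element UID
    OU_MAP = {"Rwanda": "ouUID000001"}         # DHS region label -> DHIS2 orgunit UID

    # 3. Reshape into a dataValueSets payload, skipping anything unmapped
    payload = {
        "dataValues": [
            {
                "dataElement": DE_MAP[r["IndicatorId"]],
                "orgUnit": OU_MAP[r["CharacteristicLabel"]],
                "period": str(r["SurveyYear"]),
                "value": str(r["Value"]),
            }
            for r in rows
            if r["IndicatorId"] in DE_MAP and r["CharacteristicLabel"] in OU_MAP
        ]
    }
    # The payload can then be POSTed to api/dataValueSets as shown earlier in the thread.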