Realtime Analytics

Thing is, how long does it take for the event to actually get reported?

If it shows up on a person’s dashboard in ten minutes, but they are out having lunch, or it has taken 3 days for the event to be reported, what’s the use? It’s no longer real time. Not even close to it.

I agree there are use cases where things need to be sped up, and that can be done very simply with a very small curl script. You can even check whether analytics is already running first, just to be sure you do not trigger it again.

But let’s not over-complicate things, and let’s think about how real-time the data being aggregated actually is.

Regards,

Jason


On Fri, Apr 22, 2016 at 2:43 PM, Alex Tumwesigye atumwesigye@gmail.com wrote:

Hi Knut,

We are not talking about hours here. Dashboards need to be updated in almost real time, and outbreaks need to be detected as data is entered (once thresholds are met). So 10 minutes as an option in the analytics dropdown would be OK for me. Adding it would make the system almost real time and would cater for the IDSR requirement. A lag of 10 minutes for an update of the dashboard on screen is OK.

I wish it could be added to the system rather than done through the API, given the reasons above.

Alex


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

On Fri, Apr 22, 2016 at 3:27 PM, Knut Staring knutst@gmail.com wrote:

Hi Alex - I agree that IDSR requires “immediate” response - but we are usually talking hours, not minutes, right?

Knut


Alex Tumwesigye

Technical Advisor - DHIS2 (Consultant),
Ministry of Health/AFENET

Kampala

Uganda
+256 774149 775, + 256 759 800161

Skype ID: talexie

IT Consultant (Servers, Networks and Security, Health Information Systems - DHIS2, Disease Outbreak & Surveillance Systems) & Solar Consultant

"I don’t want to be anything other than what I have been - one tree hill "

On Fri, Apr 22, 2016 at 2:23 PM, Alex Tumwesigye atumwesigye@gmail.com wrote:

Jason/Morten,

I agree that real-time analytics would not be easy to implement. Maybe it is time to separate Aggregate and Tracker analytics as a start, but it also depends on what the tracker is being used for: surveys may not require immediate analytics, but IDSR/outbreaks may require an instant update of analytics. We are looking at IDSR features and, as I see it, real-time analytics will be a requirement. IDSR requires real-time analytics because people need to respond to outbreaks and manage the response in real time, so that they can intervene and stop the outbreak.

Here is what I propose (just a thought from my discussion with Calle):

We set a configurable (checked) variable/attribute which indicates that, if this variable is changed, the analytics process is started. For example, changing population data has a very big impact on indicators that depend on population, and entering lab results, requests or patient updates for IDSR requires immediate analytics. If we had this attribute/variable, we would use it to identify the corresponding metadata that may have changed and update (through temp tables) only the affected analytics tables. This way we can control the load required to run analytics in real time, since the system will only be updating what is affected by the changes.

Using the API to trigger analytics every X minutes may be feasible but is not sustainable, because we do not control how many threads may be running: the API call through curl does not easily get feedback on whether the previous analytics process has completed before the new one starts. Otherwise we can end up in a forever loop if the server resources are not enough.

Alex



Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

On Fri, Apr 22, 2016 at 2:50 PM, Morten Olav Hansen morten@dhis2.org wrote:

1 ns is fine… but it depends on what the ramifications are… what if you start a new job every 1 ms? Is that OK? What happens during the table swap? Are analytics blocked?

Maybe this is already documented… just curious :)



Morten Olav Hansen

Senior Engineer, DHIS 2

University of Oslo

http://www.dhis2.org

On Fri, Apr 22, 2016 at 6:48 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

Why not every 1 nanosecond? It’s always going to take some amount of time; the question is what is reasonable.

One could argue that real-time analytics in an aggregate data system is not needed. This is not real-time data. It’s not even close to it.

Look at Google Analytics. They provide reports once a day, and you do not see a whole lot of people complaining. Yes, you can get some real-time information from it as well, but it’s limited. The data must be processed first, and that takes computational time. Same with DHIS2.

From my experience, people think they need “real time analytics” when they really are just in a rush. Data takes time to review and analyze, and whether it’s available now, 1 nanosecond from now, or 10 minutes from now makes no difference in the end, as the amount of time required to digest that information is on a totally different time scale (hours, days or weeks). Once an hour is probably easily achievable, however, depending on the scale of the system.

As for the call to the API, just create a Bash script and call it as frequently as you like with a cron task.

This is a very simple one, but you should really check for things like “Is analytics already running and should I trigger another run?”

#!/bin/sh

# Trigger an analytics table update for the last two years, skipping resource tables.
/usr/bin/curl "localhost:8080/api/resourceTables/analytics?skipResourceTables=true&lastYears=2" -X POST -u admin:district >/dev/null 2>&1
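A rough sketch of the "is analytics already running?" check mentioned above might look like the following. It is not taken from the thread. The status path `/api/system/tasks/ANALYTICS_TABLE`, the shape of its JSON response (a list with the newest entry first, each carrying a `completed` flag), the environment variable names and the `run` argument are all assumptions for illustration, so check the Web API documentation for your DHIS2 version before relying on any of them.

```sh
#!/bin/sh
# Sketch only. BASE_URL, CREDENTIALS and STATUS_PATH are assumed names and values.
BASE_URL="${BASE_URL:-http://localhost:8080}"
CREDENTIALS="${CREDENTIALS:-admin:district}"
# Assumed task-status endpoint; the category name may differ between DHIS2 versions.
STATUS_PATH="${STATUS_PATH:-/api/system/tasks/ANALYTICS_TABLE}"

# Reads task-status JSON on stdin (assumed: newest entry first) and succeeds
# if the newest entry is completed, or if there are no entries at all.
last_run_finished() {
    newest=$(grep -o '"completed" *: *[a-z]*' | head -n 1)
    case "$newest" in
        *false) return 1 ;;
        *)      return 0 ;;
    esac
}

# Triggers a new analytics run only when the previous one looks finished.
trigger_if_idle() {
    if /usr/bin/curl -s -u "$CREDENTIALS" "$BASE_URL$STATUS_PATH" | last_run_finished; then
        /usr/bin/curl -s -X POST -u "$CREDENTIALS" \
            "$BASE_URL/api/resourceTables/analytics?skipResourceTables=true&lastYears=2" \
            >/dev/null 2>&1
    else
        echo "analytics still running, skipping this round" >&2
    fi
}

# Only act when called with the argument "run", so the functions can be sourced safely.
if [ "${1:-}" = "run" ]; then
    trigger_if_idle
fi
```

Called from cron with the `run` argument, this skips a round instead of stacking a second job on top of one that has not finished, which is the failure mode raised earlier in the thread. Note that `grep -o` is not strictly POSIX, although GNU, BSD and BusyBox grep all support it.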

On Fri, Apr 22, 2016 at 1:34 PM, Morten Olav Hansen morten@dhis2.org wrote:

Well, it sounds like a bad solution to me ;) If every 10 min works fine… why not every 5 min… why not every 1 min…


Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+46764147049


On Fri, Apr 22, 2016 at 6:31 PM, Ibrahim Bayoh ibrahim.bayoh@sl.ehealthafrica.org wrote:

@Jason, Knut and Morten: having the analytics tables run every 10 minutes sounds like a good place to start, but I am not sure how to implement this with the API calls, and I am sure it is not possible through the user interface. If you can point me in the right direction with an example of some sort, that would be greatly helpful and highly appreciated.

Thanks.

On Fri, Apr 22, 2016 at 11:23 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

No, they are swapped now: first they are built as temp tables, then swapped. So the swap is the brief point in time in which things may not be available.

Every 10 minutes might be OK, depending on your server, its load, and the amount of data. It just requires some experimentation.

Point is, “real time” analytics is not possible. Near-real time may be.




Ibrahim Rashid Bayoh
Information Systems Coordinator,

eHealth Africa(Sierra Leone)

117 Wilkinson Rd, Freetown, Sierra Leone

Mobile: +232 88-765-638

ibrahim.bayoh@sl.ehealthafrica.org

http://ehealthafrica.org/

On Fri, Apr 22, 2016 at 1:03 PM, Morten Olav Hansen morten@dhis2.org wrote:

But I thought all analytics tables were cleared out during re-generation? Is this not true?



On Fri, Apr 22, 2016 at 6:00 PM, Knut Staring knutst@gmail.com wrote:

Depends on the data. I trigger analytics every 10 min.

Hi All,
Initially I thought DHIS2 had full, out-of-the-box real-time analytics and dashboards. But after further inspection I realized that newly entered data will not be available for analysis or dashboards until the next day, or until the tables are exported manually. This is a real bottleneck for the implementation I am working on. I have looked at Scheduling, but the options for the analytics tables are not ideal in my case. Is there a way this interval can be reduced or removed to get real-time dashboards and analytics?

Thanks,

Bayoh.



Dear Jason,

I agree about not complicating things, but as we look at the IDSR module as a generic module within DHIS2, we may need to think twice about the whole process, unless we want to keep it as a separate piece managed outside the main DHIS2.

Scripts can be written, and I have no problem with that, by the way. But how many of those using DHIS2 can write or manage even simple scripts? If we make this complicated, we run the risk of people not using some of the modules.

Alex


Hi Alex,

What I am saying here is that the underlying data itself is not real time. If we were dealing with millisecond stock trades or telemetry data from a rocket stored as events, the need to aggregate data in real time would be critical. However, we are typically dealing, even in the best case, with events which may get reported several times a day, and the event itself has already “aged” and is no longer real-time. As an example: the patient comes, samples are taken, sent to the lab, confirmed, and reviewed by a clinician. All of that takes time. The event then, in the best case, gets reported shortly thereafter. But what if it doesn’t? What if the internet is down? What if there is no one to report it?

Speeding up DHIS2 analytics is really easy. Buy a big powerful server, and call a shell script every half an hour. But that is not necessarily going to improve the “real time” nature of the data. You have to look downstream for that. Ergo, having a system to aggregate non-real-time data in real time seems both pointless and environmentally unfriendly to me.

What does make sense is to write a simple shell script to better meet your use case. It’s easy, but I do not think it will really make the data more real time. It will make it appear to be more real-time, and that is really quite easily achievable.
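For the half-hourly schedule mentioned above, the cron side could look roughly like this. The helper name `cron_line` and the script path `/usr/local/bin/trigger-analytics.sh` are made up for illustration. The `*/N` step syntax is standard in common cron implementations.

```sh
#!/bin/sh
# cron_line MINUTES SCRIPT: print a crontab entry that runs SCRIPT every MINUTES minutes.
# Only meaningful for values that divide 60 evenly (5, 10, 15, 20, 30).
cron_line() {
    printf '*/%s * * * * %s\n' "$1" "$2"
}

# Hypothetical path to the script that triggers analytics.
cron_line 30 /usr/local/bin/trigger-analytics.sh
# prints: */30 * * * * /usr/local/bin/trigger-analytics.sh
```

The printed line can be pasted into the crontab of the user that should run the job (`crontab -e`). Changing 30 to 10 gives the ten-minute interval discussed elsewhere in the thread.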

Regards,

Jason

···

On Fri, Apr 22, 2016 at 2:58 PM, Alex Tumwesigye atumwesigye@gmail.com wrote:

Dear Jason,

I agree to not complicating things but as we look at IDSR module as a generic module with DHIS2. We may need to think twice about the whole process unless we want to keep it as a separate piece managed outside the main DHIS2.

Scripts can be written, I have no problem with that by the way but how many (of those using DHIS2) can write or manage simple scripts? If we make this complicated, we run a risk of people not using some of the modules.

Alex

On Fri, Apr 22, 2016 at 3:49 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

Thing is, how long does it take for the event to actually get reported?

If it shows up on a person’s dashboard in ten minutes, but they are out having lunch, or it has taken 3 days for the event to be reported, what’s the use? Its no longer real time. Not even close to it.

I agree there are use cases where things needs to be sped up, and it can be very simply with a very small curl script. You can even check and see if analytics is running first just to be sure you do not trigger it again.

But lets not over-complicate things, and think about how real-time the data is which is actually being aggregated.

Regards,

Jason


Alex Tumwesigye

Technical Advisor - DHIS2 (Consultant),
Ministry of Health/AFENET

Kampala

Uganda
+256 774149 775, + 256 759 800161

Skype ID: talexie

IT Consultant (Servers, Networks and Security, Health Information Systems - DHIS2, Disease Outbreak & Surveillance Systems) & Solar Consultant

"I don’t want to be anything other than what I have been - one tree hill "

On Fri, Apr 22, 2016 at 2:43 PM, Alex Tumwesigye atumwesigye@gmail.com wrote:

Hi Knut,

We are not talking of hours here, dashboards needs to be updated in almost real time and outbreaks needed to be detected as data is entered (once thresholds are met). So 10 minutes as an option in the dropdown for analytics is ok for me and I would consider it ok. Adding it would make the system almost real time and this would cater for the IDSR requirement. A lag of 10 minutes for an update of dashboard on screen is ok.

I wish, it could be added to the system than doing it using the API given the reasons above.

Alex


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+46764147049

On Fri, Apr 22, 2016 at 3:27 PM, Knut Staring knutst@gmail.com wrote:

Hi Alex - I agree that IDSR requires “immediate” response - but I we are usually talking hours, not minutes, right?

Knut


Alex Tumwesigye

Technical Advisor - DHIS2 (Consultant),
Ministry of Health/AFENET

Kampala

Uganda
+256 774149 775, + 256 759 800161

Skype ID: talexie

IT Consultant (Servers, Networks and Security, Health Information Systems - DHIS2, Disease Outbreak & Surveillance Systems) & Solar Consultant

"I don’t want to be anything other than what I have been - one tree hill "

On Fri, Apr 22, 2016 at 2:23 PM, Alex Tumwesigye atumwesigye@gmail.com wrote:

Jason/Morten,

I agree that real time analytics would not be easily implemented. May be it is time to separate Aggregate and Tracker analytics as a start but it also depends on what the tracker is being used for e.g surveys may require no immediate analytics but IDSR/outbreaks may require instant update of analytics. We are looking at IDSR features and as I see, real time analytics will be a requirement. The IDSR requires real time analytics since people need to respond to outbreaks and handle outbreak responses and management in real time so that you can intervene and stop the outbreak.

Here is what I propose (just a thought from my discussion with Calle)

We set a configurable (checked) variable/attribute that indicates that if this variable is changed, the analytics process is started for example changing population data has a very big impact on indicators that depend on population,entering lab results or requests or patient updates for IDSR requires immediate analytics, etc. If we had this attribute/variable, then we would use it to identify the corresponding meta data that might need to be / may have changed and update (through temp tables) the only affected analytics tables. This way we can control the load required to run analytics in realtime since the system will only be updating affected changes.

Using the API to trigger analytics every after X minutes may be feasible but not sustainable as we do not control how many threads may be running as the API call through curl does not easily get feedback/update if the previous analytics process has been completed for the new one to start otherwise we can end up in a forever loop if the server resources are not enough.

Alex


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

On Fri, Apr 22, 2016 at 2:50 PM, Morten Olav Hansen morten@dhis2.org wrote:

1 ns is fine… but i depends on what the ramifications are… what if you start a new job every 1 ms? is that ok? what happens during table swap? is analytic blocked?

Maybe this is already documented… just curious :slight_smile:


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp

Alex Tumwesigye

Technical Advisor - DHIS2 (Consultant),
Ministry of Health/AFENET

Kampala

Uganda
+256 774149 775, + 256 759 800161

Skype ID: talexie

IT Consultant (Servers, Networks and Security, Health Information Systems - DHIS2, Disease Outbreak & Surveillance Systems) & Solar Consultant

"I don’t want to be anything other than what I have been - one tree hill "

Morten Olav Hansen

Senior Engineer, DHIS 2

University of Oslo

http://www.dhis2.org

On Fri, Apr 22, 2016 at 6:48 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

Why not ever 1 nanosecond? Its always going to take some amount of time, question is what is reasonable.

One could argue that real-time analytics in an aggregate data system is not needed. This is not real time data. Its not even close to it.

Look at Google Analytics. They provide reports once a day, and you do not see a whole lot of people complaning. Yes, you can get some limited real time information from this as well, but its limited. The data must be processed first, and that takes computational time. Same with DHIS2.

From my experience, people think they need “real time analytics” when they really are just in a rush. Data takes time to review and analayze and whether its available now, 1 nanosecond from now, or 10 minutes from now, makes no difference in the end, as the amount of time which is required to digest that information is on a totally different time scale (hours days or weeks). Once an hour is probably easily achievable depending on the scale of the system however.

As for the call to the API, just create a Bash script and call it as frequently as you like with a cron task.

This is a very simple one, but you should really check for things like “Is analytics already running and should I trigger another run?”

#!/bin/sh

/usr/bin/curl “localhost:8080/api/resourceTables/analytics?skipResourceTables=true&lastYears=2” -X POST -u admin:district >/dev/null 2>&1

On Fri, Apr 22, 2016 at 1:34 PM, Morten Olav Hansen morten@dhis2.org wrote:

Well, it sounds like a bad solution for me :wink: if every 10 min works fine… why not every 5 min… why not every 1 min…


Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+46764147049

Morten Olav Hansen

Senior Engineer, DHIS 2

University of Oslo

http://www.dhis2.org

On Fri, Apr 22, 2016 at 6:31 PM, Ibrahim Bayoh ibrahim.bayoh@sl.ehealthafrica.org wrote:

@Jason,Knut and Morten, Having analytics table run every 10 mins sounds like a good place to start. but am kind of not sure how to implement this with the API calls and am sure this is not possible through the user interface. If you guys can point me in the right direction with an example of some sort that will greatly helpful and highly appreciated.

Thanks.

On Fri, Apr 22, 2016 at 11:23 AM, Jason Pickering jason.p.pickering@gmail.com wrote:

No they are swapped now, so first they are built as temp tables, then swapped. So, this is the brief point in time in which things may not be available.

Every 10 minutes might be OK, depending on your server, loading, and amount of data. It just requires some experimentation.

Point is, “real time” analytics is not possible. Near-real time may be.


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp


Ibrahim Rashid Bayoh
Information Systems Coordinator,

eHealth Africa(Sierra Leone)

117 Wilkinson Rd, Freetown, Sierra Leone

Mobile: +232 88-765-638

ibrahim.bayoh@sl.ehealthafrica.org

http://ehealthafrica.org/

On Fri, Apr 22, 2016 at 1:03 PM, Morten Olav Hansen morten@dhis2.org wrote:

But I thought all analytic tables was cleared out during re-generation? is this not true?


Mailing list: https://launchpad.net/~dhis2-users

Post to : dhis2-users@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-users

More help : https://help.launchpad.net/ListHelp



On Fri, Apr 22, 2016 at 6:00 PM, Knut Staring knutst@gmail.com wrote:

Depends on the data. I trigger analytics every 10 min.

Hi All,
Initially I thought DHIS2 had full, out-of-the-box real-time analytics and dashboards. But after further inspection I realized that newly entered data will not be available for analysis or dashboards until the next day, or until the tables are exported manually. This is a real bottleneck for the implementation I am working on. I have looked at Scheduling, but the options for the analytics tables are not ideal in my case. Is there a way this delay can be reduced or removed to get real-time dashboards and analytics?

Thanks,

Bayoh.





Just to clarify, I was -only- talking about -aggregate- data… tracker data is much more intensive… of course everyone wants everything to be real-time, you will not see any pushback about that from anyone… but if we ever go “real time”, we should definitely start with aggregate…


Morten Olav Hansen

Senior Engineer, DHIS 2

University of Oslo

http://www.dhis2.org

On Fri, Apr 22, 2016 at 8:08 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

Hi Alex,

What I am saying here is that the underlying data itself is not real time. If we were dealing with millisecond stock trades or telemetry data from a rocket stored as events, the need to aggregate data in real time would be critical. However, we are typically dealing, even in the best case, with events which may get reported several times a day, and by then the event itself has already “aged” and is no longer real-time. As an example: the patient comes, samples are taken, sent to the lab, confirmed, and reviewed by a clinician. All of that takes time. The event then, in the best case, gets reported shortly thereafter. But what if it doesn’t? What if the internet is down? What if there is no one to report it?

Speeding up DHIS2 analytics is really easy. Buy a big powerful server, and call a shell script every half an hour. But that is not necessarily going to improve the “real time” nature of the data. You have to look downstream for that. Ergo, having a system to aggregate non-real-time data in real time seems both pointless and environmentally unfriendly to me.

What does make sense is to write a simple shell script to better meet your use case. It’s easy, but I do not think it will really make the data more real-time. It will make it appear to be more real-time, which is quite easily achievable.

Regards,

Jason



On Fri, Apr 22, 2016 at 2:58 PM, Alex Tumwesigye atumwesigye@gmail.com wrote:

Dear Jason,

I agree about not complicating things, but we are looking at the IDSR module as a generic module within DHIS2. We may need to think twice about the whole process, unless we want to keep it as a separate piece managed outside the main DHIS2.

Scripts can be written, and I have no problem with that, by the way, but how many of those using DHIS2 can write or manage even simple scripts? If we make this complicated, we run the risk of people not using some of the modules.

Alex



On Fri, Apr 22, 2016 at 2:23 PM, Alex Tumwesigye atumwesigye@gmail.com wrote:

Jason/Morten,

I agree that real-time analytics would not be easily implemented. Maybe it is time to separate Aggregate and Tracker analytics as a start, but it also depends on what the tracker is being used for, e.g. surveys may require no immediate analytics, but IDSR/outbreaks may require instant updates of analytics. We are looking at IDSR features and, as I see it, real-time analytics will be a requirement. IDSR requires real-time analytics since people need to respond to outbreaks and manage the response in real time, so that they can intervene and stop the outbreak.

Here is what I propose (just a thought from my discussion with Calle):

We set a configurable (checked) variable/attribute which indicates that, if this variable is changed, the analytics process is started. For example, changing population data has a very big impact on indicators that depend on population, and entering lab results, requests or patient updates for IDSR requires immediate analytics. If we had this attribute/variable, we would use it to identify the corresponding metadata that may have changed and update (through temp tables) only the affected analytics tables. This way we can control the load required to run analytics in real time, since the system will only be updating what is affected by the changes.

Using the API to trigger analytics every X minutes may be feasible but not sustainable, as we do not control how many threads may be running. The API call through curl does not easily get feedback on whether the previous analytics process has completed before the new one starts, so we can end up in a forever loop if the server resources are not enough.

Alex



Knut Staring

Dept. of Informatics, University of Oslo

Norway: +4791880522

Skype: knutstar

http://dhis2.org

On Fri, Apr 22, 2016 at 2:50 PM, Morten Olav Hansen morten@dhis2.org wrote:

1 ns is fine… but it depends on what the ramifications are… what if you start a new job every 1 ms? is that ok? what happens during table swap? are analytics blocked?

Maybe this is already documented… just curious :slight_smile:



On Fri, Apr 22, 2016 at 6:48 PM, Jason Pickering jason.p.pickering@gmail.com wrote:

Why not every 1 nanosecond? It’s always going to take some amount of time; the question is what is reasonable.

One could argue that real-time analytics in an aggregate data system is not needed. This is not real-time data. It’s not even close to it.

Look at Google Analytics. They provide reports once a day, and you do not see a whole lot of people complaining. Yes, you can get some real-time information from it as well, but it’s limited. The data must be processed first, and that takes computational time. Same with DHIS2.

From my experience, people think they need “real time analytics” when they really are just in a rush. Data takes time to review and analyze, and whether it’s available now, 1 nanosecond from now, or 10 minutes from now makes no difference in the end, as the amount of time required to digest that information is on a totally different time scale (hours, days or weeks). Once an hour is probably easily achievable, however, depending on the scale of the system.

As for the call to the API, just create a Bash script and call it as frequently as you like with a cron task.

This is a very simple one, but you should really check for things like “Is analytics already running and should I trigger another run?”

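#!/bin/sh

# Trigger an analytics table update for the last two years, skipping the resource tables.
# The URL must be in plain ASCII quotes, otherwise the shell treats "&" as a control operator.
/usr/bin/curl "localhost:8080/api/resourceTables/analytics?skipResourceTables=true&lastYears=2" -X POST -u admin:district >/dev/null 2>&1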
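
For anyone who wants the “is analytics already running?” check mentioned above, here is a rough sketch of how it might look. The task-status path (`/api/system/tasks/ANALYTICS_TABLE`) and the `completed` field in its JSON response are assumptions that vary between DHIS2 versions, so check the Web API documentation for your server before relying on them. The `run` argument and the script path in the cron line are made up for illustration; the trigger request itself is the same one as in the script above.

```shell
#!/bin/sh
# Sketch only. Example cron entry (hypothetical script path), every 10 minutes:
#   */10 * * * * /usr/local/bin/run-analytics.sh run

BASE_URL="http://localhost:8080"
AUTH="admin:district"
# ASSUMPTION: task-status endpoint; the category name differs between DHIS2 versions.
STATUS_PATH="/api/system/tasks/ANALYTICS_TABLE"

# Succeeds (returns 0) when the status JSON passed as $1 looks like a job
# that has not finished yet.
# ASSUMPTION: a finished job reports "completed": true somewhere in the response.
is_running() {
    task_json=$1
    case "$task_json" in
        ''|'[]') return 1 ;;   # no response or no task entries: nothing running
    esac
    if printf '%s' "$task_json" | grep -Eq '"completed"[[:space:]]*:[[:space:]]*true'; then
        return 1               # latest job reports completion
    fi
    return 0
}

main() {
    current_status=$(/usr/bin/curl -s -u "$AUTH" "$BASE_URL$STATUS_PATH")
    if is_running "$current_status"; then
        echo "analytics still running, skipping this run"
        return 0
    fi
    # Same request as the simple script above.
    /usr/bin/curl -s -X POST -u "$AUTH" \
        "$BASE_URL/api/resourceTables/analytics?skipResourceTables=true&lastYears=2" \
        >/dev/null 2>&1
}

# Only contact the server when called with "run", so the functions can be
# sourced and tried out without triggering anything.
if [ "${1:-}" = "run" ]; then
    main
fi
```

The check is deliberately crude (a grep over the JSON rather than a real parser), so treat it as a starting point. If the status request fails, the script falls through and tries to trigger analytics anyway, which may or may not be what you want.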


It looks like a shadow release added support for this in 2.33.1.
Does anyone know about it?


Hi @Stephan_Mestach,

We will introduce this in 2.34, the feature won’t be back-ported. More information will come with the release notes :slight_smile:
