[OPENMRS-DEV] Pivot Agent

Knut_Staring · 26 February 2010 21:44

Interesting summary on issues around reporting from OpenMRS which are relevant to the DHIS community as well. Bob and Ryan has responded to that list with a status update on the OMRS-DHIS integration using SDMX.

···

---------- Forwarded message ----------
From: Justin Miranda justin@openmrs.org
Date: Thu, Feb 25, 2010 at 7:53 AM

Subject: Re: [OPENMRS-DEV] Pivot Agent
To: openmrs-devel-l@listserv.iupui.edu

Andy

Thanks for bringing this up again. The short answer is: no.

The long epic answer is that we don’t really have a clear path at the moment, but there are lots of off-the-shelf solutions out there and probably enough interest amongst the community to explore those solutions collaboratively. There’s a “data warehouse” project somewhere on the OpenMRS roadmap, but unfortunately no resources or time to work on it, so I think it’s a matter of getting a bunch of implementers and developers together to work on it. And not to burst anyone’s bubble, but the “one-size-fits-all” data warehouse solution probably doesn’t exist. There are far too many complex reporting use cases that need to be solved. So, in most cases, the best we can hope for is being able to collaborate and share some ETL code/scripts, some dimensional data modeling techniques, and a whole lot of best practices.

With that said, I’m really excited about working on a DWH project in the future. I have spent more time than I care to admit researching and evaluating different data warehousing concepts and solutions. I was hoping to get a better understanding for when/if we try to roll our own within the OpenMRS, but while I’ve pushed to build this kind of tool within the reporting framework in the past, I think it would be the least ideal solution in the long run. Therefore, I think implementers should work with developers to explore some of the ETL / data warehouse solutions that exist (I’ve noted some of the better open source systems below).

With respect to solving this within the reporting framework, we’ve already tried to roll our own flat data model solution with the Data Export tool that Ben wrote. And while the data export tool is simple, elegant, and somewhat expressive, it doesn’t scale well over medium to large data sets (on the order of a few 1000 to 10,000 patients). Mike, Darius and I have spent a considerable amount of time talking through and trying to design a de-normalized (“flat model”) solution within the reporting framework that is scalable. But unfortunately, we’ve realized over and over again that the scalability issues are too difficult to overcome with a homegrown Java-based solution.

For one, a transactional data model will never perform well when we scale to large datasets and require queries with lots of joins, self-joins, and sub-queries. And secondly, even if the time to pull the data was reasonable, executing the complex logic that our reports require (at the Java layer) will also be a performance bottleneck. The logic service will work well for individual patients, but not for large cohorts of patients. The idea is to add a cache to help improve performance, but I don’t think anyone would argue that this would be an ideal solution. That’s not a flaw within the design of the logic service, but rather a consequence of using a normalized data model (especially one that uses EAV to the extent that we do). And there’s not much we can do within a Java application to improve the performance. The fact is, reporting on a transactional system just doesn’t scale well. And there’s no amount of code we could write to solve those problems – in fact, we’ve realized that we’d essentially be re-writing a database system if we tried.

That doesn’t mean we’re not going to provide some essential tools for flattening the data – like a simple front end to the logic service that allows users to execute a set of logic rules for a set of patients. Or a module that runs ETL scripts to transform data into a de-normalized/dimensional data model (both of which have been on my back-burner for a while). Those tools will eventually be developed (I hope), but I think our whole notion of reporting needs to be turned on its head and we need to re-focus on designing / building data warehouses for specific implementations. It seems clear to me that the answer is (1) we need to stop trying to boil the ocean, building our own generalizable frameworks and (2) we need to work together on solving collective reporting problems through the use of existing BI tools, and then sharing our successes back to the community.

I’m not exactly sure what your specific end goals are, but it’s probably safe to say that ALL of the tools below should be included in your reporting solution.

* OpenMRS reporting framework for indicator reports, exportable data sets, and simple patient lists (like data quality reports).

* DHIS 2.0 (plus an OpenMRS indicator export tool) for storing/querying aggregate data.

* Business Intelligence suites (like Pentaho) to provide all aspects of reporting (dashboards, ETL, OLAP, ad-hoc query tools, etc).

* Column-oriented databases (*) like InfiniDB, InfoBright, MonetDB, LucidDB.

Unfortunately, right now, the first bullet is the only reporting solution that is actively being worked on. As a community, we need to realize that this is not going to be solved by developers alone. I think we need to put our heads together and try to work toward robust DWH solutions for specific implementation that integrate existing tools in ways that can be replicated, rather than trying to continue to roll our own generalizable reporting frameworks that try to solve everyone’s problems.

Apologies for the long email – thanks to those that actually read this far.

Justin

(*) http://en.wikipedia.org/wiki/Column-oriented_DBMS

Andrew Kanter wrote:

Folks, I have mentioned this before, but am running into this issue again and again… and I am not yet familiar with the ins and outs of the new reporting framework… However, have we considered an overall agent which would produce separate relational tables which can be used for reporting? What we need to do is have these created in a separate mysql DB for permission purposes (only allow ODBC access to de-identified data, for example). The pivoting from the EAV database to flat file tables is critical, and eventually we also want this as a way to create a centralized data warehouse.

Any further ideas? Hamish, is this where you are going with the reporting framework? We are still on 1.4.6 (avoiding the GUIDs so we can still work directly with the DB).

Thanks!

Andy

Andrew S. Kanter, MD MPH

Director of Health Information Systems/Medical Informatics

Millennium Villages Project, Earth Institute, Columbia University

Asst. Prof. of Clinical Biomedical Informatics and Clinical Epidemiology

Columbia University

Email: andrew.kanter@dbmi.columbia.edu

Mobile: +1 (646) 469-2421

Office: +1 (212) 305-4842

Skype: akanter-ippnw

Yahoo: andy_kanter

Click here to unsubscribe from OpenMRS Developers’ mailing list

–

Justin C. Miranda

Software Developer

Partners In Health

www.pih.org
Stand With Haiti:
www.standwithhaiti.org

**
Please donate now to support our earthquake relief efforts in Haiti**

Click here to unsubscribe from OpenMRS Developers’ mailing list

–
Cheers,
Knut Staring