Tuesday, March 12, 2013

A Definition of "Big Data" for Health Care Providers and Five Useful Caveats

And you thought its only function was to be an EHR?
Regular readers of the Wall Street Journal probably saw the Monday, March 11 "big data" section, which was filled with articles like this.

Written from a "business intelligence" perspective, there were precious few insights for the population and care management community. We're aware of the concept, but how, asks the Disease Management Care Blog, does it apply to our corner of the health care delivery system?

Unable to resist, the DMCB donned its snorkel and flippers and took a deep dive into the topic.

First off, when the DMCB performed a classic medical literature search using the key words "big data," it found that the term has not entered the health care lexicon in a big way. Academics instead prefer to write about "registries," "data warehousing" and "predictive modeling." The DMCB also looked for a standard health care definition of "big data" and could find none in the published medical literature.

So, the DMCB offers up its own definition, culled from papers like this and this:

Health care "big data" is a branch of health care informatics that pools large and disparate data sets and applies a suite of mathematical approaches to derive associations, facilitate comparisons and generate insights that are otherwise not possible using standard mono-source analytics. It includes, but is not limited to, reporting, dashboards, ad-hoc queries, graphical displays, scorecards, predictive modeling, data mining and business intelligence. The data sets can comprise EHR data, insurance claims, pharmacy utilization, care management systems, consumer as well as government information, public health, surveys, point-of-contact information and web usage.

The DMCB's simplistic off-the-cuff examples of big data queries include examining 1) the association between "hits" from a cluster of ISPs on an emergency room's web page and ER utilization, 2) the association between family complaints about a hospital's food service and the likelihood of being named in a malpractice suit, 3) rare side effects among persons with a cluster of medical diagnoses who are using a just-released drug and 4) whether the number of household flat screens is a useful predictor of obesity.
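To make the first example concrete, here is a minimal sketch of what such a query could boil down to once the data are in hand: a plain Pearson correlation between weekly web hits and weekly ER visits. The weekly counts and the `pearson` helper are made up for illustration and are not drawn from any real hospital; a real analysis would also need to account for seasonality, lag and confounders.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly counts, invented for this illustration only.
weekly_web_hits = [120, 135, 150, 160, 155, 180, 210, 190]
weekly_er_visits = [40, 42, 47, 50, 49, 55, 63, 58]

r = pearson(weekly_web_hits, weekly_er_visits)
print(f"r = {r:.2f}")
```

Note that the coefficient is symmetric: swapping the two series gives the same number, so the calculation by itself says nothing about which one drives the other.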

Five DMCB caveats:

1. Data integrity trumps five Ph.D.s: The chief challenge is not the mathematics but combining and aligning the various databases. Once the information is teed up, it's amazing how much can be done by a master's-level statistician and a desktop PC.

2. Associations, not causality: Whether a web page leads to ER visits or whether bad food fuels dissatisfaction is a different question. It's possible that ER visits prompt web usage or that already dissatisfied patients find overcooked string beans icky. Either way, the association is still a useful insight.

3. Not a panacea: It's "a" tool, not "the" tool. Users will still need to invest in faster, better and cheaper mundane data tasks (like admissions per thousand) while they work out how big data's associations, comparisons and insights generate additional patient value.

4. Journey, not destination: There's a potent mix of art, science and wizardry in the evolving field of "big data." There are no standard methodologies or best practices. Get used to it.

5. Skepticism abounds: Data stakeholders who are used to standard analytics will refer, as the DMCB found out, to "big data" as "voodoo," and resist buy-in.  If a critical mass of an organization's leadership comes to believe it's useful, the rest will follow... eventually.
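The first caveat, that aligning databases is harder than the math, can be illustrated with a small sketch. It assumes two hypothetical sources, an EHR extract keyed on `patient_id` and a claims extract keyed on `member_id`, whose identifiers differ in prefixes, padding and whitespace. The field names, the "MRN-" prefix and the records are all invented for illustration; real-world matching is usually far messier.

```python
def normalize_id(raw_id):
    """Strip whitespace, a hypothetical 'MRN-' prefix, and leading zeros."""
    cleaned = str(raw_id).strip().upper()
    if cleaned.startswith("MRN-"):
        cleaned = cleaned[4:]
    return cleaned.lstrip("0") or "0"


def align(ehr_rows, claims_rows):
    """Join two lists of dicts on a normalized patient identifier.

    Returns (matched, ehr_only, claims_only) so mismatches stay visible
    instead of being silently dropped.
    """
    ehr_by_id = {normalize_id(row["patient_id"]): row for row in ehr_rows}
    claims_by_id = {}
    for row in claims_rows:
        claims_by_id.setdefault(normalize_id(row["member_id"]), []).append(row)

    matched = {
        pid: {"ehr": ehr_by_id[pid], "claims": claims_by_id[pid]}
        for pid in ehr_by_id.keys() & claims_by_id.keys()
    }
    ehr_only = sorted(ehr_by_id.keys() - claims_by_id.keys())
    claims_only = sorted(claims_by_id.keys() - ehr_by_id.keys())
    return matched, ehr_only, claims_only


# Invented records; identifiers deliberately disagree in format.
ehr = [
    {"patient_id": "MRN-00123", "diagnosis": "diabetes"},
    {"patient_id": " 456 ", "diagnosis": "asthma"},
    {"patient_id": "789", "diagnosis": "hypertension"},
]
claims = [
    {"member_id": "123", "paid": 250.0},
    {"member_id": "0456", "paid": 90.0},
    {"member_id": "456", "paid": 40.0},
    {"member_id": "999", "paid": 15.0},
]

matched, ehr_only, claims_only = align(ehr, claims)
print(sorted(matched), ehr_only, claims_only)
```

Returning the unmatched identifiers alongside the matches is a deliberate choice in this sketch: the records that fail to line up are often where the data integrity problems hide.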

Image from Wikipedia
