Big data and banana prices: correlation is not knowledge

I recently came across a fantastic website called Spurious Correlations. On it you can pick from a myriad of random variables and see how they are correlated with other random variables.

The idea, of course, is to point out the ridiculous. For instance, I learned that between 2000 and 2009 there was an almost perfect correlation between US spending on science, space and technology and US spending on pets. My friend Bill noted the site reinforced his hunch that lower margarine consumption significantly improved your chances of staying married in Maine.

However, as should be obvious, correlation is not the same as useful information. Having a lot of data doesn’t necessarily tell you anything worthwhile without further analysis. And yet every new healthcare startup now has an obligatory slide in its deck about the massive amount of data its product will generate and the promise that said data will generate a wealth of knowledge (can you hear the angels singing?) for which people will line up to pay. What no one ever says is that data doesn’t become knowledge without a lot of patient volume, detailed analytics and hard work to make that data of use to anyone.

Suennen col chart 1

I am concerned that much of the data collected by these products, and the manner in which that data is used, will lead to bad decisions. To illustrate my point, I offer you this from the Spurious Correlations website: the high correlation between deaths from heart catheterization (e.g., angioplasty) and the amount of US crude oil imports from Saudi Arabia.

From looking at the chart above, one might think that FDA regulators should focus heavily on reducing crude oil imports, as it might just lead to fewer people coding on the table during critical heart procedures.
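For the curious, here is a rough sketch of why charts like this are so easy to produce. It is not how the Spurious Correlations site does its math; the function names and the made-up numbers are mine, purely for illustration. The idea: take two series that have nothing to do with each other but both happen to drift upward over ten years, and their correlation comes out high. Compare the year-to-year changes instead, and the shared trend no longer props up the number.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def trending_series(n, drift, noise, rng):
    """Hypothetical yearly series: a steady upward drift plus random noise."""
    value, out = 0.0, []
    for _ in range(n):
        value += drift + rng.gauss(0, noise)
        out.append(value)
    return out


def differences(xs):
    """Year-to-year changes, which strip out the shared upward trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


# Two made-up, independently generated series (assumed values, not real data).
rng = random.Random(0)
banana_prices = trending_series(n=10, drift=1.0, noise=0.2, rng=rng)
bedsheet_deaths = trending_series(n=10, drift=1.0, noise=0.2, rng=rng)

# Correlation of the raw levels: dominated by the fact that both rise over time.
r_levels = pearson(banana_prices, bedsheet_deaths)

# Correlation of the yearly changes: the trend is gone, only the noise remains.
r_changes = pearson(differences(banana_prices), differences(bedsheet_deaths))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

With only ten data points, even the second number can wander quite a bit by chance, which is rather the point: a correlation coefficient on its own says very little.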

Another Spurious Correlations example: there is a 0.83 correlation between the cost of bananas and the number of people who died by becoming tangled in their bed sheets. Does this mean we need a new wearable that detects banana pricing when you enter Safeway and, when such pricing peaks, gives you an alarm at night when your sheets become untucked?

Suennen col chart 2

I was enjoying the endless possibilities of Spurious Correlations when I happened to read an article in BetaBeat called VC Names Robot to Board of Directors.

The article said, “Aging Analytics UK, which conducts research on biotechnology and regenerative medicine, made two announcements: first, it launched a new A.I. tool called VITAL (Validating Investment Tool for Advancing Life Sciences); second, it licensed VITAL to Hong Kong VC firm Deep Knowledge Ventures, where the tool will become an ‘equal member of its Board of Directors.’”

So, yikes. What is particularly interesting (scary?) is that a VC firm is going all in on the big data thing to predict positive outcomes on investments in biotechnology. A partner at the firm was quoted in the article as saying, “We were attracted to a software tool that could in large part automate due diligence and use historical data-sets to uncover trends that are not immediately obvious to humans that are surveying top-line data.”

Now, Deep Knowledge Ventures’ spokesman says humans will still participate in investment decisions and the best decisions will be based on a combination of data and intuition, but I wonder. It is difficult for people to say “Yes!” in the face of big data that says “No!” even when the data is of dubious provenance. It takes a lot of intestinal fortitude to disregard data, which is, in many ways, exactly what VCs are supposed to be doing. It is tough to make transformational investments when you spend your time looking at historical data; it often points you in the direction of either history or, worse, incrementalism. To disrupt history you have to decide that facts are somewhat inconsequential and take a leap of faith. Overreliance on data, big or otherwise, makes faith very inconvenient.

Humans are humans (even when they’re VCs), and often they can’t help but follow the data down the wrong-colored brick road. When asked whether their robot partner would be incorporated into board of directors meetings, the folks at Deep Knowledge Ventures said, “… investors will firstly discuss the analytical reviews made by VITAL (aka the big data robot engine). All the decisions on investing will be made strictly after VITAL provides its data. We say that VITAL has been acknowledged as an equal member of the board of directors, because its opinion (actually, the analysis) will be considered as probably the most important one.”

And thus, intuition is left by the wayside.

In medicine it is essential to keep a balance between data and intuition (the patient’s and the provider’s). Without data, we would still be bleeding people to treat them. Clearly there has been a vast world of data created that, when viewed in proper context, tells us better ways to take care of people. We tend to name this phenomenon “evidence-based medicine” and seek to codify guidelines to ensure that everyone is using the highest, best level of knowledge to treat common conditions such as heart disease and asthma.

When we get to even more complicated diseases, such as cancer in its many forms, it seems there can never be enough data to help us make good decisions about the optimal mix of drugs and chemo and radiation. And yet, all of the data in the world can’t tell you what the patient thinks is in his or her own best interest—that is where the human touch is still needed.

We sit at an interesting crossroads where we are strapping people to sensors and collecting data. Some are still trying to figure out what to do with all that data while others think all data should be monetized.

We know for a fact (from the data) that most of the sensors are only sort of accurate now. We know context for data is essential but rarely available. We know we need a mountain of clean data to establish good evidence and help us discern which way to send a patient through the healthcare maze. And, because of this, we know much of the data now being collected has the same predictive value as banana pricing.

We are going to have to do a lot better than the evidence-based guideline of “beware of your bedsheets” if we are going to improve medicine through mass data aggregation and analysis. Let’s hope our new robot overlords will provide enough venture backing to let entrepreneurs find the knowledge in the data and overcome the scourge of inflationary banana pricing.

Lisa Suennen is Managing Partner at advisory firm Venture Valkyrie. Follow her on Twitter @VentureValkyrie.

Note: Charts used with permission per the Creative Commons license on the Spurious Correlations website.

Photo courtesy of ShutterStock