When people I respect and admire make a book recommendation, I read it.
So when Ian Heller of HD Supply White Cap mentioned an important book I ought to read, I committed to reading it.
Then I found out the title: Big Data by Viktor Mayer-Schonberger and Kenneth Cukier.
I did not want to read Big Data by Viktor Mayer-Schonberger and Kenneth Cukier.
Nothing against them, but all the online chatter about Big Data has been trending like the latest Katy Perry single. So much hype.
Sure… Moore’s Law (computer processing power doubles roughly every 18 months) has stayed ridiculously on target for nearly 50 years, making large-scale computing cheaper than ever.
Fine… the rise of sensors on everything from jet engine fan blades to my Nikes exponentially increases the amount of data we can collect.
OK… culturally we are voluntarily sharing exponentially more of our personal lives on Facebook (collectively we post about 30B tidbits each month), Twitter, Instagram, WordPress, et al.
Small Data still regularly confounds companies.
Give most companies today 1,500 customer satisfaction surveys and you’ll be lucky if anyone actually reads them, let alone develops actionable responses to them. Fewer still implement any changes in the business.
And then I read an article in Fast Company about data guru Nate Silver. Here’s what he said about the promise of Big Data:
“But I don’t see it to be as much of a paradigm shift as some people think,” Silver says. “People sometimes get the idea that you put all this data into a machine, and you press a button and out come miraculous ideas that help your business make a quick 10% profit margin every year and your share price will double.”
So it was with some uneasiness I dove into Big Data. The uneasiness disappeared as I began to understand the bigger picture of Big Data.
It’s more than Moore’s Law.
It’s more than Amazon Web Services hawking a sliver of their server farms.
It’s more than your high school friend constantly updating Facebook about her amazing husband and her amazing kids and her amazing new swim suit for her amazing country club.
It’s about a fundamental shift in the way we will make decisions.
Here are 3 key takeaways…
Correlation supersedes Causality.
Big Data cited another Libro 52 selection, Thinking, Fast and Slow by Daniel Kahneman, when it noted how our brains are lazy. We frequently “see” causal relationships that simply aren’t there.
“Don’t confuse Correlation with Causality.”
Just because things are correlated doesn’t mean they are causal.
The cliché implies we need to know why A caused B; otherwise, we’re just guessing.
Not in a Big Data world.
Patterns and correlations with a few hundred million data points can tell us an awful lot without Causality. We may not know why something is happening, but we can be darn sure it is.
A simple example is the Big Data websites that accurately predict the best time to buy an airline ticket. You won’t know what the clowns running the airlines are thinking (or if they are thinking at all), but if 750M data points with 98% accuracy tell you that flight to Naples will only get more expensive if you wait, just buy the ticket and move on with your life.
The data has spoken.
Listen to it.
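A toy simulation (my sketch, not the authors’) shows why correlation alone can be useful and dangerous at once: two variables that never influence each other, but share a hidden common driver, will still correlate strongly. The correlation is real and predictive, even though neither causes the other.

```python
import random

random.seed(42)

# Hidden common driver: neither a nor b causes the other,
# but both respond to the same underlying factor.
hidden = [random.gauss(0, 1) for _ in range(100_000)]
a = [h + random.gauss(0, 0.3) for h in hidden]  # e.g. ice cream sales
b = [h + random.gauss(0, 0.3) for h in hidden]  # e.g. sunburn cases

def corr(x, y):
    """Pearson correlation coefficient, computed by hand."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    vx = sum((xi - mx) ** 2 for xi in x) / n
    vy = sum((yi - my) ** 2 for yi in y) / n
    return cov / (vx * vy) ** 0.5

r = corr(a, b)
print(f"correlation: {r:.2f}")  # strong, despite zero causal link
```

Knowing a lets you predict b quite well here, which is exactly the book’s point: at scale, the pattern can be actionable even when the “why” stays a mystery.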
Sampling precision improves with randomness, not with larger sample sizes.
Biases in the way we collect data can undermine its accuracy. When that’s the case, a larger sample size only makes the extrapolated results worse; you’re just collecting more of the wrong thing.
Not the case with Big Data when N=all.
You can have the whole data set.
You have all the data – and all the randomness – you need.
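A quick simulation (again mine, not the book’s) makes the sampling point concrete: a modest random sample lands near the true mean, while an enormous biased sample stays off by its bias no matter how big it gets.

```python
import random

random.seed(7)

# A population of 1M "satisfaction scores" with a true mean near 5.0.
population = [random.gauss(5.0, 2.0) for _ in range(1_000_000)]
true_mean = sum(population) / len(population)

# Small but genuinely random sample of 1,000.
random_sample = random.sample(population, 1_000)
random_est = sum(random_sample) / len(random_sample)

# Huge but biased sample: we only hear from the happier half.
biased_sample = sorted(population)[len(population) // 2 :]
biased_est = sum(biased_sample) / len(biased_sample)

print(f"true mean:          {true_mean:.2f}")
print(f"1K random sample:   {random_est:.2f}")  # close to the truth
print(f"500K biased sample: {biased_est:.2f}")  # far off, despite its size
```

The 500x-larger biased sample loses to the small random one every time, and with N=all the question disappears entirely: there is no sampling error left to worry about.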
Big Data is messy. Deal with it.
It’s imperfect. The authors tell numerous stories to illustrate this point, but this is my favorite. The Consumer Price Index (CPI) is developed by the Bureau of Labor Statistics and is used to calculate the inflation rate. Without going into details, let’s assume it’s important our government correctly understands this stat.
The Fed, Wall Street, and your salary – among other things – all depend on this number. Hundreds of employees in 90 cities keep tabs on the prices of roughly 80,000 goods and services and report them monthly. The CPI effort costs $250M annually.
When it comes out, the data is already old. Not necessarily wrong, but not necessarily current.
So 2 nerds at MIT used Big Data to do better. They developed software to crawl the web and collect 500K price points every day. The data is messy. Not all the products are comparable. An algorithm gathers the info, not a government employee like Nancy Johnson from Toledo.
But… after the collapse of Lehman Brothers, the MIT duo saw deflationary activity immediately. They reported it to the government and banks in September 2008.
The government’s CPI saw it in November.
As you may recall, a lot happened in those 60 days.
Big Data closes with a brilliant line: “Because we can never have perfect information, our predictions are inherently fallible. This doesn’t mean they are wrong, only that they are always incomplete. It doesn’t negate the insights that big data offers, but it puts big data in its place – as a tool that doesn’t offer ultimate answers, just good-enough ones to help us now until better methods and hence better answers come along.”
Bottom Line: Great read. Important book for understanding a force that will shape business and life in the coming decades. If nothing else, know there is little correlation or causality between Big Data and Katy Perry.
Bradley Hartmann is El Presidente at Red Angle (www.redanglespanish.com). He’s on pace to read 52 books in 2013, but he’s getting tired.
39. Big Data by Viktor Mayer-Schonberger and Kenneth Cukier.