It has been said that the Internet is like all the world’s books gathered in one place, then thrown indiscriminately on the floor. Add to this pile the even greater amounts of knowledge hidden behind locked doors, dumped into disparate databases that nobody oversees, or stacked haphazardly in file cabinets or on store counters. Then multiply it by the actions of billions of human beings doing things in the real world every minute of every day.
Such is the promise and peril of big data computing.
Maybe that’s why IT spending on big data in the U.S. was predicted to hit $34 billion in 2013, according to a Gartner report last September, and 64% of the companies surveyed had made such investments (the number is expected to jump in 2014). But there’s a catch: Gartner also revealed that only 8% of respondents reported having deployed any big data technology, and 15% said they were still trying to figure out what the term means, even as they committed to spending on it!
This reminds me of the early days of CRM, during which execs invested many millions in software with the expectation that something miraculous would pop out when they turned it on. Books were written and careers made on how to “do” CRM correctly, but it took a good decade of hard work before it morphed from tech fad to business reality (and it’s still evolving).
Similarly, there’s no shortage of buzz about big data, usually peppered with just enough geekspeak to keep anybody but the initiated from understanding what’s going on. Big data is an idea that’s “out there,” and all it takes is smart open source software like Hadoop to corral it. For that matter, Google already indexes the Internet, so big data is really about apps, or consumer UI like Siri.
And yes, it would also be great if a million people randomly put into a choir could sing in total, pitch-perfect harmony at a moment’s notice. But that doesn’t happen.
Back to the promise and peril thing: The challenge of getting data — usually qualified as “structured,” which means it comes in formats ready-made for apples-to-apples comparisons, or “unstructured,” which is pretty much everything else in existence — and then correlating it with an incessant stream of updates from everyday experience is, well, daunting in an immense way that should quash any buzz that suggests otherwise. Add on top of that technical challenge the strategy issues of formulating hypotheses and analyzing results, and then the managerial challenges of translating insights into operations, and you get some clarity on what the “big” in big data actually means.
Doing even a little implementation takes a lot of work, and it’s work that has to get done sometime, somewhere. A project that’ll matter to a business probably isn’t as simple as getting restaurant rankings on a smartphone. It takes real problem-solving.
That’s why IBM’s Watson announcement last week could be a game-changer. The company has created a stand-alone group to develop big data solutions, funded it with $1 billion, staffed it with 2,000 employees, and moved the entire shebang to a hip location in Manhattan’s East Village. Its very existence will change conversations about big data, even those with customers in which Watson doesn’t participate, and it will be particularly hard for big clients to believe promises of the magic of the crowd when Big Blue is willing to map out (and deliver) the work behind it. Watson could also take its demonstrated prowess at winning Jeopardy! and produce some wildly novel or meaningful insights into newsworthy, topical issues, thereby demonstrating both the promise and peril of big data reality, not its buzz. The marketing should be loads of fun, and every company in the space will benefit from the effort Watson puts into explaining technical terms like “data mining” and “machine learning” to its business customers.
For instance, on the other end of the size spectrum from IBM is a company called SwiftIQ, which can aggregate, unify access to, and visualize hundreds of millions of data records on demand, then yield machine-learned insights such as product recommendations, and data-mining insights like the probability that shoppers will buy certain products together. Deriving product affinities this way is called frequent pattern mining, and such solutions are truly big not only because they involve vast amounts of data — a typical grocery store might have 50,000 SKUs and 40,000–75,000 item sales daily — but also because SwiftIQ has done the heavy lifting to make them available to any retail customer. If these customers are better informed, sales cycles might shorten and more deals might close.
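SwiftIQ’s actual pipeline isn’t public, but the kernel of frequent pattern mining can be sketched in a few lines: count how often items co-occur in shopping baskets, then turn those counts into conditional probabilities. Below is a minimal sketch with hypothetical transaction data; production systems use algorithms like Apriori or FP-Growth to prune the combinatorial explosion that 50,000 SKUs would otherwise cause.

```python
from itertools import combinations
from collections import Counter

# Hypothetical basket data: each transaction is the set of items in one checkout.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "eggs"},
]

# Count how often each item, and each pair of items, appears across baskets.
item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    pair_counts.update(frozenset(pair) for pair in combinations(basket, 2))

def confidence(a, b):
    """Estimate P(b in basket | a in basket) -- the 'confidence' of the rule a -> b."""
    return pair_counts[frozenset((a, b))] / item_counts[a]

print(confidence("bread", "butter"))  # 3 of the 4 bread baskets also contain butter: 0.75
```

A retailer would run this over millions of transactions and surface the highest-confidence pairs as “frequently bought together” recommendations.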
Only time will tell which companies succeed as enablers of big data solutions, but it’s certain that every business will change because of them. There are still a lot of customers waiting for not only a better explanation of the potential, but useful implementations that yield business results. IBM’s Watson announcement is good news for everyone involved.
That choir might learn how to sing together after all.