The Truth About Big Data: You Probably Don’t Have It
Something has been bothering me about all the hype around the concept of “Big Data” and, finally, I’ve figured out what it is. While everyone seems to be worried about it, the truth is that very few companies actually have it.
What Really Is Big Data?
In this case (although not all cases), Wikipedia has a pretty good definition of big data: “In information technology, big data is a loosely-defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools.”
For the most part, big data is data generated by machines and programs. It includes things like social media data (like tweets and Facebook posts), website-generated data (like web logs), and machine-generated data (like intra-second readings taken by industrial equipment).
So, Do You Have “Big Data” or “Just Plain Data”?
Look at that definition again. The definition talks about size and complexity. It doesn’t talk about data source or data type. In other words, there is a difference between “big data” and just plain data. Big data requires specialized, rapidly evolving technologies such as Hadoop, Bigtable, MongoDB… Just plain data is something you’re already working with in your relational databases, flat files, etc.
Even if you have machine-generated data, Twitter data… AND you have a business requirement to analyze it, you still likely don’t have big data. Put differently, big data is NOT a kind of data; it’s a quantity of data. If you don’t have super massive quantities of social media data or machine readings, your existing technologies (or some industry-standard extensions of them) can probably meet your needs today.
So, for example, if you have to analyze a million tweets a day, you probably have big data. If, on the other hand, you have to analyze a million tweets a year, you have just plain data.
If you’re running a bank of 100 milling machines that each take 10 quality control readings every second, you have big data. If, on the other hand, you’re running a set of 100 candy vending machines, each of which captures details about 50 transactions per day, you have just plain data.
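To put rough numbers on those examples, here is a back-of-the-envelope sketch in Python. The function and variable names are made up for illustration, and it assumes the milling machines run around the clock, which the example doesn't actually say. It doesn't define a cutoff for "big"; it just shows how far apart the volumes are.

```python
# Rough daily record counts for the examples above.
# Assumption (not stated in the examples): the milling machines run 24 hours a day.
SECONDS_PER_DAY = 24 * 60 * 60


def per_day_from_per_second(machines, readings_per_machine_per_second):
    """Total records per day when each machine reports at a per-second rate."""
    return machines * readings_per_machine_per_second * SECONDS_PER_DAY


def per_day_from_per_day(machines, events_per_machine_per_day):
    """Total records per day when each machine reports at a per-day rate."""
    return machines * events_per_machine_per_day


milling_per_day = per_day_from_per_second(100, 10)  # 100 machines, 10 readings/second
vending_per_day = per_day_from_per_day(100, 50)     # 100 machines, 50 transactions/day

tweets_per_day_if_daily = 1_000_000         # a million tweets a day
tweets_per_day_if_yearly = 1_000_000 / 365  # a million tweets a year, averaged per day

print(f"Milling machines: {milling_per_day:,} readings/day")
print(f"Vending machines: {vending_per_day:,} transactions/day")
print(f"Tweets (1M/day):  {tweets_per_day_if_daily:,} per day")
print(f"Tweets (1M/year): {tweets_per_day_if_yearly:,.0f} per day")
```

Under that assumption the milling machines produce about 86 million readings a day, while the vending machines produce 5,000 transactions a day. The gap is a matter of quantity, not of data type.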
I Can See the Hype, But Can I See the Use?
No, I’m not about to say that big data is really all about marketing hype. It’s probably only 45% about marketing hype, 45% about bit heads (such as myself) who think the concept is cool and therefore need to find a nail for our hammers, and about 15% about business need (please excuse rounding errors here).
I do see a ton of press about big data (when something gets multiple articles in The Wall Street Journal, it’s hit the big time). I also see a ton of vendors professing big data strategies. I also see a ton of data guys desperately wondering how to handle big data. I’ve even seen some decent use cases.
All this is great but, as in all things BI, start with the user. Figure out what they want to do, what they want to analyze, and what they need to grow and run the business. THEN figure out how to deliver it.
You might actually have big data, but my guess is that, in the vast majority of cases, you have just plain data. Now go and do something amazing with it.