I recently read an article, “Big Data Cracks Top of the Pops and it’s upsetting the Bookies”, which made me think that the hype around Big Data is a little out of control.
The setup
Two guys from Brisbane, Tom and Nick, used social media data to predict the outcome of the Triple J Hottest 100. They posted the results on warmest100.com.au.
Triple J is an Australian radio station targeting 18-30 year-olds, and the Hottest 100 is an Australia Day tradition: listeners vote for their favourite songs, and the votes are collated to produce the countdown. Voting took place from the 19th of December 2012 to the 20th of January 2013.
The story
Nick and Tom analysed 35,081 tweets and Facebook comments to make their predictions. Their analysis of this “Big Data” was expected to be 90% accurate (they actually predicted 92 out of the 100).
The truth
Nick and Tom used Excel to analyse the data.
My understanding of Big Data is that it’s founded on the three V’s: Velocity, Volume and Variety.
- Velocity: 35,081 tweets and Facebook comments over a month is not rapidly expanding data. It’s easily managed in a simple relational database or even Excel.
- Volume: 35,081 records is well within Excel’s capacity to process; a single worksheet holds just over a million rows.
- Variety: the data was structured – it came from two sources only, Twitter and Facebook.
This just isn’t Big Data.
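To put rough numbers on the velocity and volume points, here is a back-of-the-envelope sketch in Python. It assumes the 35,081 records were spread across the whole voting period and counts both the opening and closing days; the row limit is the one for Excel 2007 and later. The variable names are mine, not anything from Nick and Tom's analysis.

```python
from datetime import date

# Figures quoted in the post
records = 35_081
voting_open = date(2012, 12, 19)
voting_close = date(2013, 1, 20)

# Worksheet row limit in Excel 2007 and later
EXCEL_MAX_ROWS = 1_048_576

# Assumption: the data arrived evenly over the voting period,
# counting both the first and the last day
days = (voting_close - voting_open).days + 1
per_day = records / days
share_of_sheet = records / EXCEL_MAX_ROWS

print(f"{days} days, about {per_day:.0f} records per day")
print(f"{share_of_sheet:.1%} of one worksheet's rows")
```

On those assumptions that works out to roughly a thousand records a day, and a few percent of a single worksheet.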
The lesson
The hype around Big Data is so huge that people are starting to see Big Data where there is none. Just because something involves social media doesn’t mean it’s Big Data. You don’t need to know much about Big Data to tell when people are either trying to fool you or are simply wrong. Before you choose to believe it’s Big Data, scratch the surface a millimetre.





People also seem to forget: Big Data, Medium Data, Small Data… without good (and unbiased) analysis it’s all just data.