Big Data’s obscurity is coming to an end, its importance reflected in the sheer quantity of information stored, analyzed, and used for an organization’s strategic purposes. Interpreting the data requires exhaustive measures to ensure it is translated accurately, stored properly, and used appropriately, especially considering the “fire hose rate” at which data pours into the database.
Google reportedly processes more than 1 petabyte (PB) per hour. A single petabyte is the equivalent of streaming a 4-minute, 4MB song on repeat for roughly 2,000 years! Likewise, storing 1PB of data as a hard copy would require roughly 223,000 4.7GB DVDs! These numbers are staggering, especially considering that the modern age of technology is less than 100 years old, while the Rosetta Stone – arguably the world’s first advanced form of “data storage” – was written roughly 2,210 years ago on a stone tablet.
Tablets have come a long way since then, but data storage hardly advanced during that 2,000-year window. Then, in the course of about 1.5 generations, society suddenly developed a need to store, manipulate, and use an overwhelming flow of data.
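The petabyte comparisons in the introduction can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, and it rests on two assumptions that are not stated in the article: binary units (1PB = 1,048,576GB) and an MP3 encoded at about 1MB per minute, so that a 4-minute song is about 4MB.

```python
# Back-of-the-envelope check of the petabyte comparisons.
# Assumption: binary prefixes, i.e. 1 PB = 1024**2 GB = 1024**3 MB.
GB_PER_PB = 1024 ** 2
MB_PER_PB = 1024 ** 3

# How many 4.7 GB DVDs does it take to hold 1 PB?
DVD_CAPACITY_GB = 4.7
dvds_needed = GB_PER_PB / DVD_CAPACITY_GB

# How long would 1 PB of MP3 audio play?
# Assumption: about 1 MB of MP3 data per minute of music.
MP3_MB_PER_MINUTE = 1.0
MINUTES_PER_YEAR = 60 * 24 * 365.25
years_of_music = (MB_PER_PB / MP3_MB_PER_MINUTE) / MINUTES_PER_YEAR

print(f"DVDs needed for 1 PB: {dvds_needed:,.0f}")
print(f"Years of continuous music in 1 PB: {years_of_music:,.0f}")
```

Under these assumptions the DVD count comes out at a little over 223,000 and the playback time at a little over 2,000 years. With decimal units (1PB = 1,000,000GB) both figures would be about 5 percent lower.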
The Undeniable Similarity Between The Rosetta Stone and Big Data
What makes this Rosetta Stone-Big Data comparison compelling is that, in its most primitive form, the Rosetta Stone was the world’s first Big Data solution – it stored a single, unified message in three separate scripts so that interpreters could decode it at a future date.
When broken down to this basic definition, the Rosetta Stone and Big Data share a common objective – to aggregate multiple languages for a single output. For the Rosetta Stone, that output was its intended message for the populace, while Big Data’s output is comprehensive research.
Previously, research in “Small Data” was conducted via isolated surveys, censuses, and other targeted, non-inclusive samples, which were then used to make grandiose market predictions. But as our capacity for reaching broader audiences (e.g., through social media) grew in step with Moore’s Law, so too did our capacity for acquiring and storing data. Technology has evolved faster than we have, however, requiring us to develop ever more complex systems to decipher a flood of data we could not possibly interpret ourselves.
As Kenneth Cukier, Data Editor for The Economist, said in his “Big Data Is Better Data” TED presentation, “We still store information on disks, but now we can store a lot more information. Searching, copying, sharing, and processing [the data] is easier. The data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic.”
Big Data’s Fundamental Problem – The 3 V’s
Imagine trying to drain a swimming pool through a hole the size of a quarter while a hose ten times that size refills it far faster than the pool can drain. With the ubiquity of the internet, data is everywhere, and most organizations simply don’t have the technology to gather the essential information, translate it, and interpret the results before more data floods the database. This problem is commonly known as “the three V’s”:
Variety: Data comes in structured, semi-structured, and unstructured forms. It commonly streams in with a schema that is incompatible with the database’s schema, which makes aligning the data with the database very difficult once it has already entered.
For the Rosetta Stone, Variety simply meant the three common scripts of the time – Greek, Demotic, and Hieroglyphic. Separated into three portions, the Rosetta Stone conveyed its message to anyone literate in one of those three scripts, but left out the illiterate majority of the populace. Big Data, on the other hand, takes language of all sorts – be it code, binary, or lingual – and translates it through a data mart for future use.
Velocity: As mentioned previously, data is being collected at an overwhelming rate. With so many users and even more devices, storage capacity and data translation methods must improve before the data pool gets out of control.
Since the Rosetta Stone had a singular purpose – to convey how humble and righteous the Pharaoh was – there was no need to aggregate Tweets or Facebook posts (i.e., data) on the population’s opinion of this announcement, not to mention that there was no digital means for a commoner to voice his or her opinion.
Volume: Big Data solutions need to store hundreds of terabytes or more to keep up with the incredible rate at which data is being collected, and they must be easily expandable and accessible across multiple systems.
Similarly, there was no Volume of data coming in regarding the Egyptian priests’ March 27, 196 BC “blog post” on Ptolemy V’s righteous rule. In terms of storing the data, however, Big Data has a significant advantage over the Rosetta Stone, as Cukier illustrates: “The [Rosetta Stone] is heavy, doesn’t store much, and is unchangeable. By contrast, all the files Edward Snowden stole from NSA fits on a disk the size of a fingernail and can be shared at the speed of light.”
By querying the database to filter for the appropriate variety of information, organizations can scan the overwhelming volume of data, match the velocity of data intake from billions of devices, and collect a reasonable sample of applicable answers. Organizations that manage their Big Data intake effectively gain a competitive advantage: they can draw on a broad sample of consumer opinion when considering their next business move.
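To make the Variety problem concrete, here is a minimal sketch of how structured, semi-structured, and unstructured inputs might be translated into one common schema before they are queried. Everything in it is hypothetical and for illustration only: the three sample feeds, the field names, the text pattern, and the target schema of `user`, `product`, and `rating` are assumptions, not part of any particular Big Data product.

```python
import csv
import io
import json
import re

# Hypothetical target schema: {"user": str, "product": str, "rating": int}

def from_csv(text):
    """Structured input: rows that already follow a fixed header."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"user": row["user"], "product": row["product"],
               "rating": int(row["rating"])}

def from_json(text):
    """Semi-structured input: nested objects with different field names."""
    for obj in json.loads(text):
        yield {"user": obj["author"]["name"], "product": obj["item"],
               "rating": int(obj["stars"])}

# Assumed free-text convention, e.g. "@dave rates widget 3/5".
RATING_PATTERN = re.compile(r"@(\w+) rates (\w+) (\d)/5")

def from_text(text):
    """Unstructured input: keep only lines that match the pattern."""
    for line in text.splitlines():
        match = RATING_PATTERN.search(line)
        if match:
            user, product, rating = match.groups()
            yield {"user": user, "product": product, "rating": int(rating)}

# Made-up sample feeds, one per kind of input.
csv_feed = "user,product,rating\nalice,widget,5\nbob,gadget,2\n"
json_feed = '[{"author": {"name": "carol"}, "item": "widget", "stars": 4}]'
text_feed = "Loved it! @dave rates widget 3/5\nunrelated chatter\n"

# Translate every feed into the common schema on the way in.
records = [*from_csv(csv_feed), *from_json(json_feed), *from_text(text_feed)]

# With one schema, a single filter answers the question across all sources.
widget_ratings = [r["rating"] for r in records if r["product"] == "widget"]
average_widget_rating = sum(widget_ratings) / len(widget_ratings)
print(records)
print(average_widget_rating)
```

The design choice here is to translate each source as it arrives rather than after it has been stored, so that one query can run across every source. A real pipeline would also need to handle malformed records and far larger volumes than these in-memory lists.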
Although comparing the Rosetta Stone to Big Data makes for a fun narrative, Big Data has taken us far beyond inscribing messages in stone. Modern enterprises require a solution that houses the endless streams of internet noise, translates its cryptic message, and readies the data for future utilization.