Information Overload – The True Cost of Data

In the last few years we have produced more data than in all of previous human history. We live our lives constantly producing a stream of data, and it shapes our lives, not in a Matrix or Skynet kind of way, but in everyday ones. Every time we interact (text, call, tweet), conduct a transaction, perform an internet search, complete a national census or even simply give birth or die, we create data and contribute to a resource that, in the right hands, is a valuable and powerful tool. Yet a recent EMC study claims that less than 1% of global data is actually analysed.

With smarter data mining algorithms and more powerful computers, data mining on a mega scale is becoming a reality: the era of Big Data. These are datasets so large that they hold many potential secrets about us and the universe beyond. We like to think that as individuals we are in control of our own destinies, and on an individual scale I believe we are. But on a larger scale, human behaviour has been shown to be regular and predictable. Discovering patterns in a chaotic dataset of human behaviour is becoming a reality, and its potential is beginning to be exploited. It is being used everywhere, from the world of finance, in determining which commodities and stocks will be affected by world events, to the routes driven by police on the beat to increase their chances of encountering crime. Google is already able to predict who you are, targeting specific adverts to you based on your search results, interests, email keywords, check-ins and so on.

John Graunt’s publication – Bills of Mortality

Big Data is hardly a new idea; it is essentially searching for patterns in sets of data and turning them into useful information. This has been done since the days of the bubonic plague: in 1662 John Graunt analysed masses of public death records to track, and thus predict, the spread of the plague. Tracking the spread of a modern epidemic in real time is now well within the realms of possibility. Add to this other data such as flight records, seasonal tourism and bird migratory patterns, and you have a huge untapped database that is ready to be exploited.

Big data is an integral part of astronomical research too. Astronomy has traditionally been a science founded on data hunting, ever since the Babylonians recorded star positions and predicted their patterns, distinguishing stars from planets. Vast amounts of data are now gathered from telescopes and satellites worldwide. For example, the Radio Jove project is adding to all the data recorded on Jupiter that is waiting to be utilised. But this is just a single planet in our solar system, a planet we have been observing for centuries. Looking deeper into space (and there is a lot of it!) requires a lot more searching. The ALMA telescope array in Chile, for example, will record 30 Tb of data per second! The challenge will be to organise and make use of this data and hopefully gain a lot more detail about each one of those stars the Babylonians tracked across the skies millennia ago, and about what lies beyond them.

Scrambled data, anyone? Before the great fall Humpty Dumpty was low entropy. If only all the king's horses and all the king's men could piece Humpty Dumpty together again

But what is information? Information is real. It is a subtle concept; inherently it is order drawn from disordered data, and it is subject to the laws of physics like anything else. What does this mean? It means that it is an inseparable part of the physical world. Here's where the science comes in. Science says that over time the disorder of the universe, its entropy, increases. Entropy is a measure of chaos, the degree of disorder. The entropy of the universe has been increasing ever since the moment of the big bang. It's the reason why a hot cup of coffee left on a table doesn't get hotter: to do so it would have to absorb heat from its cooler surroundings, making them colder still and decreasing the total entropy. This would make the molecules more ordered, and intuitively we know this is ludicrous. Now picture a carton of eggs falling and smashing. Suppose I were to say that it is possible to put the eggs back together given nothing but information on the eggs, such as the spatial coordinates of every fragment in relation to a fixed point. It would be difficult but, given enough data, certainly possible. With a large enough dataset, perhaps.

This is similar to a scenario first dreamt up by James Clerk Maxwell in a famous thought experiment that hypothetically violated the second law of thermodynamics. He thought about a container full of gas molecules at equilibrium, divided into two sections by an insulated partition. There is a small window that can be opened and closed (by Maxwell's Demon) to let the faster-than-average molecules through to the right side of the partition and the slower ones through to the left. Over time this means one side of the chamber heats up while the other cools down, since temperature is a measure of the average kinetic energy of the molecules. The system is ordered and its entropy decreased with nothing more than information about the velocity of the molecules, seemingly violating the second law of thermodynamics.
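The demon's sorting rule can be played out on a toy gas. The following is only a minimal sketch under stated assumptions: the molecules have unit mass, their velocity components are drawn from a normal distribution, and the mean kinetic energy of each side stands in for its temperature. All names here are illustrative and not part of Maxwell's original description.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable


def kinetic_energy(velocity):
    """Kinetic energy of a unit-mass molecule with the given velocity."""
    return 0.5 * sum(component ** 2 for component in velocity)


# Toy gas at equilibrium: velocity components drawn from a normal distribution
# (an assumption made purely for illustration).
molecules = [
    tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(10_000)
]
energies = [kinetic_energy(v) for v in molecules]
mean_energy = sum(energies) / len(energies)

# The demon's rule: faster-than-average molecules go right, the rest go left.
right = [e for e in energies if e > mean_energy]
left = [e for e in energies if e <= mean_energy]

# Mean kinetic energy per side, used here as a stand-in for temperature.
right_mean = sum(right) / len(right)
left_mean = sum(left) / len(left)

print(f"whole gas : {mean_energy:.3f}")
print(f"left side : {left_mean:.3f} (cooler)")
print(f"right side: {right_mean:.3f} (hotter)")
```

The sketch counts nothing for what the demon has to measure and remember in order to do the sorting.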

Maxwell’s thought experiment. Sadly, he didn’t live long enough to see his demon exorcised

Now back to the carton of eggs. Think of the enormous amount of information, the number of bits, that you need to describe the smashing of a carton of eggs. To describe every interaction between the trillions of atoms requires an unimaginable amount of data. This has to be stored somewhere, be it a piece of paper, a hard drive or a sequence of knots in a rope. Crucially, we only have a finite storage capacity, and to continue recording our process, memory has to be deleted and replaced. Similarly, Maxwell's Demon would need to 'forget' the information it had on the moving molecules after each operation. The erasure of information carries a thermodynamic penalty: it actually expends energy. Thus Maxwell's Demon was proven to be impossible, because the energy expended creates more entropy than the ordering of the molecules removes. NOTHING can violate the 2nd law of Thermodynamics.

1 bit of data, the smallest fundamental unit of data

Even forgetting something takes a little energy: the Landauer limit. It is extremely small, about 2.85×10⁻²¹ joules at room temperature! This is the minimum amount of energy required to delete or reset 1 bit of data, turning the order of information back into disorder. This erasure of information is an irreversible process that increases the entropy of the universe. It explains why we cannot reverse the cooling of a cup of coffee on a table: doing so would amount to obtaining free energy, and that is ruled out simply because unlimited memory storage does not exist.
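For the curious, the Landauer limit is the Boltzmann constant times the temperature times the natural logarithm of 2. Here is a minimal sketch of the arithmetic, assuming a room temperature of 298.15 K (25 °C); the function and variable names are made up for illustration.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_limit(temperature_kelvin):
    """Minimum energy in joules needed to erase one bit at this temperature."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)


room_temperature = 298.15  # assumed room temperature in kelvin (25 °C)

per_bit = landauer_limit(room_temperature)
per_gigabyte = per_bit * 8e9  # a decimal gigabyte holds 8 × 10^9 bits

print(f"{per_bit:.3e} J to erase one bit")
print(f"{per_gigabyte:.3e} J to erase one gigabyte")
```

At this temperature the result is roughly 2.85×10⁻²¹ joules per bit. Because the limit scales with temperature, a colder memory would in principle be cheaper to wipe.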


Further reading:

Nature: The unavoidable cost of computing revealed 

When a Good Theory meets a Bad Idealization: The Failure of the Thermodynamics of Computation