Take a large data set, like stock prices, electricity bills or the population sizes of different countries. If you were to count how many times each digit appeared as the first digit (for 105 the first digit would be 1, for 34583 the first digit would be 3, etc.), how would the digits be distributed? You’d expect each digit to occur an equal number of times. But you would be wrong.
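The distribution of first digits will look more like this: the digit 1 comes first about 30% of the time, and each larger digit appears less often, down to under 5% for the digit 9.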
This is called Benford’s law, or the first digit law. Frank Benford, in 1938, compiled a long list of data sets to which this rule applied, including death rates and the lengths of rivers. The distribution holds for many more data sets: for example, if you take the heights of the 60 tallest buildings in the world, you get this distribution regardless of the unit of measurement (metres, feet or Egyptian cubits)! Thirty percent of the 54 million real-world constants in the Inverse Symbolic Calculator begin with the digit 1. You can have a look at more examples based on real data (like Twitter followers or the populations of Turkish boroughs) on testingbenfordslaw.com.
The share of each first digit is the same as the distance between that digit and the next one on a logarithmic scale bar (see picture above): a first digit d occurs with probability log10(1 + 1/d). For a more mathematical explanation of Benford’s law, have a look at this Wolfram MathWorld entry.
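To make the logarithmic rule concrete, here is a minimal Python sketch that prints the expected share of each first digit. The function name benford_expected is just an illustrative choice, not something from the sources mentioned here.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of values whose first digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("digit must be between 1 and 9")
    # Width of the interval [digit, digit + 1) on a base-10 logarithmic scale.
    return math.log10(1 + 1 / digit)


# Print the expected share for every possible first digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The nine shares add up to 1, because the nine intervals together cover the whole stretch from 1 to 10 on the logarithmic scale.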
Benford’s law is not just a mathematical curiosity; it also has real-world applications. It can be used to test the validity of supplied data sets: if the data has been tampered with, it is less likely to follow the distribution. For example, Greece’s macroeconomic data, when analysed with Benford’s law, appears to be fraudulent. In the USA, evidence based on the first digit law has been admitted in court.
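As a rough illustration of such a check, the sketch below counts first digits in a data set and measures how far they stray from Benford’s law with a chi-squared statistic. The choice of a chi-squared test, the helper names and the two toy data sets are assumptions made for this example; they are not the method used in the cases mentioned above. Powers of two are known to follow the law closely, while the whole numbers from 100 to 999 have evenly spread first digits.

```python
import math
from collections import Counter


def first_digit(value) -> int:
    """Leading non-zero digit of a non-zero int or float."""
    digits = str(abs(value)).lstrip("0.")  # drop the sign, leading zeros and the point
    if not digits:
        raise ValueError("zero has no leading digit")
    return int(digits[0])


def benford_chi_squared(values) -> float:
    """Chi-squared distance between observed first digits and Benford's law."""
    values = list(values)
    counts = Counter(first_digit(v) for v in values)
    total = len(values)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


# Critical value of the chi-squared distribution with 8 degrees of freedom
# at the 5% significance level.
CRITICAL_5_PERCENT = 15.507

# Toy data sets (assumptions for this example, not real-world data).
powers_of_two = [2 ** n for n in range(1, 1001)]
evenly_spread = list(range(100, 1000))

for name, data in [("powers of two", powers_of_two), ("100 to 999", evenly_spread)]:
    stat = benford_chi_squared(data)
    verdict = "consistent with" if stat < CRITICAL_5_PERCENT else "deviates from"
    print(f"{name}: chi-squared = {stat:.1f}, {verdict} Benford's law")
```

A large statistic only says that the first digits do not match the law. Plenty of honest data sets do not follow it either (anything confined to a narrow range, for instance), so a deviation is a reason to look closer rather than proof of tampering.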
It also has fictional-world applications: the main character of the TV series NUMB3RS used Benford’s law to catch a burglar…