Monday Morning Math: the First Digit Law

October 3, 2022

You can catch criminals with math! You might expect that if you recorded a bunch of numbers, the first digit would be equally likely to be 1, 2, 3, 4, 5, 6, 7, 8, or 9. But for many kinds of real-world numbers (a company's costs, the time spent working on something), the first digit is usually small: 1 is the first digit about 30% of the time, while 9 is the first digit less than 5% of the time! Here's a picture of the distribution:

[Figure: the distribution of first digits under the First Digit Law. Image from Gknor.]
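The 30% and 5% figures come from the usual mathematical statement of the law: the probability that the first digit is d is log10(1 + 1/d), for d = 1 through 9. Here is a minimal Python sketch that tabulates this formula (the function name benford_probability is just an illustrative choice):

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the first digit is d under the First Digit Law."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the expected share of each first digit as a percentage.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

It prints about 30.1% for 1 and about 4.6% for 9. The nine values add up to exactly 1, because the logarithms telescope to log10(10).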

This rule is generally known as the First Digit Law. It is also called Benford’s Law, after Frank Benford (who himself called it “The Law of Anomalous Numbers” in a 1938 paper), or the Newcomb-Benford Law, in recognition that Simon Newcomb had noted it more than 50 years earlier, in his 1881 paper “Note on the Frequency of Use of the Different Digits in Natural Numbers”.

There are also some restrictions on which kinds of data follow the First Digit Law. According to Statistics How To:

Benford’s law doesn’t apply to every set of numbers, but it usually applies to large sets of naturally occurring numbers with some connection like:

  • Companies’ stock market values,
  • Data found in texts — like the Reader’s Digest, or a copy of Newsweek.
  • Demographic data, including state and city populations,
  • Income tax data,
  • Mathematical tables, like logarithms,
  • River drainage rates,
  • Scientific data.

The law usually doesn’t apply to data sets that have a stated minimum and maximum, like interest rates or hourly wages. If numbers are assigned, rather than naturally occurring, they will also not follow the law. Examples of assigned numbers include: zip codes, telephone numbers and Social Security numbers.

(TwoPi, in a discussion about this, mentioned that books of logarithm tables tend to be dirtier at the beginning than at the end, a visual application of the law.) According to J. Carlton Collins in the Journal of Accountancy, the data set should be fairly large, ideally at least 500 entries. Still, it’s a pretty impressive rule, and one that doesn’t quite make intuitive sense to me.
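One way to build a little intuition is to compare a data set that grows by a constant factor, and so spans many orders of magnitude, with one confined between a fixed minimum and maximum. The sketch below uses two made-up illustrative data sets, the first 1000 powers of 2 and every whole number from 100 to 999; the helper names leading_digit and first_digit_shares are hypothetical. It's a toy demonstration, not a proof.

```python
from collections import Counter


def leading_digit(n: int) -> int:
    # First digit of a positive integer, read from its decimal string.
    return int(str(n)[0])


def first_digit_shares(numbers):
    # Fraction of the numbers whose first digit is 1, 2, ..., 9.
    counts = Counter(leading_digit(n) for n in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Grows by a constant factor, so it spans hundreds of orders of magnitude.
powers_of_two = [2 ** k for k in range(1, 1001)]
# Confined to a fixed range: every integer from 100 to 999.
bounded = range(100, 1000)

for name, data in [("powers of 2", powers_of_two), ("100 to 999", bounded)]:
    shares = first_digit_shares(data)
    print(name, {d: round(p, 3) for d, p in shares.items()})
```

For the powers of 2, the share of leading 1s comes out at exactly 0.301, right in line with the law. In the bounded range every digit gets exactly 1/9 (about 0.111), because each first digit covers the same 100 numbers.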

So what about catching criminals? Forensic accountants use this rule to catch people who falsify invoices, because made-up numbers usually don’t follow the expected pattern. Go math!
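The post doesn't say exactly how investigators make the comparison. As a minimal sketch, assuming a standard Pearson chi-square goodness-of-fit test (a common statistical choice, not necessarily what any particular forensic accountant uses), the first-digit counts of a list of amounts can be checked against the First Digit Law's expectations. The function names are hypothetical, the 15.507 cutoff is the 5% critical value of the chi-square distribution with 8 degrees of freedom, and the 500-entry minimum follows the Collins guideline mentioned above.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of
# freedom (9 digit categories minus 1).
CHI2_CRITICAL_8DF_5PCT = 15.507


def leading_digit(amount) -> int:
    # Strip leading zeros and the decimal point, e.g. 0.045 -> "45" -> 4.
    return int(str(abs(amount)).lstrip("0.")[0])


def benford_chi_square(amounts) -> float:
    # Pearson chi-square statistic: observed vs expected first-digit counts.
    values = [a for a in amounts if a != 0]
    n = len(values)
    counts = Counter(leading_digit(a) for a in values)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


def looks_suspicious(amounts, min_size=500) -> bool:
    # Hypothetical screening helper; zeros have no first digit, so skip them.
    values = [a for a in amounts if a != 0]
    if len(values) < min_size:
        raise ValueError("too few nonzero entries for a meaningful test")
    return benford_chi_square(values) > CHI2_CRITICAL_8DF_5PCT
```

For example, every whole number from 100 to 999 (900 entries, each first digit equally common) gets flagged, while the first 1000 powers of 2 do not.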