Comments on: GN. Benford’s Law

By: Stephen Morris

Stephen Morris — Wed, 01 Feb 2012 23:44:42 +0000

Ooh, just reading Ben Goldacre's article again. He links to this great site which tests lots of real-world data. Very nice. http://testingbenfordslaw.com/ There's one data set which really stands out; you can probably guess before you try it. And, no, it isn't the Russian elections.

By: Stephen Morris

Stephen Morris — Wed, 01 Feb 2012 23:37:46 +0000

You can check them online here: http://benford.cloudcontrolled.com/# Hat tip to Ben Goldacre: http://bengoldacre.posterous.com/benfords-law-and-online-calculator-from-fun-t Do read the article he links to.

By: strauss

strauss — Tue, 24 Jan 2012 22:57:37 +0000

Er,… unfortunately, I’ve forgotten…

By: Edwin

Edwin — Wed, 18 Jan 2012 16:20:53 +0000

Ok, is it list two? Countries that start with digits 4, 5, 6 and 8 appear to surpass the expected digit distribution.
Although, the 1st list also shows disparities when compared to Benford. I noted that digits 4, 6 and 7 occur a lot more often than Benford's Law predicts. Nonetheless, the 2nd list appears more manipulated than the 1st list.

Which is erroneous?

By: Shawn

Shawn — Tue, 29 Nov 2011 05:48:56 +0000

Has anyone figured out which list is fake yet?

By: strauss

strauss — Sat, 19 Dec 2009 15:37:40 +0000

I guess the point wasn't that we COULD fake data using Benford's Law, but rather we should be careful not to get tripped up by some pesky investigator who is using Benford's Law.

It's not too hard to make up a data set from scratch that will resist any kind of slicing: just make sure your "expenses" are chosen randomly on a log scale. That is, if RAND is a random number from 0 to 1, instead of generating expenses of the form RAND*$10,000,000, generate expenses as 10^(RAND*7). No matter how the data is sliced, you're home free! (Well, with just a few refinements depending on the particular fraud you're perpetrating.)
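A minimal Python sketch of that comparison, for anyone who wants to try it. The helper names, the fixed seed, and the sample size are illustrative assumptions, not anything from the original post. It draws expenses both ways and compares the leading-digit frequencies with Benford's expected log10(1 + 1/d).

```python
import math
import random
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_expected(d):
    """Benford's Law: probability that the leading digit is d."""
    return math.log10(1 + 1 / d)

def digit_frequencies(values):
    """Observed share of each leading digit 1..9 among positive values."""
    positive = [v for v in values if v > 0]
    counts = Counter(leading_digit(v) for v in positive)
    return {d: counts.get(d, 0) / len(positive) for d in range(1, 10)}

rng = random.Random(0)   # fixed seed so the run is repeatable (assumption)
N = 100_000              # sample size chosen for illustration

# RAND * $10,000,000: uniform on a linear scale
uniform_expenses = [rng.random() * 10_000_000 for _ in range(N)]
# 10^(RAND * 7): uniform on a log scale
log_expenses = [10 ** (rng.random() * 7) for _ in range(N)]

freq_uniform = digit_frequencies(uniform_expenses)
freq_log = digit_frequencies(log_expenses)

print("digit  Benford  linear  log-scale")
for d in range(1, 10):
    print(f"{d:5d}  {benford_expected(d):7.3f}  "
          f"{freq_uniform[d]:6.3f}  {freq_log[d]:9.3f}")
```

The linear-scale column should come out roughly flat, while the log-scale column should track the Benford column closely.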

You're right, though, hiding or manipulating a specific piece of information within a larger data set is trickier. Now I'm no master criminal, but it doesn't seem quite as bad as you say: mostly we need to carefully smear out the fraud, blending it in against the background, most especially being careful that the manipulated data is chosen with this log distribution.

(Mix a few bags of chips, DVDs, iPods, and TVs in with the Rolexes, sports cars and beachfront property!)

My consulting fee can be deposited in an unmarked bank account, number available upon request.

By: Rob Stevenson

Rob Stevenson — Sat, 19 Dec 2009 11:44:28 +0000

Interesting that you suggested that data can be faked using a knowledge of Benford’s law. This is almost impossible for medium-sized data sets, because any fair-sized random sample of the data should also exhibit the law.

For example, if I were a crooked accountant and wanted to reduce the tax bill for a local store, I would need to make sure that the law held for sales data sliced monthly. Or sliced by department of the store. Or by salesperson. Or by payment method.
Being able to fabricate this level of distributed “semi-randomness” is *really* hard.
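
The slicing check described above can be sketched roughly as follows. Everything here is a hypothetical illustration: the record fields (`month`, `dept`, `amount`), the department names, and the synthetic data are assumptions, and the chi-square statistic is just one common way to score the fit. The sketch groups records by a chosen field and scores each slice's leading digits against Benford's Law.

```python
import math
import random
from collections import defaultdict

# Benford's expected share for each leading digit 1..9
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_chi2(amounts):
    """Chi-square statistic of leading digits against Benford (8 d.o.f.)."""
    counts = [0] * 10
    for a in amounts:
        counts[leading_digit(a)] += 1
    n = len(amounts)
    return sum((counts[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d])
               for d in range(1, 10))

def chi2_by_slice(records, key):
    """Group records by one field and score each group separately."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["amount"])
    return {k: benford_chi2(v) for k, v in groups.items()}

rng = random.Random(1)  # fixed seed for repeatability (assumption)

def make_records(draw_amount, n=12_000):
    """Synthetic sales records; the fields are hypothetical."""
    return [{"month": rng.randrange(1, 13),
             "dept": rng.choice(["grocery", "hardware", "clothing"]),
             "amount": draw_amount()} for _ in range(n)]

# Amounts drawn on a log scale versus naively uniform amounts
log_scale = make_records(lambda: 10 ** (rng.random() * 4))
naive_fake = make_records(lambda: rng.uniform(1, 10_000))

for key in ("month", "dept"):
    print(key, "log-scale max:", max(chi2_by_slice(log_scale, key).values()))
    print(key, "naive max:    ", max(chi2_by_slice(naive_fake, key).values()))
```

With 8 degrees of freedom the 5% critical value is about 15.5, so a slice scoring far above that is worth a second look. Note that this synthetic log-scale data has no real structure across slices; real sales data would be a much sterner test.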