Comments on: GN. Benford’s Law http://mathfactor.uark.edu/2009/12/gn-benfords-law/ The Math Factor Podcast Site Fri, 08 Aug 2014 12:52:06 +0000
By: Stephen Morris http://mathfactor.uark.edu/2009/12/gn-benfords-law/comment-page-1/#comment-953 Wed, 01 Feb 2012 23:44:42 +0000 http://mathfactor.uark.edu/?p=997#comment-953 Ooh, just reading Ben Goldacre’s article again. He links to this great site which tests lots of real world data. Very nice. http://testingbenfordslaw.com/

There’s one data set which really stands out; you can probably guess before you try it. And, no, it isn’t the Russian elections.

By: Stephen Morris http://mathfactor.uark.edu/2009/12/gn-benfords-law/comment-page-1/#comment-952 Wed, 01 Feb 2012 23:37:46 +0000 http://mathfactor.uark.edu/?p=997#comment-952 You can check them online here: http://benford.cloudcontrolled.com/#

Hat tip to Ben Goldacre:  http://bengoldacre.posterous.com/benfords-law-and-online-calculator-from-fun-t 

Do read the article he links to.

By: strauss http://mathfactor.uark.edu/2009/12/gn-benfords-law/comment-page-1/#comment-948 Tue, 24 Jan 2012 22:57:37 +0000 http://mathfactor.uark.edu/?p=997#comment-948 Er,… unfortunately, I’ve forgotten…

By: Edwin http://mathfactor.uark.edu/2009/12/gn-benfords-law/comment-page-1/#comment-939 Wed, 18 Jan 2012 16:20:53 +0000 http://mathfactor.uark.edu/?p=997#comment-939 Ok, is it list two? Countries that start with digits 4, 5, 6 and 8 appear to exceed the expected digit distribution.
Although, the 1st list also shows disparities when compared to Benford. I noted that digits 4, 6 and 7 occur a lot more often than Benford’s Law predicts. Nonetheless, the 2nd list appears more manipulated than the 1st list.

Which is erroneous? 

By: Shawn http://mathfactor.uark.edu/2009/12/gn-benfords-law/comment-page-1/#comment-904 Tue, 29 Nov 2011 05:48:56 +0000 http://mathfactor.uark.edu/?p=997#comment-904 Has anyone figured out which list is fake yet?

By: strauss http://mathfactor.uark.edu/2009/12/gn-benfords-law/comment-page-1/#comment-697 Sat, 19 Dec 2009 15:37:40 +0000 http://mathfactor.uark.edu/?p=997#comment-697 I guess the point wasn’t that we COULD fake data using Benford’s Law, but rather we should be careful not to get tripped up by some pesky investigator who is using Benford’s Law.

It’s not too hard to make up a data set from scratch that will resist any kind of slicing: just make sure your “expenses” are chosen randomly on a log scale. That is, if RAND is a random number from 0 to 1, instead of generating expenses of the form RAND*$10,000,000, generate expenses as 10^(RAND*7). No matter how the data is sliced, you’re home free! (Well, with just a few refinements depending on the particular fraud you’re perpetrating.)
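
For anyone who wants to try this at home, here is a rough Python sketch of the comparison (not anything from the podcast). The function names, the sample size and the seed are made up for illustration; the only parts taken from the comment above are RAND*$10,000,000 and 10^(RAND*7).

```python
import math
import random
from collections import Counter

# Benford's Law: expected share of leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Leading digit of x, read off its scientific notation (0 for x == 0)."""
    return int(f"{x:.15e}"[0])


def first_digit_shares(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
n = 100_000             # hypothetical number of fake "expenses"

# Naive fakes: RAND * $10,000,000
uniform_expenses = [rng.random() * 10_000_000 for _ in range(n)]
# Log-scale fakes: 10^(RAND * 7)
log_expenses = [10 ** (rng.random() * 7) for _ in range(n)]

uniform_shares = first_digit_shares(uniform_expenses)
log_shares = first_digit_shares(log_expenses)

print("digit  benford  log-scale  uniform")
for d in range(1, 10):
    print(f"{d:5d}  {BENFORD[d]:7.3f}  {log_shares[d]:9.3f}  {uniform_shares[d]:7.3f}")
```

With these settings the log-scale shares should land close to log10(1 + 1/d), roughly 30% for a leading 1, while the plain uniform amounts give each digit roughly 11%.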

You’re right, though, hiding or manipulating a specific piece of information within a larger data set is trickier. Now I’m no master criminal, but it doesn’t seem quite as bad as you say: mostly we need to carefully smear out the fraud, blending it in against the background, most especially being careful that the manipulated data is chosen with this log distribution.

(Mix a few bags of chips, DVDs, iPods, and TVs in with the Rolexes, sports cars and beachfront property!)

My consulting fee can be deposited in an unmarked bank account, number available upon request.

By: Rob Stevenson http://mathfactor.uark.edu/2009/12/gn-benfords-law/comment-page-1/#comment-696 Sat, 19 Dec 2009 11:44:28 +0000 http://mathfactor.uark.edu/?p=997#comment-696 Interesting that you suggested that data can be faked using a knowledge of Benford’s law. This is almost impossible for medium-sized data sets, because any fair-sized random sample of the data should also exhibit the law.
 
For example, if I were a crooked accountant and wanted to reduce the tax bill for a local store, I would need to make sure that the law held for sales data sliced monthly. Or sliced by department of the store. Or by salesperson. Or by payment method.
Being able to fabricate this level of distributed “semi-randomness” is *really* hard.
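
To make the slicing idea concrete, here is a small Python sketch of the kind of check an investigator might run: group the records by some label and compute a chi-square statistic against Benford's Law for each slice. Everything specific here is an assumption for illustration: the department names, the record counts, the two ways of faking amounts, and the choice of a chi-square test at the 5% level (critical value 15.507 for 8 degrees of freedom).

```python
import math
import random
from collections import Counter, defaultdict

# Benford's Law: expected share of leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CRITICAL_5PCT_DF8 = 15.507  # chi-square critical value, 8 degrees of freedom


def leading_digit(x):
    """Leading digit of a positive number, read off its scientific notation."""
    return int(f"{x:.15e}"[0])


def benford_chi_square(amounts):
    """Chi-square statistic of observed leading digits against Benford's Law.

    Assumes at least one positive amount in the slice.
    """
    amounts = [a for a in amounts if a > 0]
    counts = Counter(leading_digit(a) for a in amounts)
    n = len(amounts)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def chi_square_by_slice(records):
    """records: iterable of (slice_label, amount). Returns {label: statistic}."""
    slices = defaultdict(list)
    for label, amount in records:
        slices[label].append(amount)
    return {label: benford_chi_square(vals) for label, vals in slices.items()}


rng = random.Random(1)                           # arbitrary fixed seed
departments = ["grocery", "hardware", "clothing"]  # hypothetical slice labels

# Naive fakes: amounts uniform between 0 and 10,000.
naive = [(rng.choice(departments), rng.random() * 10_000) for _ in range(6_000)]
# Log-scale fakes: amounts spread evenly over four decades.
careful = [(rng.choice(departments), 10 ** (rng.random() * 4)) for _ in range(6_000)]

naive_stats = chi_square_by_slice(naive)
careful_stats = chi_square_by_slice(careful)

for label in departments:
    print(f"{label:9s} naive={naive_stats[label]:8.1f} "
          f"careful={careful_stats[label]:6.1f} (flag if > {CRITICAL_5PCT_DF8})")
```

In this toy setup the uniformly faked amounts should be flagged in every department by a wide margin. The log-scale fakes mostly slip through, but only because each amount was drawn independently; real books, with totals that have to reconcile across slices, are a much harder thing to fake.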
