| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 03-May-2022 | - | | |
| .github/workflows/ | 29-Jun-2021 | - | 64 | 53 |
| benford/ | 29-Jun-2021 | - | 3,256 | 2,718 |
| data/ | 29-Jun-2021 | - | 6,028 | 6,027 |
| docs/ | 03-May-2022 | - | 199 | 106 |
| img/ | 03-May-2022 | - | | |
| tests/ | 29-Jun-2021 | - | 972 | 734 |
| .gitignore | 29-Jun-2021 | 435 | 35 | 25 |
| .pylintrc | 29-Jun-2021 | 111 | 8 | 4 |
| .readthedocs.yml | 29-Jun-2021 | 575 | 23 | 18 |
| Demo.ipynb | 29-Jun-2021 | 1 MiB | 2,615 | 2,615 |
| MANIFEST.in | 29-Jun-2021 | 60 | 3 | 3 |
| README-pypi.md | 03-May-2022 | 6.6 KiB | 148 | 106 |
| README.md | 29-Jun-2021 | 7.1 KiB | 179 | 134 |
| \_\_init\_\_.py | 29-Jun-2021 | 48 | 3 | 2 |
| setup.cfg | 29-Jun-2021 | 39 | 2 | 2 |
| setup.py | 29-Jun-2021 | 1.4 KiB | 39 | 36 |
README-pypi.md

[![Downloads](https://pepy.tech/badge/benford-py)](https://pepy.tech/project/benford-py)

# Benford for Python

--------------------------------------------------------------------------------

**Citing**


If you find *Benford_py* useful in your research, please consider adding the following citation:

```bibtex
@misc{benford_py,
      author = {Marcel, Milcent},
      title = {{Benford_py: a Python Implementation of Benford's Law Tests}},
      year = {2017},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/milcent/benford_py}},
}
```

--------------------------------------------------------------------------------

`current version = 0.5.0`

### See [release notes](https://github.com/milcent/benford_py/releases/) for features in this and in older versions

### Python versions >= 3.6

### Installation

Benford_py is a package on PyPI, so you can install it with pip:

`pip install benford_py`

or

`pip install benford-py`

Or you can cd into the site-packages subfolder of your Python distribution (or environment) and git clone from there:

`git clone https://github.com/milcent/benford_py`

For a quick start, please go to the [Demo notebook](https://github.com/milcent/benford_py/blob/master/Demo.ipynb), in which I show examples of how to run the tests with the SPY (S&P 500 ETF) daily returns.

For more fine-grained details of the functions and classes, see the [docs](https://benford-py.readthedocs.io/en/latest/index.html).

### Background

The first digit of a number is [its leftmost digit](https://github.com/milcent/benford_py/blob/master/img/First_Digits.png).

Since the first digit of any number can range from "1" to "9"
(not considering "0"), it would be intuitively expected that the
proportion of each occurrence in a set of numerical records would
be uniformly distributed at 1/9, i.e., approximately 0.1111,
or 11.11%.

[Benford's Law](https://en.wikipedia.org/wiki/Benford%27s_law),
also known as the Law of First Digits or the Phenomenon of
Significant Digits, is the finding that the first digits of the
numbers found in series of records of the most varied sources do
not display a uniform distribution, but rather are arranged in such
a way that the digit "1" is the most frequent, followed by "2",
"3", and so on, in successively decreasing order, down to "9",
which presents the lowest frequency as the first digit.

The expected distributions of the First Digits in a
Benford-compliant data set are the ones shown [here](https://github.com/milcent/benford_py/blob/master/img/First.png).

The first record on the subject dates from 1881, in the work of
[Simon Newcomb](https://github.com/milcent/benford_py/blob/master/img/Simon_Newcomb_APS.jpg), an American-Canadian astronomer and mathematician,
who noted that in the logarithmic tables the first pages, which
contained logarithms beginning with the numerals "1" and "2",
were more worn out, that is, more consulted.

In that same article, Newcomb proposed the [formula](https://github.com/milcent/benford_py/blob/master/img/formula.png) for the probability of a certain digit "d"
being the first digit of a number.

In 1938, the American physicist [Frank Benford](https://github.com/milcent/benford_py/blob/master/img/2429_Benford-Frank.jpg) revisited the
phenomenon, which he called the "Law of Anomalous Numbers," in
a survey with more than 20,000 observations of empirical data
compiled from various sources, ranging from areas of rivers to
molecular weights of chemical compounds, including cost data,
address numbers, population sizes and physical constants. All
of them, to a greater or lesser extent, followed this
distribution.

The extent of Benford's work seems to have been one good reason
for the phenomenon to be popularized under his name, though it
had been described by Newcomb 57 years earlier.

Derivations of the original formula were also applied to find
the expected proportions of digits in other
positions in the number, as in the case of the second digit
(BENFORD, 1938), as well as combinations, such as the first
two digits of a number (NIGRINI, 2012, p.5).

Only in 1995, however, was the phenomenon proven by Hill.
His proof was based on the fact that numbers in data series
following Benford's Law are, in effect, "second generation"
distributions, i.e., combinations of other distributions.
The union of randomly drawn samples from various distributions
forms a distribution that respects Benford's Law (HILL, 1995).

When grouped in ascending order, data that obey Benford's Law
must approximate a geometric sequence (NIGRINI, 2012, p.21).
From this it follows that the logarithms of this ordered series
must form a straight line. In addition, the mantissas (decimal
parts) of the logarithms of these numbers must be uniformly
distributed in the interval [0,1) (NIGRINI, 2012, p.10).

In general, a series of numerical records follows Benford's Law
when (NIGRINI, 2012, p.21):
* it represents magnitudes of facts or events, such as populations
of cities, flows of water in rivers or sizes of celestial bodies;
* it does not have pre-established minimum or maximum limits;
* it is not made up of numbers used as identifiers, such as
identity or social security numbers, bank accounts, telephone numbers; and
* its mean is greater than the median (positive skew), and the data is not
concentrated around the mean.

It follows from this expected distribution that, if the set of
numbers in a series of records that usually respects the Law
shows a deviation in the proportions found, there may be
distortions, whether intentional or not.

Benford's Law has been used in [several fields](http://www.benfordonline.net/).
After establishing that the usual data type is Benford-compliant,
one can study samples from the same data type in search of
inconsistencies, errors or even [fraud](https://www.amazon.com.br/Benfords-Law-Applications-Accounting-Detection/dp/1118152859).

This open source module is an attempt to facilitate the
performance of Benford's Law-related tests by people using
Python, whether interactively or in an automated, scripting way.

It uses the versatility of numpy and pandas, along with
matplotlib for visualization, to deliver results like [this one](https://github.com/milcent/benford_py/blob/master/img/SPY-f2d-conf_level-95.png) and much more.


It has been a long time since I last tested it in Python 2. The death clock has stopped ticking, so officially it is for Python 3 now. It should work on Linux, Windows and Mac, but please file a bug report if you run into some trouble.

Also, if you have some nice data set that we can run these tests on, let us try it.

Thanks!

Milcent

README.md

[![Downloads](https://pepy.tech/badge/benford-py)](https://pepy.tech/project/benford-py)

# Benford for Python

--------------------------------------------------------------------------------

**Citing**


If you find *Benford_py* useful in your research, please consider adding the following citation:

```bibtex
@misc{benford_py,
      author = {Marcel, Milcent},
      title = {{Benford_py: a Python Implementation of Benford's Law Tests}},
      year = {2017},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/milcent/benford_py}},
}
```

--------------------------------------------------------------------------------

`current version = 0.5.0`

### See [release notes](https://github.com/milcent/benford_py/releases/) for features in this and in older versions

### Python versions >= 3.6

### Installation

Benford_py is a package on PyPI, so you can install it with pip:

`pip install benford_py`

or

`pip install benford-py`

Or you can cd into the site-packages subfolder of your Python distribution (or environment) and git clone from there:

`git clone https://github.com/milcent/benford_py`

For a quick start, please go to the [Demo notebook](https://github.com/milcent/benford_py/blob/master/Demo.ipynb), in which I show examples of how to run the tests with the SPY (S&P 500 ETF) daily returns.

For more fine-grained details of the functions and classes, see the [docs](https://benford-py.readthedocs.io/en/latest/index.html).

### Background

The first digit of a number is its leftmost digit.
<p align="center">
  <img alt="First Digits" src="https://github.com/milcent/benford_py/blob/master/img/First_Digits.png">
</p>
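
As a rough illustration of this definition, here is a minimal sketch of reading the first digit in plain Python. The helper name `first_digit` is hypothetical and is not part of the Benford_py API; it works on the number's decimal string, so that a float such as 0.3 is treated as it is written rather than by its binary expansion.

```python
from decimal import Decimal


def first_digit(x):
    """Return the leftmost nonzero digit of a number (hypothetical helper)."""
    # str() keeps the shortest decimal form of floats, e.g. 0.3 -> '0.3'.
    digits = Decimal(str(x)).as_tuple().digits
    # Zero, infinities and NaN have no first significant digit.
    if not digits or digits[0] == 0:
        raise ValueError(f"{x!r} has no first significant digit")
    return digits[0]


for value in (1234.5, -0.072, 907, 0.3):
    print(value, "->", first_digit(value))
```

The sign and the position of the decimal point are ignored, which matches the convention of not considering "0" as a first digit.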

Since the first digit of any number can range from "1" to "9"
(not considering "0"), it would be intuitively expected that the
proportion of each occurrence in a set of numerical records would
be uniformly distributed at 1/9, i.e., approximately 0.1111,
or 11.11%.

[Benford's Law](https://en.wikipedia.org/wiki/Benford%27s_law),
also known as the Law of First Digits or the Phenomenon of
Significant Digits, is the finding that the first digits of the
numbers found in series of records of the most varied sources do
not display a uniform distribution, but rather are arranged in such
a way that the digit "1" is the most frequent, followed by "2",
"3", and so on, in successively decreasing order, down to "9",
which presents the lowest frequency as the first digit.

The expected distributions of the First Digits in a
Benford-compliant data set are the ones shown below:
<p align="center">
  <img alt="Expected Distributions of First Digits" src="https://github.com/milcent/benford_py/blob/master/img/First.png">
</p>

The first record on the subject dates from 1881, in the work of
Simon Newcomb, an American-Canadian astronomer and mathematician,
who noted that in the logarithmic tables the first pages, which
contained logarithms beginning with the numerals "1" and "2",
were more worn out, that is, more consulted.

<p align="center">
  <img alt="Simon Newcomb" src="https://github.com/milcent/benford_py/blob/master/img/Simon_Newcomb_APS.jpg">
</p>
<p align="center">
      Simon Newcomb, 1835-1909.
</p>

In that same article, Newcomb proposed the formula for the
probability of a certain digit "d" being the first digit of a
number, given by the following equation.

<p align="center">
  <img alt="First digit equation" src="https://github.com/milcent/benford_py/blob/master/img/formula.png">
</p>
<p align="center"> where: P (D = d) is the probability that
  the first digit is equal to d, and d is an integer ranging
  from 1 to 9.
</p>
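
Since the equation is only available as an image here, the sketch below spells it out in its standard form, P(D = d) = log10(1 + 1/d), and prints the expected proportions. The function name `expected_first_digit` is an assumption made for this example and is not taken from the package.

```python
import math


def expected_first_digit(d):
    """Benford's expected proportion for first digit d (1 to 9)."""
    if d not in range(1, 10):
        raise ValueError("d must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)


expected = {d: expected_first_digit(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.4f}")

# The nine proportions add up to log10(10), i.e. 1.
print("total:", round(sum(expected.values()), 10))
```

The digit "1" comes out at about 30.1% and the digit "9" at about 4.6%, far from the uniform 11.11%.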

In 1938, the American physicist Frank Benford revisited the
phenomenon, which he called the "Law of Anomalous Numbers," in
a survey with more than 20,000 observations of empirical data
compiled from various sources, ranging from areas of rivers to
molecular weights of chemical compounds, including cost data,
address numbers, population sizes and physical constants. All
of them, to a greater or lesser extent, followed this
distribution.

<p align="center">
  <img alt="Frank Benford" src="https://github.com/milcent/benford_py/blob/master/img/2429_Benford-Frank.jpg">
</p>
<p align="center">
  Frank Albert Benford, Jr., 1883-1948.
</p>

The extent of Benford's work seems to have been one good reason
for the phenomenon to be popularized under his name, though it
had been described by Newcomb 57 years earlier.

Derivations of the original formula were also applied to find
the expected proportions of digits in other
positions in the number, as in the case of the second digit
(BENFORD, 1938), as well as combinations, such as the first
two digits of a number (NIGRINI, 2012, p.5).
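
To make these derivations concrete, here is a minimal sketch using the standard generalization of the first-digit formula: the first two digits n (10 to 99) are expected with proportion log10(1 + 1/n), and the second digit is obtained by summing over all possible first digits. The function names are hypothetical and chosen only for this example.

```python
import math


def expected_first_two_digits(n):
    """Expected proportion of the first two digits n (10 to 99)."""
    if n not in range(10, 100):
        raise ValueError("n must be an integer from 10 to 99")
    return math.log10(1 + 1 / n)


def expected_second_digit(d):
    """Expected proportion of second digit d (0 to 9), summed over first digits."""
    if d not in range(0, 10):
        raise ValueError("d must be an integer from 0 to 9")
    return sum(expected_first_two_digits(10 * first + d) for first in range(1, 10))


for d in range(10):
    print(f"second digit {d}: {expected_second_digit(d):.4f}")
```

The second digit is much closer to uniform than the first: its expected proportions fall only from about 12% for "0" to about 8.5% for "9".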

Only in 1995, however, was the phenomenon proven by Hill.
His proof was based on the fact that numbers in data series
following Benford's Law are, in effect, "second generation"
distributions, i.e., combinations of other distributions.
The union of randomly drawn samples from various distributions
forms a distribution that respects Benford's Law (HILL, 1995).

When grouped in ascending order, data that obey Benford's Law
must approximate a geometric sequence (NIGRINI, 2012, p.21).
From this it follows that the logarithms of this ordered series
must form a straight line. In addition, the mantissas (decimal
parts) of the logarithms of these numbers must be uniformly
distributed in the interval [0,1) (NIGRINI, 2012, p.10).
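
One way to see these properties is to start from a sequence that is geometric by construction. The sketch below uses the first 1,000 powers of 2, which is an illustrative choice and not data shipped with this package, and compares the first-digit proportions found with the expected ones; it also checks that the mantissas of the logarithms spread evenly over [0,1).

```python
import math
from collections import Counter

# Powers of 2: a geometric sequence with ratio 2 (illustrative data only).
terms = [2 ** k for k in range(1000)]

# Proportion of each first digit found in the sequence.
counts = Counter(int(str(t)[0]) for t in terms)
found = {d: counts[d] / len(terms) for d in range(1, 10)}

# Expected Benford proportions, log10(1 + 1/d).
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Mantissas: fractional parts of the base-10 logarithms.
mantissas = [math.log10(t) % 1 for t in terms]
mean_mantissa = sum(mantissas) / len(mantissas)

for d in range(1, 10):
    print(f"{d}: found {found[d]:.3f}, expected {expected[d]:.3f}")
print(f"mean mantissa: {mean_mantissa:.3f}")
```

For mantissas that are uniform on [0,1), the mean should sit close to 0.5, which gives a quick sanity check before running any formal test.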

In general, a series of numerical records follows Benford's Law
when (NIGRINI, 2012, p.21):
* it represents magnitudes of facts or events, such as populations
of cities, flows of water in rivers or sizes of celestial bodies;
* it does not have pre-established minimum or maximum limits;
* it is not made up of numbers used as identifiers, such as
identity or social security numbers, bank accounts, telephone numbers; and
* its mean is greater than the median (positive skew), and the data is not
concentrated around the mean.

It follows from this expected distribution that, if the set of
numbers in a series of records that usually respects the Law
shows a deviation in the proportions found, there may be
distortions, whether intentional or not.
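
As a sketch of how such a deviation could be quantified, the example below computes two common summary measures for first-digit counts: the mean absolute deviation (MAD) between found and expected proportions, and the chi-square statistic. The function name `deviation_stats` and both sets of counts are made up for illustration; this is not the package's own implementation, which offers these and other tests with more options.

```python
import math

# Expected Benford proportions for first digits 1 to 9.
EXPECTED = [math.log10(1 + 1 / d) for d in range(1, 10)]


def deviation_stats(counts):
    """Return (MAD, chi-square) for nine observed first-digit counts."""
    if len(counts) != 9:
        raise ValueError("counts must have one entry per digit, 1 to 9")
    n = sum(counts)
    found = [c / n for c in counts]
    # Mean absolute deviation between found and expected proportions.
    mad = sum(abs(f - e) for f, e in zip(found, EXPECTED)) / 9
    # Chi-square statistic on counts (8 degrees of freedom).
    chi2 = sum((c - n * e) ** 2 / (n * e) for c, e in zip(counts, EXPECTED))
    return mad, chi2


# Hypothetical counts for 1,000 records each.
benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]
uniform_like = [111, 111, 111, 111, 111, 111, 111, 111, 112]

for name, counts in (("benford_like", benford_like), ("uniform_like", uniform_like)):
    mad, chi2 = deviation_stats(counts)
    print(f"{name}: MAD = {mad:.4f}, chi-square = {chi2:.1f}")
```

With 8 degrees of freedom, the chi-square critical value at the 95% confidence level is 15.507, so the uniform-looking counts would be flagged while the Benford-looking ones would not. A flag only says the proportions deviate; it does not say why.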

Benford's Law has been used in [several fields](http://www.benfordonline.net/).
After establishing that the usual data type is Benford-compliant,
one can study samples from the same data type in search of
inconsistencies, errors or even [fraud](https://www.amazon.com.br/Benfords-Law-Applications-Accounting-Detection/dp/1118152859).

This open source module is an attempt to facilitate the
performance of Benford's Law-related tests by people using
Python, whether interactively or in an automated, scripting way.

It uses the versatility of numpy and pandas, along with
matplotlib for visualization, to deliver results like the one
below and much more.

![Sample Image](https://github.com/milcent/benford_py/blob/master/img/SPY-f2d-conf_level-95.png)

It has been a long time since I last tested it in Python 2. The death clock has stopped ticking, so officially it is for Python 3 now. It should work on Linux, Windows and Mac, but please file a bug report if you run into some trouble.

Also, if you have some nice data set that we can run these tests on, let us try it.

Thanks!

Milcent