Why does the occurrence of 4, 5, 6 and 9 in pi differ?

I'm playing around with pi.

I have this document with the first 5 million decimal digits of pi: http://www.aip.de/~wasi/PI/Pibel/pibel_5mio.pdf

I built a script into which I feed pi's digits, for example the first 22222 digits after the decimal point.

It then counts the occurrences of each digit and calculates the percentages.

Here's the result:

0: 9.83709837098371
1: 9.93159931599316
2: 9.80559805598056
3: 9.91359913599136
4: 10.161101611016111
5: 10.41760417604176
6: 10.14310143101431
7: 9.90009900099001
8: 9.86859868598686
9: 10.02160021600216
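For reference, here is a minimal Python sketch of such a script. The original script is not shown, so everything here is an assumption: rather than reading digits from the PDF, the hypothetical helper `pi_digits` generates them with Gibbons' unbounded spigot algorithm, and the hypothetical helper `digit_percentages` tallies each digit with `collections.Counter`.

```python
from collections import Counter


def pi_digits(count):
    """Hypothetical helper: return the first `count` digits of pi, starting
    with the leading 3, via Gibbons' unbounded spigot (exact integer math)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < count:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain: emit it and rescale the state.
            digits.append(n)
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: fold in one more term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)
    return digits


def digit_percentages(decimals):
    """Hypothetical helper: percentage of each digit 0-9 in a digit sequence."""
    counts = Counter(decimals)
    total = len(decimals)
    return {d: 100 * counts[d] / total for d in range(10)}


if __name__ == "__main__":
    decimals = pi_digits(1001)[1:]  # drop the leading 3, keep 1000 decimals
    for digit, pct in digit_percentages(decimals).items():
        print(f"{digit}: {pct:.2f}")
```

The integers the spigot works with keep growing, so it slows down as more digits are requested: fine for a few thousand digits, impractical for 5 million, where reading the digits from a file is the realistic route.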

I was expecting something like those cool repeating patterns of digits, but I wonder why 4, 5, 6 and 9 occur a bit more often than the other digits. When I pass in all 5 million digits after the decimal point, the result is the same. I especially wonder about the 9, which revives speculation about some kind of symmetry.

So is there any explanation for this, or am I digging into something nobody has an answer to?

Thanks in advance.


Answer 1

All the proportions are within $1/200$ of a "perfect 10". How much closer did you expect them to be, and why?
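The $1/200$ bound can be checked against the question's table. Writing $p_d$ for the proportion of digit $d$ (notation introduced here for illustration), the largest deviation belongs to digit 5:

$$\max_{d}\left|p_d-\frac{1}{10}\right| = \left|0.104176-0.1\right| = 0.004176 < \frac{1}{200} = 0.005.$$

The next largest are digit 2 ($0.001944$) and digit 4 ($0.001611$).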

Answer 2

Even a perfectly random sequence will show some biases in the short run. In fact, it would be extremely weird for all digits to come up exactly the same number of times. There are standard ways to measure how far the numbers are from equal distribution, and how far to expect random numbers to be from it, and the digits of $\pi$ have passed every randomness test to which they have ever been subjected.
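One standard measure, used here as an example, is Pearson's chi-squared goodness-of-fit statistic. A minimal Python sketch, assuming the counts below, which are recovered from the question's percentages by multiplying by $22222/100$ (they sum to 22222). With 10 digits there are $10 - 1 = 9$ degrees of freedom, and the 5% critical value from standard tables is about 16.92.

```python
# Digit counts in the first 22222 decimals of pi, recovered from the
# question's percentages (percentage * 22222 / 100).
COUNTS_22222 = [2186, 2207, 2179, 2203, 2258, 2315, 2254, 2200, 2193, 2227]

# Chi-squared critical value for 9 degrees of freedom at the 5% level.
CRITICAL_9_DOF_5PCT = 16.919


def chi_squared_uniform(counts):
    """Pearson's chi-squared statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    stat = chi_squared_uniform(COUNTS_22222)
    print(f"chi^2 = {stat:.2f} (9 dof, 5% critical value {CRITICAL_9_DOF_5PCT})")
    print("consistent with uniform" if stat < CRITICAL_9_DOF_5PCT else "suspicious")
```

For these counts the statistic comes out near 7.2, below the mean of 9 for a chi-squared variable with 9 degrees of freedom, so the question's table is unremarkable for a uniform source.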

Try with a trillion digits, and see what happens then.

Answer 3

Because $22222$ is not big enough.

Just run your script on many more digits and you will get much closer to $10\%$.

Added later

Below is the number of times each digit appears in the first $10{,}000{,}000$ digits of $\pi$:

0:  999440
1:  999333
2: 1000306
3:  999965
4: 1001093
5: 1000466
6:  999337
7: 1000206
8:  999814
9: 1000040
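To compare these counts with the question's table, a short Python sketch converts them to percentages and reports the largest distance from 10%. The counts are copied from the list above; the function names are illustrative only.

```python
# Digit counts in the first 10,000,000 digits of pi, copied from the list above.
COUNTS_10M = [999440, 999333, 1000306, 999965, 1001093,
              1000466, 999337, 1000206, 999814, 1000040]


def percentages(counts):
    """Convert raw digit counts to percentages of the total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def max_deviation_from_ten(counts):
    """Largest distance, in percentage points, between any digit and 10%."""
    return max(abs(p - 10) for p in percentages(counts))


if __name__ == "__main__":
    for digit, pct in enumerate(percentages(COUNTS_10M)):
        print(f"{digit}: {pct:.4f}")
    print(f"max deviation: {max_deviation_from_ten(COUNTS_10M):.5f} points")
```

For these counts the largest deviation is about 0.011 percentage points (digit 4), against about 0.42 points (digit 5) for the question's 22222 digits.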

Answer 4

It's random sampling: by the Poisson distribution, you expect the variation in a count $n$ to be of the order of $\sqrt{n}$. Writing your percentages with these errors gives:

0: 9.84 +- 0.21
1: 9.93 +- 0.21
2: 9.81 +- 0.21
3: 9.91 +- 0.21
4: 10.16 +- 0.21
5: 10.42 +- 0.22
6: 10.14 +- 0.21
7: 9.90 +- 0.21
8: 9.87 +- 0.21
9: 10.02 +- 0.21
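The $\pm$ column can be reproduced from the raw counts: for a count $n$ out of $N = 22222$ digits, the Poisson error on the percentage is $100\sqrt{n}/N$. A Python sketch, assuming counts recovered from the question's percentages (percentage $\times 22222/100$); the function name is illustrative.

```python
import math

N = 22222  # number of decimals examined in the question
# Digit counts recovered from the question's percentages (pct * N / 100).
COUNTS = [2186, 2207, 2179, 2203, 2258, 2315, 2254, 2200, 2193, 2227]


def percent_with_poisson_error(n, total):
    """Return (percentage, error): the error is the Poisson sqrt(n) on the
    raw count, scaled to percentage points."""
    return 100 * n / total, 100 * math.sqrt(n) / total


if __name__ == "__main__":
    for digit, n in enumerate(COUNTS):
        pct, err = percent_with_poisson_error(n, N)
        print(f"{digit}: {pct:.2f} +- {err:.2f}")
```

A binomial model, $\sqrt{Np(1-p)}$ with $p = 1/10$, gives a slightly smaller error of about 0.20; either way every digit, including 5, lies within two standard errors of $10\%$.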

You can see that the measurements are within the expected bounds.

If you count $n$ occurrences (an integer), the error on that count is about $\sqrt{n}$, so the relative error $\sqrt{n}/n = 1/\sqrt{n}$ shrinks as you take more samples, and your measurements match their asymptotic distribution, which is uniform in this case, more and more exactly.
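Putting numbers to this (a back-of-the-envelope estimate with the same Poisson error): with $N$ digits in total and each count $n \approx N/10$, the error on a percentage is

$$100\,\frac{\sqrt{n}}{N} \approx 100\,\frac{\sqrt{N/10}}{N} = \frac{100}{\sqrt{10N}}.$$

For $N = 22222$ this gives $100/\sqrt{222220} \approx 0.21$ percentage points, matching the table above; for $N = 10{,}000{,}000$ it gives $100/\sqrt{10^8} = 0.01$, consistent with the largest deviation of about $0.011$ points (digit 4) in the $10{,}000{,}000$-digit counts.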

Calculating the percentages before evaluating the errors isn't a good idea, because the $\sqrt{n}$ rule applies to the raw counts, not to the percentages.