Getting a normal distribution when sampling a uniform distribution

I encountered some behavior that somehow makes sense to me, but I don't exactly remember why, and I'd be glad to hear an explanation.

To simplify things (it wasn't an intended experiment), I was drawing integers in the range [1, 10000], each with the same probability (1/10000). I repeated this about 200k times. For each number (from 1 to 10000), I recorded how many times it was drawn. Then I counted how many numbers appeared once, how many appeared twice, and so on. This gave me a normal distribution. Is that related to the original distribution being uniform? Or is it because of the CLT? Should it happen for any kind of distribution?
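
For reference, the experiment described above can be sketched in R roughly as follows. The seed and variable names are assumptions chosen for illustration, not part of the original run. Each number's count is a Binomial(200000, 1/10000) variable, so the sketch also prints the mean and variance of the counts for comparison with 20 and about 19.998.

```r
# Sketch of the experiment in the question (seed and names are illustrative assumptions)
set.seed(2024)
n.draws  <- 200000                            # "about 200k" draws
n.values <- 10000                             # integers 1..10000, equally likely
draws  <- sample.int(n.values, n.draws, replace = TRUE)
counts <- tabulate(draws, nbins = n.values)   # times each integer was drawn
freq   <- table(counts)                       # how many integers were drawn k times
mean(counts); var(counts)                     # compare with 20 and 200000*(1/10000)*(1 - 1/10000)
plot(freq, xlab = "Times drawn", ylab = "Number of integers")
```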

Thanks.

There are 1 best solutions below

When sampling from a continuous uniform distribution, you will find that the Central Limit Theorem's normal approximation becomes good for surprisingly small $n.$

If I sum $n=12$ independent observations from $\mathsf{Unif}(0,1)$ and subtract $6,$ the resulting random variable will be very nearly standard normal: $Z = \sum_{i=1}^{12} U_i - 6 \stackrel{aprx}{\sim}\mathsf{Norm}(0,1).$ Demonstration of a thousand such values using R:

set.seed(317)
z = replicate(1000, sum(runif(12))-6)
summary(z);  sd(z)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-3.265696 -0.720931  0.006213  0.003641  0.704164  2.854923 
 [1] 1.010645  # sample SD
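
The sample mean near $0$ and sample SD near $1$ are what the moments of a uniform variable predict. As a short worked check, using only $E(U_i) = 1/2$ and $\mathrm{Var}(U_i) = 1/12$ for $U_i \sim \mathsf{Unif}(0,1)$ and the independence of the $U_i$:

$$E(Z) = \sum_{i=1}^{12} E(U_i) - 6 = 12\cdot\tfrac{1}{2} - 6 = 0, \qquad \mathrm{Var}(Z) = \sum_{i=1}^{12} \mathrm{Var}(U_i) = 12\cdot\tfrac{1}{12} = 1.$$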

A Shapiro-Wilk test doesn't detect a difference from normal.

shapiro.test(z)

        Shapiro-Wilk normality test

data:  z
W = 0.99823, p-value = 0.3946

And a Kolmogorov-Smirnov test doesn't detect a difference from standard normal.

ks.test(z, pnorm)

        One-sample Kolmogorov-Smirnov test

data:  z
D = 0.020192, p-value = 0.8096
alternative hypothesis: two-sided

A histogram of the 1000 values of $Z$ generated in this way shows a reasonably good fit to a standard normal density curve.

hist(z, prob=TRUE, col="skyblue2", main="Approx NORM(0,1) from Sample of Uniform")
curve(dnorm(x), add=TRUE, col="orange", lwd=2)

[Figure: histogram of the 1000 values of z with the standard normal density curve overlaid]

Finally, a normal quantile plot is very nearly linear:

qqnorm(z, pch=20); qqline(z, col="green2")

[Figure: normal quantile plot of z with reference line]

This method of generating (nearly) a standard normal distribution isn't perfect (12 is a long way from infinity), but it was used to get approximately normal distributions in the early days of computation because it involves only simple arithmetic. Notice, however, that this method cannot give values outside the interval $[-6, 6],$ while the standard normal distribution theoretically takes values throughout the real line.
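
The two caveats above can be checked numerically. The sketch below wraps the method in a helper called `rnorm12` (a hypothetical name, not from the original answer) and compares its bounded range with the tail probabilities of a true standard normal. The seed and sample size are arbitrary assumptions.

```r
# Hypothetical helper: approximate N(0,1) draws as the sum of 12 uniforms minus 6
rnorm12 <- function(n) rowSums(matrix(runif(12 * n), ncol = 12)) - 6

set.seed(1234)                 # arbitrary seed
z <- rnorm12(100000)
range(z)                       # always inside [-6, 6]
2 * pnorm(-6)                  # P(|Z| > 6) for a true standard normal, about 2e-9
mean(abs(z) > 4)               # observed fraction beyond 4 ...
2 * pnorm(-4)                  # ... versus about 6.3e-5 for a true standard normal
```

Values beyond $\pm 6$ are so rare under a true standard normal that the hard bound rarely matters in practice; what the comparison at $\pm 4$ probes is how closely the tails of the approximation follow the normal tails.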