The normal distribution is a common model of randomness


Can someone please comment/elaborate on the statement:

"The normal distribution is a common model of randomness."

I would like to understand it more deeply.

Source: (image not reproduced here)

Perhaps someone can point me to a theorem or proof supporting this statement.

So my question for this thread really comes down to:

If a process can be modeled well by a normal distribution, is this a necessary and sufficient condition for the process to be called "random"?


0
On

It's not a necessary condition. There are natural processes that are well modeled by other distributions, such as the Poisson distribution for the number of events in a fixed interval, or the exponential distribution for the waiting times between those events.

Some processes are approximately normal on their own, such as the position of a random walk after many steps. Other quantities converge to normality in the limit. For example, the average of many independent draws from a uniform distribution on an interval is nearly normal.
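To make the uniform example concrete, here is a minimal simulation sketch using only the Python standard library. The choices of 30 draws per average, 20,000 averages, and the seed are arbitrary assumptions for illustration. It compares the spread of the averages with the theoretical value and checks the "68% within one standard deviation" rule that a normal distribution would satisfy.

```python
import random
import statistics

def uniform_sample_means(n_per_mean, n_means, seed=0):
    """Return n_means averages, each over n_per_mean draws from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(n_per_mean))
        for _ in range(n_means)
    ]

means = uniform_sample_means(n_per_mean=30, n_means=20_000)
mu = statistics.fmean(means)
sd = statistics.pstdev(means)

# Theory: Uniform(0, 1) has mean 1/2 and variance 1/12, so an average
# of 30 independent draws has standard deviation sqrt(1 / (12 * 30)).
expected_sd = (1 / (12 * 30)) ** 0.5

# A normal distribution puts about 68.3% of its mass within one
# standard deviation of its mean.
within_one_sd = sum(abs(m - mu) <= sd for m in means) / len(means)

print(f"mean={mu:.4f}  sd={sd:.4f}  expected sd={expected_sd:.4f}")
print(f"fraction within one sd: {within_one_sd:.3f}")
```

A single uniform draw would put only about 58% of its mass within one standard deviation of the mean, so a fraction close to 68% is a quick sign that the averages are behaving like a normal variable.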

4
On

No. A random physical process does not have to follow a normal distribution.

The text you referenced is talking about how many physical processes give a distribution similar to the normal distribution. Think of throwing a dart at a dartboard and measuring how far to the left or right of the bullseye it lands (ignore the actual rules of darts and scoring 180). Most shots should land very near the bullseye, but some will land well to the right or left of it. Where any given shot ends up is pretty much random. This is an example of how a normal distribution could be great at modelling a random physical process.

However, imagine a die. Rolling the die is also a random physical process, but instead of a normal distribution you'd expect 6 roughly equal, discrete peaks. So this is a random process where using a normal distribution would fail.
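As a quick illustration of the die example, the sketch below simulates rolls of a fair six-sided die with the standard library and prints the relative frequency of each face. The number of rolls and the seed are arbitrary assumptions.

```python
import random
from collections import Counter

rng = random.Random(1)
rolls = [rng.randint(1, 6) for _ in range(60_000)]  # randint is inclusive on both ends

counts = Counter(rolls)
freqs = {face: counts[face] / len(rolls) for face in range(1, 7)}

# Each face should appear with relative frequency close to 1/6:
# six flat, discrete peaks rather than a bell curve.
for face, freq in freqs.items():
    print(face, f"{freq:.3f}")
```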

0
On

The reason the Normal is a reasonable fit for a lot of phenomena can be explained via the Central Limit Theorem. Roughly, it says that the sample average of iid random variables $X_i$ with mean $\mu$ and finite standard deviation $\sigma$, drawn from any distribution, approaches a Normal distribution once it is centered and rescaled:

$$ \lim_{n\rightarrow \infty}P\left( \frac{1}{n}\sum_{i=1}^{n} (X_i-\mu) \in \left[A\frac{\sigma}{\sqrt{n}}, B\frac{\sigma}{\sqrt{n}} \right] \right) = \int_A^B\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx $$

The consequence is that when an outcome is the combined result of many independent influences, each contributing a small additive effect, which is often the case in nature, the Normal will approximate the distribution of that outcome reasonably well.
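The limit statement above can be checked by simulation. The sketch below is only an illustration under assumed choices: it uses a strongly skewed Exponential(1) distribution (for which $\mu = \sigma = 1$), the interval $[A, B] = [-1, 2]$, and a hypothetical helper `clt_probability`. It estimates the left-hand probability for a few values of $n$ and compares it with the normal integral on the right-hand side.

```python
import random
from statistics import NormalDist

def clt_probability(n, trials, a, b, seed=2):
    """Estimate P( (1/n) * sum(X_i - mu) in [a*sigma/sqrt(n), b*sigma/sqrt(n)] )
    for X_i ~ Exponential(1), which has mu = sigma = 1."""
    rng = random.Random(seed)
    mu = sigma = 1.0
    lo = a * sigma / n ** 0.5
    hi = b * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        centered_mean = sum(rng.expovariate(1.0) - mu for _ in range(n)) / n
        hits += lo <= centered_mean <= hi
    return hits / trials

A, B = -1.0, 2.0
# Right-hand side: integral of the standard normal density from A to B.
limit = NormalDist().cdf(B) - NormalDist().cdf(A)

estimates = {n: clt_probability(n, 4_000, A, B) for n in (1, 10, 200)}
for n, p in estimates.items():
    print(f"n={n:4d}  simulated={p:.3f}  normal limit={limit:.3f}")
```

For $n = 1$ the simulated probability is far from the normal value because a single exponential draw is heavily skewed; the gap should shrink as $n$ grows.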

0
On

The normal distribution is maximally uncertain on the real line, given its mean and variance. Precisely, this means that among all distributions on the real line with a given mean and variance, the normal distribution has the highest (differential) entropy. In this way, if your distribution has a mean and standard deviation, and support equal to the real line, and you know nothing more, then from an entropy point of view, the best guess for the distribution is a normal distribution. This in no way justifies that the distribution should be normal; it just offers a basis for a guess. In physics, systems tend to gravitate toward such maximal entropy configurations, which is a heuristic reason why the normal distribution comes up a lot in unbounded systems.

Here is a very, very dumbed down explanation of the previous paragraph. Imagine for a moment that someone suddenly took your keys and hid them somewhere in the universe, and that you know nothing about the direction they ran. If all you know is roughly how far they could have gone, your best guess for the distribution of the keys' displacement along any one direction is normal, centered on you. On the other hand, if all you know is that your keys are on Earth, this implies your keys are in some finite interval. In this situation, the uniform distribution is entropy maximizing.
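The maximum-entropy claim can be sanity-checked with closed-form differential entropies. The sketch below compares four familiar distributions, all scaled to the same standard deviation; the choice of these four competitors and of $\sigma = 1$ is an assumption made purely for illustration, and a comparison against a few examples is of course not a proof.

```python
import math

sigma = 1.0  # common standard deviation for all four candidates

# Differential entropy (in nats) of each family, with its scale
# parameter chosen so that the variance equals sigma**2.
entropies = {
    # Normal: 0.5 * ln(2 * pi * e * sigma^2)
    "normal": 0.5 * math.log(2 * math.pi * math.e * sigma**2),
    # Logistic with scale s has variance (s * pi)^2 / 3 and entropy ln(s) + 2
    "logistic": math.log(sigma * math.sqrt(3) / math.pi) + 2,
    # Laplace with scale b has variance 2 * b^2 and entropy 1 + ln(2 * b)
    "laplace": 1 + math.log(2 * sigma / math.sqrt(2)),
    # Uniform of width w has variance w^2 / 12 and entropy ln(w)
    "uniform": math.log(sigma * math.sqrt(12)),
}

for name, h in sorted(entropies.items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} {h:.4f}")
```

With the variance held fixed, the normal comes out on top. If instead the support is fixed to a bounded interval and nothing else is known, the uniform distribution is the entropy maximizer, matching the "keys on Earth" case.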

A number of answers mention the CLT, but this is only applicable to averages (or sums) of random quantities. For example, while the distribution of people's heights is bell shaped, it is categorically not normal; for one thing, a height can never be negative, whereas a normal variable can. As well, there are plenty of situations, particularly "fat-tail" distributions that come up in finance, which violate the hypotheses of the CLT. Specifically, these fat-tail distributions have an infinite (or undefined) mean, an infinite standard deviation, or both. Unfortunately, any finite collection of data points has a finite sample mean and sample standard deviation, so the data alone will not flag the problem, which leads to abuse of the CLT and the normal distribution.
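The fat-tail warning can be illustrated with the standard Cauchy distribution, whose mean and variance are both undefined. The sketch below is a minimal simulation under assumed settings (sample sizes 10 and 400, 2,000 repetitions, a fixed seed, and hypothetical helper names). It measures how spread out the sample means are, using the interquartile range, for normal draws and for Cauchy draws.

```python
import math
import random
import statistics

def iqr_of_sample_means(draw, n, reps, rng):
    """Interquartile range of `reps` sample means, each over n draws."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    # Inverse-CDF sampling for the standard Cauchy distribution.
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(3)
results = {}
for name, draw in (("normal", draw_normal), ("cauchy", draw_cauchy)):
    results[name] = {
        n: iqr_of_sample_means(draw, n, 2_000, rng) for n in (10, 400)
    }
    print(name, {n: round(v, 3) for n, v in results[name].items()})
```

Every individual sample mean here is a finite number, yet for the Cauchy draws the spread of those means does not shrink as the sample size grows, whereas for normal draws it shrinks roughly like $1/\sqrt{n}$. That is the sense in which a finite sample can look harmless while the CLT does not apply.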