Why can the normal distribution be used to describe general data?


I understand the assumptions underpinning error distributions and the motivation behind them, which led to the development of Laplace's and eventually Gauss's error distributions, as explained in Saul Stahl's essay. It makes sense that larger random errors are less likely than smaller ones, and that errors of the same magnitude are equally likely in either direction, i.e. $\phi(x) = \phi(-x)$.

What I'm having a hard time reconciling is the widespread use of the normal distribution to describe generic data, not just errors. Why can we model human heights or test grades with a normal distribution?

I understand that the Central Limit Theorem allows us to approximate certain sampling distributions, e.g. that of $\overline{x}$, as normal (usually when $n \geq 30$) - but why? What does an error curve have to do with the CLT? Do the random factors that affect natural phenomena cancel each other out, leading to (approximately) normal distributions in large samples?

There are 4 answers below.


Are you clear on what the Central Limit Theorem says? There doesn't have to be any "inherent" "normally distributed error". The Central Limit Theorem says that given any probability distribution with finite mean $\mu$ and standard deviation $\sigma$, the sum of $n$ samples will be approximately normally distributed with mean $n\mu$ and standard deviation $\sqrt{n}\sigma$, while the average of $n$ samples is approximately normally distributed with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.

The larger $n$ is, the better the approximation. Generally $n \geq 30$ is considered enough to make the approximation "good".
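The claim about sums can be checked with a quick simulation; the following sketch (my own illustration, not part of the answer) draws sums of $n = 30$ samples from a decidedly non-normal exponential distribution with $\mu = 1$ and $\sigma = 1$, and checks that they cluster around $n\mu$ with spread close to $\sqrt{n}\,\sigma$:

```python
import random
import statistics

# Illustration of the CLT statement: sums of n samples from an exponential
# distribution with mu = 1 and sigma = 1 should be approximately normal
# with mean n * mu = 30 and standard deviation sqrt(n) * sigma ≈ 5.48.
random.seed(0)
n, trials = 30, 20_000

sums = [sum(random.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

print(f"mean of sums: {statistics.fmean(sums):.2f}")  # near n * mu = 30
print(f"sd of sums:   {statistics.stdev(sums):.2f}")  # near sqrt(30) ≈ 5.48
```

A histogram of `sums` would show the familiar bell shape even though each individual sample comes from a heavily right-skewed distribution.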


Great question! I will be following this post. While I don't fully answer the question, here are some thoughts on why the normal distribution is so prevalent.

I am actually writing a blog post on this very thing, but it is under construction. The idea is that once you subtract the mean from a distribution, you are left with the deviations from the mean, and the expected value of these is $0$ regardless of the distribution they came from. Now every time I sample from this distribution of deviations, I get values to the left and to the right, as you described, which build up a normal distribution.

A quote from the blog: Any process that adds together random values from the same distribution converges to a normal. But it's not easy to grasp why addition should result in a bell curve of sums. Here's a conceptual way to think of the process. Whatever the average value of the source distribution, each sample from it can be thought of as a fluctuation from that average value. When we begin to add these fluctuations together, they also begin to cancel one another out.

It does this because there are so many more possible ways to realize a sequence of positive and negative steps that sums to zero. There are slightly fewer ways to realize a sequence that ends up at plus or minus one, and so on, with the number of possible sequences declining in the characteristic bell shape of the normal distribution.
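The counting argument above can be made concrete with binomial coefficients (a sketch of my own, not from the blog): for $k$ steps of $\pm 1$, the number of sequences achieving a given sum is $\binom{k}{u}$ where $u$ is the number of $+1$ steps, and listing these counts by sum traces out a bell shape peaked at zero.

```python
from math import comb

# Count the sequences of k = 10 steps of +1 or -1 that achieve each
# possible sum. A sum of `total` requires u = (k + total) / 2 up-steps,
# so the count is the binomial coefficient C(k, u).
k = 10
for total in range(-k, k + 1, 2):  # possible sums share the parity of k
    ups = (k + total) // 2         # number of +1 steps needed for this sum
    print(f"sum {total:+3d}: {comb(k, ups):3d} sequences")
```

The counts run 1, 10, 45, 120, 210, 252, 210, ..., 1: largest at sum zero and falling off symmetrically, exactly the shape the quoted paragraph describes.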

The blog is about 70% complete, but the first half or so is quite readable. You can check it out if you are interested: https://bluesky314.github.io/Coming-Soon-Guessing-the-Central-Limit-Theorem/


We model them with the normal distribution because it is convenient, not because it is correct. For real distributions, the tails are usually much heavier than the normal distribution would have them if you estimate the standard deviation from the bulk of the numbers. There is a quote attributed to Poincaré, since deleted from the Wikipedia page, to the effect that "Everyone believes in the normal distribution: physicists because they think it is a mathematical theorem, mathematicians because they believe it is an experimental fact."
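The heavy-tails point can be illustrated with a simulation (my own sketch, assuming a simple "contaminated normal" mixture as the stand-in for real data): fit a standard deviation to the whole sample, then compare how often observations fall beyond $3$ standard deviations with what the normal distribution predicts.

```python
import random
import statistics
from math import erf, sqrt

# A contaminated sample: 95% N(0, 1) plus 5% N(0, 5). Most of the data looks
# standard normal, but the mixture has much heavier tails than a normal
# distribution with the same overall standard deviation.
random.seed(1)
data = [random.gauss(0, 1) if random.random() < 0.95 else random.gauss(0, 5)
        for _ in range(100_000)]

s = statistics.stdev(data)
beyond_3s = sum(abs(x) > 3 * s for x in data) / len(data)

# A true normal puts about 0.27% of its mass beyond 3 standard deviations.
normal_beyond_3s = 2 * (1 - 0.5 * (1 + erf(3 / sqrt(2))))

print(f"empirical tail fraction:   {beyond_3s:.4f}")
print(f"normal prediction at 3 sd: {normal_beyond_3s:.4f}")
```

The empirical tail fraction comes out several times larger than the normal prediction, which is the answer's point: a normal fit to the bulk of the data badly understates how often extreme values occur.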


You are obviously referring to the Central Limit Theorem, which can be applied to a large enough sample from a non-normally distributed population.

It is a remarkable theorem which essentially says:

If all possible random samples of size $n$ are taken from *any* population having mean $\mu$ and standard deviation $\sigma$, then the distribution of sample *means* will:

(1) have a mean $(\mu_{\bar{x}})$ which is equal to $\mu$;

(2) have a standard deviation $(\sigma_{\bar{x}})$ that is equal to $\frac{\sigma}{\sqrt{n}}$; and

(3) be normally distributed provided that the underlying population is normally distributed; otherwise, it will be at least approximately normally distributed for sample sizes of $n \geq 30$; and, the approximation to normality will improve as the sample size increases.

Thus, this theorem allows us to relate the mean $\bar{x}$ of our sample to the "at least approximately normally distributed" distribution of sample means, whether or not our population is normally distributed, provided that our random sample has size $30$ or more.
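The three properties above can be verified by simulation; this sketch (my own illustration, using die rolls as an arbitrary non-normal population with $\mu = 3.5$ and $\sigma \approx 1.708$) checks that sample means of $n = 30$ rolls have mean close to $\mu$ and standard deviation close to $\sigma/\sqrt{n} \approx 0.312$:

```python
import random
import statistics

# Population: a fair six-sided die, mu = 3.5, sigma = sqrt(35/12) ≈ 1.708.
pop = range(1, 7)
mu = statistics.fmean(pop)       # 3.5
sigma = statistics.pstdev(pop)   # ≈ 1.708

# Draw many samples of size n and record each sample mean.
random.seed(2)
n, trials = 30, 20_000
means = [statistics.fmean(random.choice(pop) for _ in range(n))
         for _ in range(trials)]

print(f"mean of sample means: {statistics.fmean(means):.3f}")  # near mu = 3.5
print(f"sd of sample means:   {statistics.stdev(means):.3f}")  # near 0.312
```

Property (3) would show up in a histogram of `means`: despite the flat (uniform) population, the sample means pile up in a bell shape around $3.5$.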