Central Limit Theorem Definition


My friend and I have a bet going about the definition of the Central Limit Theorem.

Suppose we define an example as a number drawn at random from some probability density function with a defined finite mean and variance, and a sample as a set of N examples (with N>1).

Then, we take S samples and create a sampling distribution D over the means of each individual sample.

I am arguing that the Central Limit Theorem states that as the number of samples S approaches infinity, the sampling distribution D will approximate a normal distribution.

My friend is arguing that the Central Limit Theorem states that given any number of samples S, the sampling distribution D will not necessarily approximate a normal distribution, but as the number of examples per sample N approaches infinity, D will approximate a normal distribution.

Who is right?

Update: I lost this bet.
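For anyone curious why, a quick simulation makes the resolution concrete. This is just a sketch with numpy; the exponential population and the sizes are illustrative choices, not part of the original bet. With N fixed and small, the skewness of D does not go away no matter how large S is, while increasing N drives it toward zero.

```python
# Simulation of the bet: draw S samples of size N from a skewed
# population (Exponential(1)) and measure the skewness of the
# distribution D of the sample means.  A normal distribution has
# skewness zero, so lingering skewness means D is not normal.
import numpy as np

rng = np.random.default_rng(0)

def skewness(x):
    """Sample skewness: approximately zero for a normal distribution."""
    x = np.asarray(x)
    return np.mean(((x - x.mean()) / x.std()) ** 3)

S = 100_000  # number of samples (my quantity)

# Small N: D converges (as S grows) to the true distribution of the
# mean of 2 exponentials, whose skewness is 2/sqrt(2) ~ 1.41.
means_small_N = rng.exponential(size=(S, 2)).mean(axis=1)

# Large N: D is much closer to normal (theoretical skewness 2/sqrt(100) = 0.2).
means_large_N = rng.exponential(size=(S, 100)).mean(axis=1)

print(skewness(means_small_N))  # stays near 1.41 however large S is
print(skewness(means_large_N))  # much closer to 0
```

Increasing S only sharpens the picture of D; it is N that controls how normal D is.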


On BEST ANSWER

I'd say that your friend is more correct, in that he/she correctly points to $N$ (the sample size, i.e. the number of values that are summed to compute the "sample mean") as the thing that must tend to $\infty$ for the CLT to hold.

We have

$$S_N=\frac{X_1+X_2+ \cdots +X_N}{N}$$

Here, in our setting, the set $\{X_1, \ldots, X_N\}$ is one sample, of size $N$, and $S_N$ is the sample mean (the average) of that sample.

This $S_N$ is a random variable (informally, it takes a different random value for each sample). What the CLT says is about the distribution of this $S_N$ as $N\to \infty$. Of course, if you were practically interested in checking that $S_N$ (for some fixed $N$) is indeed approximately Gaussian, you might want to draw many values of $S_N$ and, e.g., plot a histogram; for this, you would need to draw a lot of samples (each of size $N$). But this has nothing to do with the asymptotics of the theorem.
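This distinction can be sketched numerically. In the sketch below (numpy; the uniform population and all sizes are illustrative assumptions), the distance between the distribution of $S_N$ and a normal distribution shrinks as $N$ grows, while the number of drawn values of $S_N$ stays fixed and only controls how precisely we can see that distribution.

```python
# Measure a KS-style distance between the distribution of the sample
# mean S_N (standardized) and the standard normal CDF, for several N,
# using a fixed number of draws of S_N.
import numpy as np
from math import erf

rng = np.random.default_rng(1)
draws = 100_000  # how many values of S_N we draw; held fixed throughout

def max_cdf_gap(n):
    """Max gap between the empirical CDF of standardized S_N and Phi."""
    means = rng.uniform(size=(draws, n)).mean(axis=1)
    # Uniform(0,1): mean 1/2, so Var(S_N) = 1 / (12 n)
    z = np.sort((means - 0.5) / np.sqrt(1.0 / (12.0 * n)))
    normal_cdf = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    empirical = np.arange(1, draws + 1) / draws
    return np.max(np.abs(empirical - normal_cdf))

gaps = {n: max_cdf_gap(n) for n in (1, 2, 10, 50)}
print(gaps)  # the gap shrinks as N grows, with the same number of draws
```

The draws only reveal the distribution of $S_N$; they do not change it.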

On

Neither of you, but you are much less wrong than he is.

Wikipedia states the CLT reasonably clearly in its first two paragraphs. Please note that there are several variants, but sampling from the same (unchanging) population meets the requirements for the classical CLT - specifically, that the individual draws are independent and identically distributed.

You have only partially captured the criteria for it to be true. Specifically, it is not enough for the mean and variance to be "defined" - $\sigma=\infty$ is defined, but that arises, e.g., with a power-law distribution, and there the sampling distribution will approach an $\alpha$-stable distribution, not a normal distribution. Other than that you are bang on the money.

Your friend is incorrect; his postulate is demonstrably incorrect, which can be seen by setting the sample size equal to the population size. In that case each sample will have the population distribution, but the sampling distribution will become more and more normal as more and more samples are taken. Try the experiment with the Standard Uniform Distribution or, for a more dramatic impact, this one:

$$f(x)=\begin{cases} 12x^2 &-\frac{1}{2}\le x\le \frac{1}{2}\\ 0 &\text{otherwise} \end{cases}$$
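For anyone who wants to run an experiment with this density, here is a numpy sketch. The density is taken normalized so it integrates to 1, i.e. $f(x)=12x^2$ on $[-1/2,1/2]$, and sampling is by inverting its CDF $F(x)=4x^3+\tfrac12$; the sizes chosen are illustrative.

```python
# Draw sample means from the U-shaped density f(x) = 12 x^2 on
# [-1/2, 1/2] (mass piles up at the endpoints, so single draws look
# nothing like a normal distribution).
import numpy as np

rng = np.random.default_rng(2)

def draw(size):
    """Inverse-CDF sampling: F(x) = 4x^3 + 1/2, so x = cbrt((u - 1/2)/4)."""
    u = rng.uniform(size=size)
    return np.cbrt((u - 0.5) / 4.0)

N = 30        # examples per sample
S = 100_000   # number of samples
sample_means = draw((S, N)).mean(axis=1)

# Despite the bimodal population, the means are already bell-shaped:
# symmetric about 0 with std close to sqrt(0.15 / N), since the
# population variance is integral of 12 x^4 = 0.15.
print(sample_means.mean(), sample_means.std())
```

Plotting a histogram of `sample_means` next to one of `draw(S)` makes the contrast between the population and the sampling distribution vivid.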

On

Both definitions are incorrect, though your friend's is less so.

The number of samples is not relevant at all, since the sampling distribution gives you the probability of obtaining different values of a sample statistic; it is a theoretical object, derived from the random process which generates your sample. Taking many samples may help you learn about the sampling distribution, but that is not germane to the Central Limit Theorem, since the sampling distribution exists whether we know about it or not.

Under the usual assumptions (you should look them up in a textbook) the sample mean $\bar X$ tends to the population mean $\mu$, so the sampling distribution of the former shrinks to a distribution that puts all its weight on one point. The variable which has a limiting normal distribution as the sample size $N$ tends to infinity is $$\sqrt{N}\,(\bar X-\mu).$$
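This scaling can be sketched directly (numpy; the Exponential(1) population, with $\mu=\sigma=1$, and the sizes are illustrative assumptions): the spread of $\bar X$ shrinks like $1/\sqrt{N}$, while the spread of the centred, scaled variable stays put near $\sigma$.

```python
# Compare the spread of the raw sample mean Xbar with the spread of
# the centred, scaled variable sqrt(N) * (Xbar - mu) as N grows.
import numpy as np

rng = np.random.default_rng(3)
mu = 1.0        # population mean of Exponential(1); sigma is also 1
reps = 20_000   # independent samples per N, to estimate each spread

spread_of_mean = {}
spread_scaled = {}
for n in (10, 400):
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    spread_of_mean[n] = means.std()                       # ~ sigma / sqrt(n)
    spread_scaled[n] = (np.sqrt(n) * (means - mu)).std()  # ~ sigma

print(spread_of_mean)  # shrinks toward 0 as n grows
print(spread_scaled)   # stays near sigma = 1 for every n
```

This is why the degenerate limit of $\bar X$ and the normal limit of $\sqrt{N}(\bar X - \mu)$ are not in conflict: they describe the same random variable at different magnifications.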