Intuition on the quality of the approximation of the sample average in the central limit theorem


Let's say that a sequence of random variables $(X_n)_{n \in \mathbb{N}}$ is $\mathcal{o}(Y_n)$ for some other sequence of random variables $(Y_n)_{n \in \mathbb{N}}$ if $\lim_{n \to \infty}\frac{X_n}{Y_n}=0$ in distribution.

We can use this to interpret the central limit theorem for such a sequence of random variables of mean $\mu$ and variance $\sigma^2$ as an asymptotic expansion (of the distribution) $$ \frac{1}{n}\sum_{j=1}^{n} X_j = \mu + \frac{\xi}{\sqrt{n}} + \mathcal{o}\left(\frac{1}{\sqrt{n}}\right), $$ where $\xi$ is a random variable distributed as $\mathcal{N}(0,\sigma^2)$.

Intuitively, the first term comes from the law of large numbers, and the second term is a correction of order $\frac{1}{\sqrt{n}}$ coming from the central limit theorem. Is there some intuition as to why this term should be of order $\frac{1}{\sqrt{n}}$? Why isn't the expansion in some other power of $n$, e.g. $\frac{1}{n}$, as in the definition of differentiability?

Assuming that it is intuitively clear that for large $n$, the sample average should be approximately distributed as a normal distribution, why should the quality of this approximation be exactly $\frac{1}{\sqrt{n}}$ and not some other numerical value?
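One way to see the $\frac{1}{\sqrt{n}}$ scaling empirically is a small Monte Carlo experiment (a hypothetical illustration, here with uniform variables): if the fluctuation term really is of order $\frac{1}{\sqrt{n}}$, then the standard deviation of the sample mean, multiplied by $\sqrt{n}$, should hover around the constant $\sigma$ for every $n$.

```python
import math
import random

# Illustrative sketch: for X_i ~ Uniform(0, 1) (mu = 1/2, sigma^2 = 1/12),
# estimate the standard deviation of the sample mean for several n.
# If the fluctuations are of order 1/sqrt(n), then sd * sqrt(n) should
# stay close to sigma = sqrt(1/12) ~ 0.2887 for all n.
random.seed(0)

def sample_mean_sd(n, trials=2000):
    """Empirical standard deviation of the sample mean over many trials."""
    means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
    m = sum(means) / trials
    return math.sqrt(sum((x - m) ** 2 for x in means) / trials)

sigma = math.sqrt(1 / 12)
for n in (100, 400, 1600):
    print(n, sample_mean_sd(n) * math.sqrt(n))  # each value should be near sigma
```

Trying other powers, e.g. multiplying by $n$ instead of $\sqrt{n}$, makes the products grow with $n$ instead of stabilizing.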

3 Answers

BEST ANSWER

For simplicity set $\sigma^2 = 1$, $\mu = 0$. If you take $X_1, \dots, X_n$ to be iid $N(0,1)$ random variables, then $\xi = (X_1 + \dots + X_n)/\sqrt{n}$ has the $N(0,1)$ distribution and the expansion $$ \frac{1}{n}\sum_{i=1}^n X_i = \frac{\xi}{\sqrt n} $$ is exact, so indeed there is not much choice for the scaling in front of $\xi$.
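This exactness is easy to check numerically (an illustrative sketch, not part of the original answer): drawing iid $N(0,1)$ samples and rescaling the sum by $\sqrt{n}$ should give a standard normal for any fixed $n$, not just in the limit.

```python
import math
import random

# Illustrative check: for iid N(0,1) draws, xi = (X_1 + ... + X_n)/sqrt(n)
# is exactly N(0,1), so its empirical mean and variance should be near 0 and 1
# regardless of n.
random.seed(2)

def rescaled_sum(n):
    """One sample of xi = (X_1 + ... + X_n) / sqrt(n) with X_i ~ N(0,1)."""
    return sum(random.gauss(0, 1) for _ in range(n)) / math.sqrt(n)

samples = [rescaled_sum(50) for _ in range(5000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)  # should be close to 0 and 1
```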


Do you have good intuition for random walks? After $n$ steps, you're within $O(\sqrt{n})$ of the origin with high probability. That's why the second term is of order $\frac{1}{\sqrt{n}}$. As to why it's exactly $\frac{\xi}{\sqrt{n}}$, I think you're going to need a proof.
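That random-walk picture can be sketched numerically (a hypothetical example, assuming $\pm 1$ steps): the mean distance from the origin after $n$ steps, divided by $\sqrt{n}$, settles near the constant $\sqrt{2/\pi}$.

```python
import math
import random

# Sketch: simple random walk with +-1 steps. The expected distance from the
# origin after n steps is about sqrt(2n/pi), so E|S_n| / sqrt(n) is roughly
# the constant sqrt(2/pi) ~ 0.798 for every large n.
random.seed(1)

def mean_abs_position(n, trials=2000):
    """Average |S_n| over many independent walks of n steps."""
    total = 0.0
    for _ in range(trials):
        pos = sum(random.choice((-1, 1)) for _ in range(n))
        total += abs(pos)
    return total / trials

for n in (100, 400, 1600):
    print(n, mean_abs_position(n) / math.sqrt(n))  # each near sqrt(2/pi)
```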


First, note that the $X_i$ are independent, identically distributed random variables (or vectors).

According to the SLLN (strong law of large numbers, under the assumption $E|X_1| < \infty$) we have

$$\frac1{n}\sum_{i=1}^n X_i \to E X_1$$

almost surely. In other words, we have the convergence

$$\frac{S_n - \mathbf{E}S_n}{n} \to 0,$$ where $S_n = \sum_{i=1}^n X_i$.

So, after division by $n$ the quantity $S_n - \mathbf{E}S_n$ tends to $0$. It is easy to see that if we do not divide by $n$, it does not tend to $0$. This is natural: $D\bigl(S_n - \mathbf{E}S_n\bigr) = n\,DX_1$, which is "big" for "big" $n$. More precisely, the behavior of $S_n$ can be obtained from the LIL (law of the iterated logarithm).

Hence, we know that $$ \frac{S_n - \mathbf{E}S_n}{f(n)}$$ diverges when $f(n) =1$ and "vanishes" when $f(n) = n$.

In order to make the quantity $\frac{S_n - \mathbf{E}S_n}{f(n)}$ "not too big and not too small", notice that $$D\, \frac{S_n - \mathbf{E}S_n}{f(n)} = \frac{n\, DX_1}{f^2(n)}.$$ Hence it is natural to take $f^2(n)$ to grow at the same rate as $n$, because only in this case is the variance of $\frac{S_n - \mathbf{E}S_n}{f(n)}$ neither too big nor too small. And the CLT says that this is the right approach.
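The variance computation above can be tabulated directly (a small illustrative calculation, using $DX_1 = 1/12$ as for uniform variables): only $f(n) = \sqrt{n}$ keeps $\frac{n\,DX_1}{f^2(n)}$ bounded away from both $0$ and $\infty$.

```python
import math

# Illustrative table of n * DX_1 / f(n)^2 for three choices of f(n),
# with DX_1 = 1/12 (the variance of a Uniform(0, 1) variable):
#   f(n) = 1        -> the variance grows without bound,
#   f(n) = n        -> the variance vanishes,
#   f(n) = sqrt(n)  -> the variance stays at the constant DX_1.
DX1 = 1 / 12
for n in (10, 1000, 100000):
    for name, f in (("1", 1.0), ("sqrt(n)", math.sqrt(n)), ("n", float(n))):
        print(f"n={n:>6}  f(n)={name:<8} variance={n * DX1 / f ** 2:.6g}")
```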