Rate of convergence in the central limit theorem (Lindeberg–Lévy)

There are similar posts on Stack Exchange, but none of them seem to actually answer my questions. So consider the CLT in its most common form.

Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. random variables with $X_1 \in L^2(P)$, $\mathbb{E}[X_i] = \mu$, and $\mathbb{V}ar[X_i] = \sigma^2 > 0$. Write $\widehat{X} := \frac{X_1+\dots+X_n}{n}$. Then it holds that $$\sqrt{n}\, (\widehat{X} - \mu) \overset{\mathscr{D}}{\longrightarrow} \mathscr{N} (0,{\sigma}^2)$$ or, equivalently, $$\sqrt{n} \left( \frac{\widehat{X} - \mu}{\sigma} \right) \overset{\mathscr{D}}{\longrightarrow} \mathscr{N} (0,1).$$
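Here is a quick simulation I ran to convince myself of the statement (a sketch using Exp(1) samples, for which $\mu = \sigma = 1$): the empirical CDF of $\sqrt{n}(\widehat{X} - \mu)/\sigma$ at $0$ should approach $\Phi(0) = 0.5$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_mean(n, reps=20_000):
    """Draw `reps` copies of sqrt(n) * (X_bar - mu) / sigma
    for Exp(1) samples, where mu = sigma = 1."""
    samples = rng.exponential(scale=1.0, size=(reps, n))
    return np.sqrt(n) * (samples.mean(axis=1) - 1.0)

# Empirical P(Z_n <= 0) should approach Phi(0) = 0.5 as n grows;
# for small n the skew of the exponential is still visible.
for n in (5, 50, 500):
    z = standardized_mean(n)
    print(n, round(float(np.mean(z <= 0.0)), 3))
```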

I often see statements like "the rate of convergence is of order $\frac{1}{\sqrt{n}}$". Trying to interpret this, here is what I have understood (informally) so far:

According to the strong law of large numbers, under the above conditions, $$\widehat{X} - \mu \overset{a.s.}{\longrightarrow} 0.$$ However, $\widehat{X} - \mu$ stops converging to zero when multiplied by $\sqrt{n}$. So one says that the rate of convergence is of order $\frac{1}{\sqrt{n}}$.

So here are my questions:

  1. How does one define the order of convergence in this case using formal notation? And why does one say "of order $\frac{1}{\sqrt{n}}$" and not "of order $\sqrt{n}$"?
  2. How do we know that, if multiplied by a factor of lower or higher order, like $\sqrt[3]{n}$ or $n$, one would not get a random variable converging in distribution to some a.s. non-zero random variable (as opposed to the informal argument "$\widehat{X} - \mu$ stops converging to zero when multiplied by $\sqrt{n}$")?
  3. And, most importantly, can someone actually show rigorously that the rate of convergence is exactly of order $\frac{1}{\sqrt{n}}$?

On BEST ANSWER
  1. I think you've basically defined it. You can say a sequence $Y_n$ of random variables converges of order $a_n$ if $Y_n/a_n$ converges in distribution to a random variable which isn't identically zero. The reason to have division instead of multiplication is so that $Y_n = a_n$ itself converges of order $a_n$. You should think of this as meaning "$Y_n$ grows or decays at about the same rate as $a_n$".

  2. This is Slutsky's theorem: if $Z_n \to Z$ in distribution and $c_n \to c$, then $c_n Z_n \to cZ$ in distribution. So suppose $Y_n$ converges of order $a_n$, so that $\frac{Y_n}{a_n}$ converges in distribution to some nontrivial $W$. If $b_n / a_n \to \infty$, then $\frac{Y_n}{b_n} = \frac{Y_n}{a_n} \frac{a_n}{b_n} \to W \cdot 0$, taking $Z_n = \frac{Y_n}{a_n}$, $Z=W$, $c_n = \frac{a_n}{b_n}$, and $c=0$ in Slutsky. So $Y_n$ does not converge of order $b_n$.

    On the other hand, if $\frac{b_n}{a_n} \to 0$, suppose to the contrary $\frac{Y_n}{b_n}$ converges in distribution to some $Z$. Then $\frac{Y_n}{a_n} = \frac{Y_n}{b_n} \frac{b_n}{a_n} \to 0 \cdot Z$ by Slutsky. But $\frac{Y_n}{a_n}$ was assumed to converge in distribution to $W$ which is not zero. This is a contradiction, so $Y_n$ does not converge of order $b_n$.

    But there isn't generally a unique sequence here. If $Y_n$ converges of order $\frac{1}{n}$, it would also be true to say $Y_n$ converges of order $\frac{1}{n+43}$, or $\frac{1}{n+\log n}$, or $\frac{1}{2n}$, et cetera.

  3. Not sure what you mean here, as this is just a restatement of the CLT, whose proof you seem to know.
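The scaling argument in point 2 can be checked numerically. A minimal sketch (assuming standard normal $X_i$, so $\mu = 0$, $\sigma = 1$): the spread of $a_n(\widehat{X} - \mu)$ stabilizes for $a_n = \sqrt{n}$, shrinks to zero for $a_n = n^{1/3}$, and blows up for $a_n = n$.

```python
import numpy as np

rng = np.random.default_rng(1)

def scaled_deviation_std(rate, n, reps=2_500):
    """Empirical standard deviation of rate(n) * X_bar_n
    for standard normal X_i (so mu = 0, sigma = 1)."""
    means = rng.standard_normal((reps, n)).mean(axis=1)
    return float(np.std(rate(n) * means))

for n in (100, 4_000):
    print(n,
          round(scaled_deviation_std(lambda m: m ** (1 / 3), n), 3),  # shrinks like n**(-1/6)
          round(scaled_deviation_std(np.sqrt, n), 3),                 # stays near 1
          round(scaled_deviation_std(float, n), 3))                   # grows like sqrt(n)
```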

ANSWER

This is just a remark that was too long to fit as a comment. The remark is about what people mean when they casually say "$\sqrt{n}$ (or $1/\sqrt{n}$) convergence".

Take $\mu = 0$, $\sigma = 1$ for simplicity. If $$\frac{1}{\sqrt{n}}\sum_i X_i$$ is "approximately normally distributed", as the CLT guarantees for large $n$, then we can approximate deviations of the empirical mean $\frac{1}{n}\sum_i X_i \approx 0$ from zero by using the CLT approximation $$\mathbb{P}\left(-\frac{\epsilon}{\sqrt{n}} < \frac{1}{n}\sum_i X_i < \frac{\epsilon}{\sqrt{n}}\right) \approx \mathbb{P}\left(-\epsilon < N(0,1) < \epsilon\right).\tag{1}$$ Then, heuristically, if you want an extra decimal point of accuracy (i.e. divide $\epsilon$ by $10$) at a fixed probability, you need $10^2$ times more samples (this keeps the left-hand side of the above approximation asymptotically constant). This is often what people mean when they say "the CLT implies $\sqrt{n}$ convergence".
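To illustrate approximation $(1)$, here is a small simulation (a sketch assuming $X_i$ uniform on $[-\sqrt{3}, \sqrt{3}]$, so $\mu = 0$, $\sigma = 1$): the probability on the left of $(1)$ stays roughly constant in $n$, and since $\epsilon/\sqrt{100n} = (\epsilon/10)/\sqrt{n}$, a tenfold narrower interval at the same probability costs a hundredfold more samples.

```python
import numpy as np

rng = np.random.default_rng(2)
EPS = 1.0
SQRT3 = np.sqrt(3.0)

def coverage(n, reps=10_000):
    """Empirical P(|X_bar_n| < EPS / sqrt(n)) for X_i uniform on
    [-sqrt(3), sqrt(3)], which has mu = 0 and sigma = 1."""
    means = rng.uniform(-SQRT3, SQRT3, size=(reps, n)).mean(axis=1)
    return float(np.mean(np.abs(means) < EPS / np.sqrt(n)))

# Both close to P(|N(0,1)| < 1) ~ 0.683; note that coverage(100 * n)
# tests an interval ten times narrower than coverage(n) does.
for n in (10, 1_000):
    print(n, round(coverage(n), 3))
```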

The sleight of hand above is: how large does $n$ have to be? In other words, what is the order of convergence of $$\left|\mathbb{P}\left(-\frac{\epsilon}{\sqrt{n}} < \frac{1}{n}\sum_i X_i < \frac{\epsilon}{\sqrt{n}}\right) - \mathbb{P}\left(-\epsilon < N(0,1) < \epsilon\right)\right|?$$ More generally, how large is the approximation error uniformly in $x$, $$\sup_x \left|\mathbb{P}\left(\frac{1}{\sqrt{n}}\sum_i X_i < x\right) - \mathbb{P}\left(N(0,1) < x\right)\right|?$$ It turns out that this quantity is also of order $1/\sqrt{n}$ (assuming a finite third moment), which is the content of the Berry–Esseen theorem, as pointed out in one of the links in the comments.
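The Berry–Esseen rate can be seen numerically. A sketch using fair Bernoulli $X_i$, where the CDF of the standardized sum is a binomial CDF and hence exactly computable: $\sqrt{n}$ times the Kolmogorov distance to the standard normal stays roughly constant as $n$ grows.

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance(n):
    """sup_x |P(Z_n <= x) - Phi(x)|, where Z_n is the standardized sum
    of n fair Bernoulli variables (mu = 1/2, sigma = 1/2).
    The supremum is attained at the jump points of the binomial CDF."""
    log_half = n * math.log(0.5)
    cdf = 0.0
    dist = 0.0
    for k in range(n + 1):
        lpmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                - math.lgamma(n - k + 1) + log_half)
        x = (k - n / 2) / (math.sqrt(n) / 2)   # standardized jump point
        g = phi(x)
        dist = max(dist, abs(cdf - g))         # left limit at the jump
        cdf += math.exp(lpmf)
        dist = max(dist, abs(cdf - g))         # right-continuous value
    return dist

# sqrt(n) * distance stays roughly constant, as Berry-Esseen predicts.
for n in (16, 64, 256, 1024):
    print(n, round(math.sqrt(n) * kolmogorov_distance(n), 3))
```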