Convergence of sample means via the Central Limit Theorem


So, if I understand the Central Limit Theorem correctly: if we have i.i.d. random variables $X_1,\ldots,X_n$ with mean $E$ and variance $V$, then for large $n$ the distribution of their arithmetic mean $\overline{X}_n$ is approximately normal with mean $E$ and variance $\frac{V}{n}$ (formally, $\sqrt{n}\,(\overline{X}_n - E)$ converges in distribution to $\mathcal{N}(0, V)$).

I keep reading that, as long as the $X_i$ have a distribution that is symmetric and unimodal, the convergence is good enough to work with for $n > 30$. This appears to be a common "rule of thumb". There are other, more refined "rules of thumb" that one can dig up, it seems, accounting for what to do if the underlying distribution is skewed or multimodal.

At the other end, it's clear that if the underlying distribution is normal to begin with, then $n = 1$ is sufficient for a perfect match. Is there any rule, not "of thumb", for determining a value of $n$ for which the convergence is somehow "$\epsilon$-good", for some well-defined notion of goodness? Can I actually measure "how far" a distribution is from the normal by looking at that distribution? If not in general, then perhaps for special cases? I get the impression it's pretty carefully studied for Bernoulli variables, at least, because the binomial distributions are mile markers along the road of that convergence.

I mean, given an underlying distribution, I can obviously just compute the sampling distribution for larger and larger values of $n$ until I'm happy, but I'm asking whether there is a less brute-force way of doing it. I'm also asking whether what I've written here reveals any errors in my understanding of the subject.
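For the brute-force approach, one concrete way to quantify "until I'm happy" is the Kolmogorov (sup) distance between the simulated distribution of the standardized mean and the standard normal CDF. A minimal sketch, using Exp(1) as a deliberately skewed test distribution (the distribution, sample sizes, and trial count here are arbitrary choices for illustration):

```python
import math
import random

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance_to_normal(n, trials=10000, seed=0):
    """Simulate `trials` standardized sample means of n Exp(1) draws and
    return the Kolmogorov sup-distance to the standard normal CDF."""
    rng = random.Random(seed)
    # Exp(1) has mean 1 and variance 1, so the standardized
    # mean is (X-bar - 1) * sqrt(n).
    means = sorted(
        (sum(rng.expovariate(1.0) for _ in range(n)) / n - 1.0) * math.sqrt(n)
        for _ in range(trials)
    )
    # Sup-distance between the empirical CDF and Phi, checked
    # just before and just after each jump of the empirical CDF.
    d = 0.0
    for i, x in enumerate(means):
        p = normal_cdf(x)
        d = max(d, abs((i + 1) / trials - p), abs(i / trials - p))
    return d

# The distance shrinks roughly like n^{-1/2} for this skewed distribution.
for n in (5, 30, 100):
    print(n, round(ks_distance_to_normal(n), 3))
```

This is still brute force, of course, but it replaces eyeballing histograms with a single number that can be driven below a chosen $\epsilon$.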

Best Answer

There are several answers to your question.

One way to understand the CLT is as the result guaranteeing that the distribution function $F_n(x)$ of the rescaled mean $\sqrt{n}\,(\overline{X}_n - E)/\sqrt{V}$ converges, pointwise in $x$, to the distribution function $\Phi(x)$ of the standard normal distribution. Your question is then how good this approximation is for finite $n$.

A standard result here is the Berry-Esseen theorem, which bounds the uniform deviation $\sup_x |F_n(x) - \Phi(x)|$. The bound is of order $n^{-1/2}$, with a constant that depends only on the variance $\sigma^2$ of the $X_i$'s and on their third absolute central moment $\rho = E|X_i - \mu|^3$; concretely, the bound has the form $C\rho/(\sigma^3 \sqrt{n})$.
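As a concrete illustration, the bound can be solved for $n$ to answer the question above in a non-rule-of-thumb way, at least for the Bernoulli case. A sketch (the constant $C = 0.4748$ is Shevtsova's 2011 value for the i.i.d. case; the function names are my own):

```python
import math

def berry_esseen_bound(sigma2, rho, n, C=0.4748):
    """Berry-Esseen bound on sup_x |F_n(x) - Phi(x)| for the standardized
    mean of n i.i.d. variables with variance sigma2 and third absolute
    central moment rho."""
    return C * rho / (sigma2 ** 1.5 * math.sqrt(n))

def bernoulli_moments(p):
    """Variance and third absolute central moment of Bernoulli(p):
    |X - p| is (1-p) with probability p and p with probability 1-p."""
    sigma2 = p * (1 - p)
    rho = p * (1 - p) ** 3 + (1 - p) * p ** 3
    return sigma2, rho

def n_for_accuracy(p, eps, C=0.4748):
    """Smallest n for which the Berry-Esseen bound drops below eps."""
    sigma2, rho = bernoulli_moments(p)
    return math.ceil((C * rho / (sigma2 ** 1.5 * eps)) ** 2)

print(n_for_accuracy(0.5, 0.01))   # fair coin
print(n_for_accuracy(0.05, 0.01))  # skewed coin needs many more samples
```

Note that the resulting $n$ values are much larger than "$n > 30$": Berry-Esseen is a worst-case guarantee over all $x$ and all distributions with those moments, which is part of why the practical rules of thumb are so much smaller.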

Another possibility is an Edgeworth expansion, which, loosely speaking, is a variant of the classical Taylor expansion: one expands $F_n(x)$ around $\Phi(x)$ in powers of $n^{-1/2}$, with correction terms involving the higher moments (skewness, kurtosis) of the $X_i$. Essentially this gives a more refined result than the Berry-Esseen theorem.
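To make that concrete, here is a sketch of the one-term Edgeworth correction, $F_n(x) \approx \Phi(x) - \phi(x)\,\frac{\lambda_3}{6\sqrt{n}}(x^2 - 1)$, where $\lambda_3$ is the skewness of the $X_i$. For Exp(1) variables (skewness 2) the sum of $n$ of them is Gamma$(n, 1)$, whose CDF has a closed form for integer $n$, so the approximation error can be checked exactly:

```python
import math

def phi(x):
    # Standard normal density.
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def edgeworth_cdf(x, skew, n):
    """One-term Edgeworth approximation to the CDF of the
    standardized mean of n i.i.d. variables with the given skewness."""
    return Phi(x) - phi(x) * skew * (x * x - 1.0) / (6.0 * math.sqrt(n))

def exact_cdf_exp_mean(x, n):
    """Exact CDF of the standardized mean of n Exp(1) variables.
    Their sum is Gamma(n, 1); for integer n its CDF is
    1 - exp(-t) * sum_{k < n} t^k / k!."""
    t = n + x * math.sqrt(n)  # undo the standardization
    if t <= 0:
        return 0.0
    s, term = 1.0, 1.0
    for k in range(1, n):
        term *= t / k
        s += term
    return 1.0 - math.exp(-t) * s

n, skew = 20, 2.0  # Exp(1) has skewness 2
for x in (-0.5, 0.0, 0.5):
    exact = exact_cdf_exp_mean(x, n)
    print(x, abs(Phi(x) - exact), abs(edgeworth_cdf(x, skew, n) - exact))
```

The skewness term captures most of the normal approximation's error here, which is exactly the sense in which Edgeworth refines Berry-Esseen: the leading error is identified and subtracted rather than merely bounded.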

To summarize: If you are fine with assuming that the third moment exists, you get very general bounds on the approximation error.

But, like I said, this is only one way to interpret the CLT. One could also ask how quickly the characteristic function converges, for example. Or you might want quantiles to converge, if you are interested in a statistical application. These are all somewhat different questions, and you might get different answers.
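For instance, quantile convergence can be probed directly by simulation. A sketch (again using Exp(1) as an arbitrary skewed example; for a right-skewed distribution the upper quantile of the standardized mean approaches the normal quantile from above):

```python
import math
import random
from statistics import NormalDist

def simulated_quantile(n, q, trials=10000, seed=1):
    """Monte-Carlo q-quantile of the standardized mean of n Exp(1) draws."""
    rng = random.Random(seed)
    zs = sorted(
        (sum(rng.expovariate(1.0) for _ in range(n)) / n - 1.0) * math.sqrt(n)
        for _ in range(trials)
    )
    return zs[int(q * trials)]

z95 = NormalDist().inv_cdf(0.95)  # normal 0.95 quantile, about 1.645
for n in (5, 100):
    print(n, round(simulated_quantile(n, 0.95), 3), round(z95, 3))
```

A statistician using the CLT to build a 95% confidence interval cares about exactly this quantity, and the sup-norm bound above does not translate directly into a quantile bound.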