Central Limit Theorem for empirical distribution


I am a beginner in Real Analysis.

I know that if the samples are taken from a Normal distribution, then the sum of the samples also follows a Normal distribution, irrespective of the sample size. However, I want to know whether there is an analogue of this result when the distribution of the samples is not normal but converges in distribution to a Normal distribution.

Formally,

Let $X_1, X_2, \ldots, X_n$ be independent observations from an empirical distribution function $F_n(x)$. Let $N(\mu,\sigma^2)$ be the actual Normal distribution function.

Suppose that $F_n(x)$ converges in distribution to $F(x)$.

We know that if samples are taken independently from $F(x) = N(\mu,\sigma^2)$, then the sum of the samples is also normally distributed, $N(n\mu, n\sigma^2)$, for any sample size $n$. Can I draw the same conclusion if the samples are taken from $F_n(x)$ instead of $F(x)$?

I have read various versions of the Functional Central Limit Theorem but could not make any progress. Thank you for the help.
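A quick simulation of the question's setup (a hypothetical sketch; the exponential data, sample sizes, and seed are my choices, not part of the question): draw an i.i.d. sample, treat its empirical CDF $F_n$ as the sampling distribution, and check whether the standardized sum of draws from $F_n$ looks normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw an i.i.d. sample from a (non-normal) exponential distribution;
# its empirical CDF F_n plays the role of the sampling distribution.
data = rng.exponential(scale=1.0, size=5_000)
mu_n, sigma_n = data.mean(), data.std()

# Resampling with replacement from the data is exactly sampling from F_n.
# Form the standardized sum of n such draws, repeated many times.
n, reps = 200, 10_000
sums = rng.choice(data, size=(reps, n), replace=True).sum(axis=1)
z = (sums - n * mu_n) / (np.sqrt(n) * sigma_n)

# If a CLT holds for samples drawn from F_n, z should be roughly N(0, 1).
print(round(z.mean(), 2), round(z.std(), 2))
```

Of course the sum here is only approximately normal for large $n$, which is the gap between the exact normal-family statement above and sampling from $F_n$.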

2 Answers


See the Glivenko-Cantelli theorem. For an i.i.d. sample, it says that the empirical CDF converges to $F$ uniformly, almost surely: $$ \sup_{x \in \mathbb{R}} \left| F_n(x) - F(x) \right| \xrightarrow{a.s.} 0. $$
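A small numerical illustration of this (a sketch; the Exp(1) choice, sample sizes, and seed are arbitrary): the Kolmogorov distance $\sup_x |F_n(x) - F(x)|$ shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def sup_distance(n):
    """Kolmogorov distance sup_x |F_n(x) - F(x)| for an Exp(1) sample."""
    x = np.sort(rng.exponential(size=n))
    F = 1.0 - np.exp(-x)                 # true CDF at the order statistics
    i = np.arange(1, n + 1)
    # The supremum is attained at the jumps of the empirical CDF, where
    # F_n takes the values i/n (from above) and (i-1)/n (from below).
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))

for n in (100, 10_000, 1_000_000):
    print(n, round(sup_distance(n), 4))
```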


This is by no means an answer, but just some small thoughts:


  • Let us first consider a deterministic sequence of distributions $(F_n)$ that satisfies the following conditions:

    1. $\int_{\mathbb{R}} x \, dF_n(x) = 0$ and $\lim_{n\to\infty} \int_{\mathbb{R}} x^2 \, dF_n(x) = \sigma^2 > 0$.

    2. For any $\epsilon > 0$, $\lim_{n\to\infty} \int_{|x|>\epsilon \sqrt{n}} x^2 \, dF_n(x) = 0$

    If $(X_{n,1},\cdots, X_{n,n})$ are i.i.d. and have the common distribution $F_n$, then by the Lindeberg-Feller CLT (see Theorem 3.4.5 of Durrett, for instance) we have

    $$ Z_n := \frac{X_{n,1}+\cdots+X_{n,n}}{\sqrt{n}} \quad \Rightarrow \quad \mathcal{N}(0, \sigma^2). $$

    The above two conditions are in some sense also necessary for this convergence to occur. (For instance, see this article.)

  • That being said, if $(F_n)$ is a sequence of random distributions that satisfies conditions 1 and 2 $\mathbb{P}$-almost surely, then we have the following quenched CLT:

    $$ \mathbb{P}\left[ Z_n \Rightarrow \mathcal{N}(0, \sigma^2) \right] = 1. $$

    Note that there are two levels of randomness. Each distribution $F_n$ is random by itself, and given a realization of $F_n$ we generate the random variables $X_{n,1}, \cdots, X_{n,n}$ from it.

    Formally, we can construct a product of probability spaces $\Omega \times \mathcal{S}$, a probability measure $\mathbb{P}$ on $\Omega$, and a family of probability measures $\{\mathsf{P}^{(\omega)}:\omega\in\Omega\}$ on $\mathcal{S}$ such that

    (i) $F_n$ are random distributions on $\Omega$, and

    (ii) $X_{n,k}$ are random variables on $\Omega\times\mathcal{S}$ such that for each given $\omega\in\Omega$, $X_{n,1},\cdots,X_{n,n}$ are i.i.d. and have the distribution $F_n(\omega)$ under the law $\mathsf{P}^{(\omega)}$.

    Now if $\omega \in \Omega$ is such that conditions 1 and 2 are satisfied for the realization $(F_n(\omega))$, then we can apply the Lindeberg-Feller CLT to show that

    $$ Z_n := \frac{X_{n,1}+\cdots+X_{n,n}}{\sqrt{n}}, \qquad \lim_{n\to\infty} \mathsf{P}^{(\omega)} \left[ Z_n \leq x \right] = G(x) $$

    where $G(x)$ is the CDF of the normal distribution $\mathcal{N}(0, \sigma^2)$. So if conditions 1 and 2 are satisfied $\mathbb{P}$-almost surely, then

    $$ \mathbb{P}\left[ \left\{ \omega \in \Omega : \lim_{n\to\infty} \mathsf{P}^{(\omega)} \left[ Z_n \leq x \right] = G(x) \right\} \right] = 1. $$

    In other words, $Z_n \Rightarrow \mathcal{N}(0,\sigma^2)$ under $\mathsf{P}^{(\omega)}$ is guaranteed for almost every $\omega \in \Omega$.
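The two-level structure can be sketched numerically (an illustration only, not the formal construction above; the choice $F_n(\omega) = \mathrm{Uniform}(-a_n, a_n)$, the seeds, and the sample sizes are all assumptions of mine). One generator draws $\omega$, fixing a random sequence $(F_n(\omega))$ with mean $0$, variance $a_n^2/3 \to 1$, and bounded support, so conditions 1 and 2 hold; a second generator then draws $X_{n,1}, \cdots, X_{n,n}$ from $F_n(\omega)$.

```python
import numpy as np

def conditional_z(omega_seed, n=1_000, reps=4_000):
    """Samples of Z_n under P^(omega) for one realization omega."""
    outer = np.random.default_rng(omega_seed)        # draws omega
    inner = np.random.default_rng(omega_seed + 100)  # draws the X_{n,k}
    # F_n(omega) = Uniform(-a_n, a_n) with a_n -> sqrt(3), so the row
    # variance a_n^2 / 3 tends to sigma^2 = 1, and the bounded support
    # makes the Lindeberg condition automatic.
    a_n = np.sqrt(3.0) * (1.0 + outer.uniform(-1, 1) / n)
    X = inner.uniform(-a_n, a_n, size=(reps, n))
    return X.sum(axis=1) / np.sqrt(n)                # Z_n

# For each of several independent omegas, the conditional law of Z_n
# under P^(omega) is close to N(0, 1) -- the "almost every omega" claim.
for omega_seed in (0, 1, 2):
    z = conditional_z(omega_seed)
    print(omega_seed, round(z.mean(), 2), round(z.std(), 2))
```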