Almost sure convergence of $L_1$-Wasserstein distance between the empirical and true CDFs


Let $X_1, X_2, \ldots$ be i.i.d. from a distribution with cdf $F$. Let $F_n$ denote the empirical cdf $F_n(t) = \frac{1}{n}\sum\limits_{i=1}^n I(X_i \leq t)$.
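For concreteness, here is a minimal sketch of $F_n$ as a function of $t$, assuming the sample is held in a Python list. The name `ecdf` is illustrative only. Sorting once lets each evaluation count the $X_i\leq t$ by binary search.

```python
import bisect

def ecdf(samples):
    """Return the map t -> F_n(t) = (1/n) * #{i : X_i <= t} (hypothetical helper)."""
    xs = sorted(samples)
    n = len(xs)
    # bisect_right gives the number of sorted samples that are <= t
    return lambda t: bisect.bisect_right(xs, t) / n

F_n = ecdf([3.0, 1.0, 2.0])
print(F_n(2.0))  # two of the three samples are <= 2.0
```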

How can we prove that $\int_{\mathbb{R}}|F(t) - F_n(t)|\text{ d}t \overset{a.s.}{\longrightarrow} 0$ if $\operatorname{E}\lvert X_1\rvert < \infty$, i.e. if $X_1$ is integrable?
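To see the claim numerically, one can compute the integral exactly for a specific $F$. The sketch below assumes $F$ is the Exp(1) cdf $F(t)=1-e^{-t}$ for $t\geq 0$ (an illustrative choice, not part of the question). Between consecutive order statistics $F_n$ is constant, say equal to $c=k/n$, and $F(t)-c$ changes sign at $t^*=-\log(1-c)$, so each piece can be integrated in closed form. All function names are hypothetical.

```python
import math
import random

def _abs_gap_on_segment(a, b, c):
    """Integral of |(1 - e^{-t}) - c| over [a, b], for 0 <= a <= b < inf and 0 <= c < 1."""
    def H(t):
        # antiderivative of (1 - e^{-t}) - c
        return (1.0 - c) * t + math.exp(-t)
    # the integrand changes sign at t* = -log(1 - c); clamp t* into [a, b]
    m = min(max(-math.log1p(-c), a), b)
    # integrand is negative on [a, m] and positive on [m, b]
    return (H(a) - H(m)) + (H(b) - H(m))

def w1_empirical_vs_exp1(samples):
    """Exact int |F_n(t) - F(t)| dt when F is the Exp(1) cdf.

    Assumes all samples are >= 0 (true for Exp(1) draws), so F and F_n
    both vanish on (-inf, 0).
    """
    xs = sorted(samples)
    n = len(xs)
    total, left = 0.0, 0.0
    for k, right in enumerate(xs):
        # on [left, right) the empirical cdf is constant, equal to k/n
        total += _abs_gap_on_segment(left, right, k / n)
        left = right
    # on [max sample, inf) F_n = 1, so |F_n - F| = e^{-t}, which integrates to e^{-left}
    return total + math.exp(-left)

random.seed(0)
for n in (10, 1_000, 100_000):
    sample = [random.expovariate(1.0) for _ in range(n)]
    print(n, w1_empirical_vs_exp1(sample))
```

For a single observation $x>0$ the function returns $x-1+2e^{-x}$, which matches direct integration. For larger $n$ the printed values should shrink, in line with the almost sure convergence being asked about.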


del Barrio, Giné, and Matrán (1999) claim that this should follow from the Glivenko-Cantelli theorem, the law of large numbers, and the dominated convergence theorem.

Matsak (2006) claims that this should follow from the Law of Large Numbers on Banach spaces.

Despite these references, I was unable to fill in the details to prove the claim.


Best answer:

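By the strong law of large numbers, for each fixed $t$ we have $$F_n(t)=\frac{1}{n}\sum_{i=1}^n I(X_i\leq t) \rightarrow \mathbb E[I(X_1\leq t)] = \mathbb P[X_1\leq t] = F(t)$$ almost surely. (The exceptional null set depends on $t$; the Glivenko-Cantelli theorem removes this issue, since it gives $\sup_t\lvert F_n(t)-F(t)\rvert\to 0$ almost surely.)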

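Now we have $$|F_n(t)-F(t)| =|(1-F(t))-(1-F_n(t))| \leq 1-F_n(t) + 1-F(t) $$ as well as $$|F_n(t)-F(t)|\leq F_n(t)+F(t).$$ Think about how these two inequalities (the first for $t\geq 0$, the second for $t<0$) give you an integrable upper bound $g_n(t)$ of $|F_n(t)-F(t)|$ for all $t\in\mathbb R$. Since $g_n$ depends on $n$ through $F_n$, apply the generalized dominated convergence theorem: $g_n\to g$ pointwise almost surely, and $\int g_n = \frac1n\sum_{i=1}^n\lvert X_i\rvert+\mathbb E\lvert X_1\rvert \to 2\,\mathbb E\lvert X_1\rvert=\int g$ almost surely by the strong law of large numbers.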
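The identity behind $\int g_n$ is that $\int_0^\infty (1-F_n(t))\,dt=\frac1n\sum_i X_i^+$ and $\int_{-\infty}^0 F_n(t)\,dt=\frac1n\sum_i X_i^-$. A small sketch that integrates these step functions piece by piece and compares the result with the sample mean of $\lvert X_i\rvert$ (function names are hypothetical):

```python
def _staircase_area(points):
    """int_0^inf #{p in points : p > s} ds, for a sorted list of nonnegative points."""
    area, left = 0.0, 0.0
    for j, right in enumerate(points):
        # for s in [left, right), exactly len(points) - j points exceed s
        area += (len(points) - j) * (right - left)
        left = right
    return area

def integral_of_empirical_bound(samples):
    """Piecewise value of int_0^inf (1 - F_n(t)) dt + int_{-inf}^0 F_n(t) dt."""
    n = len(samples)
    pos = sorted(x for x in samples if x > 0)   # positive parts X_i^+
    neg = sorted(-x for x in samples if x < 0)  # negative parts X_i^-
    return (_staircase_area(pos) + _staircase_area(neg)) / n

data = [-1.5, 0.0, 2.0, 3.0, -0.5]
print(integral_of_empirical_bound(data), sum(abs(x) for x in data) / len(data))
```

Both printed numbers agree, which is the $n$-dependent part $\frac1n\sum_i\lvert X_i\rvert$ of $\int g_n$.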

Second answer:

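If the integral were over a compact interval, say $[-R,R]$, we could simply apply the dominated convergence theorem, since for each fixed $t$, $F_n(t)\to F(t)$ almost surely, and $\lvert F_n(t)-F(t)\rvert\leqslant 1$, which is integrable on $[-R,R]$. (By Fubini's theorem, almost surely $F_n(t)\to F(t)$ for Lebesgue-almost every $t$, which is all the dominated convergence theorem needs.)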

In order to treat the integral over the whole real line, write \begin{align} \int_R^\infty\left\lvert F_n(t)-F(t)\right\rvert dt &=\int_R^\infty\left\lvert \frac 1n\sum_{i=1}^n \mathbf{1}_{X_i>t}-\mathbb P(X_1>t)\right\rvert dt\\ &\leqslant \int_R^\infty\frac 1n\sum_{i=1}^n \mathbf{1}_{X_i>t}\,dt+\int_R^\infty\mathbb P(X_1>t)\,dt\\ &= \frac 1n\sum_{i=1}^n \mathbf{1}_{X_i>R}\int_R^{X_i}dt+\int_R^\infty\mathbb P(X_1>t)\,dt\\ &=\frac 1n\sum_{i=1}^n (X_i-R)\mathbf{1}_{X_i>R} +\int_R^\infty\mathbb P(X_1>t)\,dt. \end{align}

The variables $(X_i-R)\mathbf{1}_{X_i>R}$ are i.i.d. and integrable, so by the strong law of large numbers, for each integer $R$, $$ \limsup_{n\to\infty}\int_R^\infty\left\lvert F_n(t)-F(t)\right\rvert dt\leqslant \mathbb E\left[(X_1-R)\mathbf{1}_{X_1>R} \right]+\int_R^\infty\mathbb P(X_1>t)\,dt \quad\text{a.s.} $$

By the same reasoning applied to $\mathbf{1}_{X_i\leqslant t}$ on $(-\infty,-R]$, $$ \limsup_{n\to\infty}\int_{-\infty}^{-R}\left\lvert F_n(t)-F(t)\right\rvert dt\leqslant \mathbb E\left[(-R-X_1)\mathbf{1}_{X_1\leqslant -R} \right]+\int_{-\infty}^{-R}\mathbb P(X_1\leqslant t)\,dt \quad\text{a.s.} $$

Combining these two bounds with the compact case $[-R,R]$ treated above, we derive that for each integer $R$, $$ \begin{aligned} \limsup_{n\to\infty}\int_{-\infty}^{\infty}\left\lvert F_n(t)-F(t)\right\rvert dt \leqslant{}& \mathbb E\left[(X_1-R)\mathbf{1}_{X_1>R} \right]+\int_R^\infty\mathbb P(X_1>t)\,dt\\ &+\mathbb E\left[(-R-X_1)\mathbf{1}_{X_1\leqslant -R} \right]+\int_{-\infty}^{-R}\mathbb P(X_1\leqslant t)\,dt \quad\text{a.s.} \end{aligned} $$ Since only countably many values of $R$ are involved, these bounds hold simultaneously for all integers $R$ outside a single null set. Integrability of $X_1$ guarantees that the right-hand side goes to $0$ as $R\to \infty$ (each of the four terms is at most $\mathbb E\left[\lvert X_1\rvert\mathbf{1}_{\lvert X_1\rvert\geqslant R}\right]$, which tends to $0$ by dominated convergence), so the left-hand side, which does not depend on $R$, is $0$ almost surely.
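The right-tail step can be checked numerically. Note that $\int_R^\infty\mathbb P(X_1>t)\,dt=\mathbb E[(X_1-R)\mathbf{1}_{X_1>R}]$, so the a.s. bound on the right tail is twice this quantity. The sketch below assumes, purely for illustration, that $X_1\sim\text{Exp}(1)$, for which $\mathbb E[(X_1-R)\mathbf{1}_{X_1>R}]=e^{-R}$ when $R\geq 0$ (by memorylessness). The names are hypothetical.

```python
import math
import random

def empirical_tail_term(samples, R):
    """(1/n) * sum_i (X_i - R) * 1{X_i > R}, which equals int_R^inf (1 - F_n(t)) dt."""
    return sum(x - R for x in samples if x > R) / len(samples)

def exp1_tail_term(R):
    # assumption: X ~ Exp(1) and R >= 0, so E[(X - R) 1{X > R}] = int_R^inf e^{-t} dt = e^{-R}
    return math.exp(-R)

random.seed(0)
xs = [random.expovariate(1.0) for _ in range(200_000)]
for R in (1, 2, 4, 8):
    # by the LLN the empirical term approaches e^{-R}; the resulting a.s. bound
    # 2 e^{-R} on limsup of the right-tail integral vanishes as R grows
    print(R, empirical_tail_term(xs, R), 2 * exp1_tail_term(R))
```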