How to calculate the covariance of the empirical distribution function?

Let $X_i$ be iid random variables with distribution function $F(x)$. How to calculate $\textbf{Cov}[F_n(x),F_m(y)]$ where $F_n(x)$ is the empirical cumulative distribution function and $n,m\in\mathbb N$?

Best answer

Write $F_k(x)=\mathbb P[X_k\leq x]$ for the distribution function of $X_k$ (not to be confused with the empirical distribution function $F_n(x)=\frac{1}{n}\sum_{k=1}^n\mathbf 1[X_k\leq x]$), and $[m]=\{1,\dots,m\}$. Both $F_n$ and $F_m$ are built from the same sequence $X_1,X_2,\dots$ First we have that
\begin{align*}
\mathbb E [F_n(x)] &= \frac{1}{n} \sum\limits_{k=1}^n\mathbb E\left[\mathbf 1[X_k\leq x]\right] = \frac{1}{n} \sum\limits_{k=1}^nF_k(x)
\end{align*}
and similarly
\begin{align*}
\mathbb E [F_m(x)] &= \frac{1}{m} \sum\limits_{k=1}^mF_k(x)
\end{align*}
Now we calculate $\mathbb E [F_n(x)F_m(x)]$. The diagonal terms $l=k$ occur only for $k\leq\min(n,m)$, and for those $\mathbf 1[X_k\leq x]^2=\mathbf 1[X_k\leq x]$:
\begin{align*}
\mathbb E [F_n(x)F_m(x)] &= \frac{1}{nm}\sum\limits_{k=1}^n\sum\limits_{l=1}^m\mathbb E \left[\mathbf 1[X_k\leq x]\,\mathbf 1[X_l\leq x]\right]\\
&= \frac{1}{nm}\sum\limits_{k=1}^n\sum\limits_{l=1}^m\mathbb P[X_k\leq x \wedge X_l\leq x]\\
&= \frac{1}{nm}\left[\sum\limits_{k=1}^{\min(n,m)}\mathbb P[X_k\leq x] + \sum\limits_{k=1}^n\sum\limits_{l\in [m]\setminus\{k\}}\mathbb P[X_k\leq x \wedge X_l\leq x]\right]\\
&= \frac{1}{nm}\left[\sum\limits_{k=1}^{\min(n,m)}\mathbb P[X_k\leq x] + \sum\limits_{k=1}^n\sum\limits_{l\in [m]\setminus\{k\}}\mathbb P[X_k\leq x]\,\mathbb P[X_l\leq x]\right]\\
&= \frac{1}{nm}\left[\sum\limits_{k=1}^{\min(n,m)}F_{k}(x) + \sum\limits_{k=1}^n\sum\limits_{l\in [m]\setminus\{k\}}F_{k}(x)F_{l}(x)\right]
\end{align*}

Independence is used in the fourth equality. Note that we have not yet used the assumption that the $X_k$ are identically distributed.
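As a sanity check on the product moment derived above, the following is a minimal sketch that enumerates every outcome of the indicators $\mathbf 1[X_k\leq x]$ exactly, assuming they are independent but not identically distributed. The function names and the probabilities in `p` (standing in for the values $F_k(x)$) are illustrative choices, not part of the original answer.

```python
from fractions import Fraction
from itertools import product


def exact_product_moment(p, n, m):
    """E[F_n(x) F_m(x)] by brute-force enumeration of the indicators.

    p[k] stands in for P[X_{k+1} <= x]; the indicators are assumed independent.
    """
    size = max(n, m)
    total = Fraction(0)
    for bits in product((0, 1), repeat=size):
        # Probability of this particular pattern of indicator values.
        prob = Fraction(1)
        for bit, pk in zip(bits, p):
            prob *= pk if bit else 1 - pk
        # F_n uses the first n indicators, F_m the first m.
        total += prob * Fraction(sum(bits[:n]), n) * Fraction(sum(bits[:m]), m)
    return total


def formula_product_moment(p, n, m):
    """Closed form: diagonal terms plus off-diagonal products, over n*m."""
    diagonal = sum(p[k] for k in range(min(n, m)))
    off_diagonal = sum(p[k] * p[l] for k in range(n) for l in range(m) if l != k)
    return (diagonal + off_diagonal) / (n * m)


# Hypothetical, deliberately unequal probabilities.
p = [Fraction(1, 5), Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)]
print(exact_product_moment(p, 3, 4) == formula_product_moment(p, 3, 4))
```

Because the arithmetic uses `Fraction`, the comparison is exact rather than approximate, which makes it a reasonable check that no diagonal or off-diagonal term was miscounted.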

Now we can calculate the covariance, taking $y=x$ as in the calculation above.
\begin{align*}
\textbf{Cov}[F_n(x),F_m(x)]&=\mathbb E [F_n(x)F_m(x)] - \mathbb E [F_n(x)]\,\mathbb E [F_m(x)]\\
&=\frac{1}{nm}\left[\sum\limits_{k=1}^{\min(n,m)}F_{k}(x) + \sum\limits_{k=1}^n\sum\limits_{l\in [m]\setminus\{k\}}F_{k}(x)F_{l}(x) - \sum\limits_{k=1}^nF_k(x)\sum\limits_{l=1}^mF_l(x)\right]\\
&=\frac{1}{nm}\left[\sum\limits_{k=1}^{\min(n,m)}F_{k}(x) + \sum\limits_{k=1}^nF_{k}(x)\left(\sum\limits_{l\in [m]\setminus\{k\}}F_{l}(x) - \sum\limits_{l=1}^mF_l(x)\right)\right]\\
&=\frac{1}{nm}\left[\sum\limits_{k=1}^{\min(n,m)}F_{k}(x) -\sum\limits_{k=1}^{\min(n,m)}[F_{k}(x)]^2 \right]
\end{align*}
The last step holds because the inner difference equals $-F_k(x)$ when $k\leq m$ and $0$ when $k>m$, so only the terms with $k\leq\min(n,m)$ survive. Now use that the $X_k$ are identically distributed, so $F_k=F_X$ for every $k$, where $F_X$ is the common distribution function (the $F$ of the question):
\begin{align*}
\textbf{Cov}[F_n(x),F_m(x)]&=\frac{1}{nm}\sum\limits_{k=1}^{\min(n,m)}\left[F_{k}(x) -[F_{k}(x)]^2 \right]\\
&=\frac{1}{nm}\sum\limits_{k=1}^{\min(n,m)}\left[F_{X}(x) -[F_{X}(x)]^2 \right]\\
&=\frac{\min(n,m)}{nm}\,F_{X}(x)\left(1 -F_{X}(x)\right)
\end{align*}

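Therefore: $\textbf{Cov}[F_n(x),F_m(x)]=\frac{\min(n,m)}{nm}\,F_{X}(x)\left(1 -F_{X}(x)\right)=\frac{F_{X}(x)(1 -F_{X}(x))}{\max(n,m)}$.

For $x\neq y$ the same computation goes through with one change: the diagonal terms become $\mathbb P[X_k\leq x \wedge X_k\leq y]=F_X(\min(x,y))$ and the products become $F_X(x)F_X(y)$, which gives $\textbf{Cov}[F_n(x),F_m(y)]=\frac{\min(n,m)}{nm}\left[F_{X}(\min(x,y)) -F_{X}(x)F_{X}(y) \right]$.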
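The closed form at a single point $x$ can also be checked by simulation. The sketch below is only an illustration under stated assumptions: the data are taken to be Uniform(0, 1), so that $F_X(x)=x$ on $[0,1]$, $F_n$ and $F_m$ are computed from the first $n$ and first $m$ terms of the same sequence, and the function names, sample sizes, and seed are arbitrary choices.

```python
import random


def ecdf(sample, x):
    """Empirical distribution function of `sample` evaluated at x."""
    return sum(1 for value in sample if value <= x) / len(sample)


def simulated_cov(n, m, x, reps=100_000, seed=0):
    """Monte Carlo estimate of Cov[F_n(x), F_m(x)] for Uniform(0, 1) data.

    F_n uses X_1..X_n and F_m uses X_1..X_m from one shared sequence.
    """
    rng = random.Random(seed)
    longest = max(n, m)
    sum_a = sum_b = sum_ab = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(longest)]
        a = ecdf(xs[:n], x)
        b = ecdf(xs[:m], x)
        sum_a += a
        sum_b += b
        sum_ab += a * b
    # Sample version of E[AB] - E[A] E[B].
    return sum_ab / reps - (sum_a / reps) * (sum_b / reps)


def formula_cov(n, m, x):
    """min(n, m) / (n m) * F(x) (1 - F(x)), with F(x) = x on [0, 1]."""
    return min(n, m) / (n * m) * x * (1 - x)


if __name__ == "__main__":
    n, m, x = 5, 8, 0.3
    print("simulated:", simulated_cov(n, m, x))
    print("formula:  ", formula_cov(n, m, x))
```

The two printed numbers should agree up to Monte Carlo error, which shrinks roughly like one over the square root of `reps`.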