Find $\operatorname{Cov}(F_n^* (x), F_n^* (y))$ for fixed real numbers $x, y$ where $F_n^* (x)$ is a sample distribution function


Let $X_1,X_2,\dots,X_n$ be a random sample from a distribution function (DF) $F$, and let $F_n^* (x)$ be the sample distribution function.

We have to find $\operatorname{Cov}(F_n^* (x), F_n^* (y))$ for fixed real numbers $x, y$.

My approach:

$$\text{Cov}(F_n^* (x), F_n^* (y)) = \mathbb{E}[(F_n^* (x) - \mathbb{E}[F_n^* (x)])(F_n^* (y) - \mathbb{E}[F_n^* (y)])]$$

$$\text{Cov}(F_n^* (x), F_n^* (y)) = \mathbb{E}[F_n^* (x) \, F_n^* (y)] - \mathbb{E}[F_n^* (x)]\mathbb{E}[F_n^* (y)]$$

$$\text{Cov}(F_n^* (x), F_n^* (y)) = \mathbb{E}\bigg[\bigg(F_n^* (\min(x, y))\bigg) \bigg(F_n^* (\min(x, y)) + \int_{\min(x,y)}^{\max(x,y)} f_n^* (t) \, dt\bigg)\bigg] - \mathbb{E}[F_n^* (x)]\mathbb{E}[F_n^* (y)]$$

where $f_n^*(t)$ is a probability density function of the sample.

$$\text{Cov}(F_n^* (x), F_n^* (y)) = \mathbb{E}\bigg[\bigg(F_n^* (\min(x, y))\bigg)^2\bigg]+\mathbb{E}\bigg[ F_n^* (\min(x, y)) \int_{\min(x,y)}^{\max(x,y)} f_n^* (t) \, dt\bigg] - \frac{F_n (x)F_n (y)}{n^2}$$

$$\text{Cov}(F_n^* (x), F_n^* (y)) = \frac{\bigl(F_n (\min(x, y))\bigr)^2}{n} - \frac{F_n (x)F_n (y)}{n^2} +\mathbb{E}\bigg[ F_n^* (\min(x, y)) \int_{\min(x,y)}^{\max(x,y)} f_n^* (t) \, dt\bigg] $$

How can I proceed from here?


There are 2 best solutions below


Another way is to use the variance identity

$$ \text{Var}(F_n^*(y)-F_n^*(x)) = \text{Var}(F_n^*(y))+\text{Var}(F_n^*(x))-2\text{Cov}(F_n^*(x),F_n^*(y)). $$

All three variances are easy to compute, and the covariance then follows. Since the $X_i$ are independent and each indicator $\mathbf 1_{X_i\le x}$ is a Bernoulli variable with success probability $F(x)$,

$$ \text{Var}(F_n^*(x)) = \text{Var}\left(\dfrac{1}{n}\sum_{i=1}^n \mathbf 1_{X_i\le x} \right) = \dfrac{1}{n^2}\sum_{i=1}^n \text{Var}(\mathbf 1_{X_i\le x})=\dfrac{F(x)(1-F(x))}{n}. $$

Similarly,

$$ \text{Var}(F_n^*(y)) = \dfrac{F(y)(1-F(y))}{n}. $$

Let $x<y$ to simplify notation. Then $F_n^*(y)-F_n^*(x)$ is the fraction of observations falling in $(x,y]$, so the left-hand side variance is

$$ \text{Var}(F_n^*(y)-F_n^*(x)) = \text{Var}\left(\dfrac{1}{n}\sum_{i=1}^n \mathbf 1_{x<X_i\le y}\right) = \dfrac{\bigl(F(y)-F(x)\bigr)\bigl(1-F(y)+F(x)\bigr)}{n}. $$

So for $x<y$

$$ \dfrac{\bigl(F(y)-F(x)\bigr)\bigl(1-F(y)+F(x)\bigr)}{n} = \dfrac{F(y)(1-F(y))}{n}+\dfrac{F(x)(1-F(x))}{n}-2\text{Cov}(F_n^*(x),F_n^*(y)), $$

which gives, after expanding the brackets,

$$ \text{Cov}(F_n^*(x),F_n^*(y)) = \dfrac{F(x)-F(x)F(y)}{n}. $$

The case $y<x$ is symmetric, so in general

$$ \text{Cov}(F_n^*(x),F_n^*(y)) = \dfrac{F(\min(x,y))-F(x)F(y)}{n}. $$

Please note also that the empirical distribution function is a step function, so it cannot have a density. The value $nF_n^*(y)$ has the binomial distribution $B(n,F(y))$.
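For small $n$ the result can also be checked exactly by brute force. The sketch below is only an illustration, not part of the derivation: for $x<y$ it enumerates the multinomial counts of observations in $(-\infty,x]$, $(x,y]$ and $(y,\infty)$ and computes the covariance with exact rational arithmetic. The function name `exact_cov` and the values $n=5$, $F(x)=1/4$, $F(y)=3/5$ are arbitrary choices made for this example.

```python
from fractions import Fraction
from math import comb


def exact_cov(n, p, q):
    """Exact Cov(F_n*(x), F_n*(y)) for x < y, where p = F(x) and q = F(y).

    a counts observations <= x, b counts observations in (x, y],
    and the remaining n - a - b observations are > y.
    """
    p, q = Fraction(p), Fraction(q)
    e_x = e_y = e_xy = Fraction(0)
    for a in range(n + 1):
        for b in range(n - a + 1):
            c = n - a - b
            # Multinomial probability of the count vector (a, b, c).
            prob = comb(n, a) * comb(n - a, b) * p**a * (q - p)**b * (1 - q)**c
            fx = Fraction(a, n)      # value of F_n*(x)
            fy = Fraction(a + b, n)  # value of F_n*(y)
            e_x += prob * fx
            e_y += prob * fy
            e_xy += prob * fx * fy
    return e_xy - e_x * e_y


n, p, q = 5, Fraction(1, 4), Fraction(3, 5)
formula = (p - p * q) / n  # (F(min(x, y)) - F(x) F(y)) / n with x < y
print(exact_cov(n, p, q), formula)
```

With these inputs both printed values should agree, since the enumeration is just the definition of covariance written out over all possible counts.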


The empirical distribution function for a fixed $x\in\mathbb R$ is defined as

\begin{align}
F^*_n(x)&=\frac1n\left(\text{number of } X_i \text{ less than or equal to } x\right)
\\&=\frac1n\sum_{i=1}^n \mathbf1_{X_i\le x}
\end{align}

So for fixed $x, y\in \mathbb R$,

\begin{align}
\operatorname{Cov}[F^*_n(x),F^*_n(y)]&=\operatorname{Cov}\left[\frac1n\sum_{i=1}^n\mathbf1_{X_i\le x},\frac1n\sum_{j=1}^n\mathbf1_{X_j\le y}\right]
\\&=\frac1{n^2}\sum_{i=1}^n\sum_{j=1}^n\operatorname{Cov}\left[\mathbf1_{X_i\le x},\mathbf1_{X_j\le y}\right]
\\&=\frac1{n^2}\left\{\sum_{i=1}^n\operatorname{Cov}\left[\mathbf1_{X_i\le x},\mathbf1_{X_i\le y}\right]+\sum_{i\ne j}\underbrace{\operatorname{Cov}\left[\mathbf1_{X_i\le x},\mathbf1_{X_j\le y}\right]}_{0}\right\}
\\&=\frac{n}{n^2}\operatorname{Cov}\left[\mathbf1_{X_1\le x},\mathbf1_{X_1\le y}\right]
\\&=\frac1n \left\{\operatorname{E}[\mathbf1_{X_1\le x}\mathbf1_{X_1\le y}]-\operatorname{E}[\mathbf1_{X_1\le x}]\operatorname{E}[\mathbf1_{X_1\le y}]\right\}
\\&=\frac1n \left\{\operatorname{E}[\mathbf1_{X_1\le \min(x,y)}]-\operatorname{E}[\mathbf1_{X_1\le x}]\operatorname{E}[\mathbf1_{X_1\le y}]\right\}
\\&=\frac1n\left[F(\min(x,y))-F(x)F(y)\right]
\end{align}
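A quick Monte Carlo sanity check of this formula, as a rough sketch rather than a proof. It assumes the $X_i$ are Uniform$(0,1)$, so that $F(t)=t$ on $[0,1]$; the helper names `ecdf` and `simulate_cov`, the seed, and the number of replications are arbitrary choices for this example.

```python
import random


def ecdf(sample, t):
    """Empirical distribution function F_n*(t): fraction of sample values <= t."""
    return sum(v <= t for v in sample) / len(sample)


def simulate_cov(n, x, y, reps=20000, seed=0):
    """Estimate Cov(F_n*(x), F_n*(y)) from `reps` independent Uniform(0,1) samples."""
    rng = random.Random(seed)
    sum_x = sum_y = sum_xy = 0.0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        fx, fy = ecdf(sample, x), ecdf(sample, y)
        sum_x += fx
        sum_y += fy
        sum_xy += fx * fy
    # Sample analogue of E[F_n*(x) F_n*(y)] - E[F_n*(x)] E[F_n*(y)].
    return sum_xy / reps - (sum_x / reps) * (sum_y / reps)


n, x, y = 10, 0.3, 0.7
theory = (min(x, y) - x * y) / n  # F(t) = t for Uniform(0,1)
print(simulate_cov(n, x, y), theory)
```

The simulated value should land close to the theoretical one, up to Monte Carlo error that shrinks as `reps` grows.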