Empirical Distribution: problem from "All of Statistics"

I'm working through this textbook, and here is a problem I'm stuck on.

Setting: We are given an unknown distribution $F$, observed independent data points $X_1, ..., X_n \sim F$, the empirical distribution function $\hat F_n$, and two values $a,b \in \mathbb{R}_+$.

Tasks:

  1. use the central limit theorem to find the limiting distribution of $\hat F_n(a)$
  2. find an expression for $\text{Cov}(\hat F_n(a), \hat F_n(b))$

My thoughts: The central limit theorem only tells me something about the sample mean, which doesn't meaningfully restrict specific points on the CDF. Even if I knew $\mathbb E(F)$ exactly, there are still an uncountably infinite number of possible values of $F(a)$. So the application doesn't seem trivial.

My idea was to define a new sequence of random variables $A_1, ..., A_n$ where $A_i := 1_{X_i \le a}$. Then $A_i \sim \text{Bernoulli}(p)$, where $p = F(a)$ is unknown, and the sample mean is exactly the empirical distribution function at $a$: $\overline{A_n} = \hat F_n(a)$. By the weak law of large numbers, $\overline{A_n}$ converges in probability to $F(a)$. Applying the Central Limit Theorem to the $A_i$ gives $\sqrt{n}\,(\overline{A_n} - p) \rightsquigarrow N(0, p(1-p))$, i.e. for large $n$, $\overline{A_n}$ is approximately distributed as $N(p, p(1-p)/n)$.
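A quick simulation can serve as a sanity check on the indicator argument. The sketch below assumes, purely for illustration, that $F$ is Exponential(1) and $a = 1$ (neither is given in the problem). It compares the empirical mean and variance of $\sqrt{n}\,(\hat F_n(a) - F(a))$ across many replications with the values $0$ and $p(1-p)$ suggested by the CLT.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000      # sample size per replication
reps = 4000   # number of independent replications
a = 1.0       # evaluation point (illustrative choice)

# Assumption for illustration only: F is Exponential(1), so F(a) = 1 - exp(-a).
p = 1.0 - np.exp(-a)

# Each row is one sample X_1, ..., X_n; the row mean of the indicators
# 1{X_i <= a} is the empirical distribution function evaluated at a.
samples = rng.exponential(scale=1.0, size=(reps, n))
F_hat_a = (samples <= a).mean(axis=1)

# Centered and scaled statistic, one value per replication.
z = np.sqrt(n) * (F_hat_a - p)

emp_mean = z.mean()
emp_var = z.var(ddof=1)
theo_var = p * (1.0 - p)

print(f"mean of sqrt(n)(F_hat - p): {emp_mean:.4f} (CLT suggests 0)")
print(f"variance: {emp_var:.4f} (CLT suggests p(1-p) = {theo_var:.4f})")
```

If the reasoning is right, the two printed variances should agree up to simulation noise, and a histogram of `z` should look roughly normal.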

So, it appears that $N(p, p(1-p)/n)$ may be the solution to the first problem. However, the second task made me doubt whether this is right. Since $\text{Cov}(X,Y) = \mathbb E(XY) - \mathbb E(X) \mathbb E(Y)$, I would have to compute $\mathbb E(\overline{A_n}\overline{B_n})$ (with $\overline{A_n}$ as above and $\overline{B_n}$ analogously for point $b$) somehow, and I have no idea how to do this.
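For reference, one possible route to this quantity avoids computing $\mathbb E(\overline{A_n}\,\overline{B_n})$ directly and uses bilinearity of covariance instead. This is a sketch, writing $B_j := 1_{X_j \le b}$ (notation introduced here) and using only that the $X_i$ are i.i.d., so that $\text{Cov}(A_i, B_j) = 0$ for $i \ne j$:

$$
\begin{aligned}
\text{Cov}(\hat F_n(a), \hat F_n(b))
&= \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \text{Cov}(A_i, B_j)
 = \frac{1}{n}\, \text{Cov}(A_1, B_1) \\
&= \frac{1}{n} \big( \mathbb E(A_1 B_1) - \mathbb E(A_1)\, \mathbb E(B_1) \big)
 = \frac{1}{n} \big( F(\min(a,b)) - F(a) F(b) \big).
\end{aligned}
$$

The last step uses $A_1 B_1 = 1_{X_1 \le \min(a,b)}$. Setting $a = b$ gives $F(a)(1 - F(a))/n$, which matches the variance $p(1-p)/n$ from the first task.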