confidence interval in terms of the empirical CDF


Recall that the cumulative distribution function (CDF) of $X$ is defined as:

$F(x)=P(\omega: X(\omega) \leq x)$

Using the i.i.d. sequence $(X_i)$, estimate the CDF $F(x)$ by the empirical CDF: $$ \bar{F_n}(x) = \frac{1}{n} \sum_{i=1}^n \chi_{(-\infty, x]} (X_i). $$ For each fixed $x \in \mathbb {R}$, show that
$$ \mathbb {I}_{\alpha, n}(x) = \left\{ y \in \mathbb {R}: \bar{F_n}(x)- \frac{1}{2 \sqrt{n \alpha}} \leq y \leq \bar{F_n}(x) + \frac{1}{2 \sqrt{n \alpha}} \right\} $$

is a $(1-\alpha) \times 100$% confidence interval for F(x) in the sense that

$P(\omega: F(x) \in \mathbb {I}_{\alpha, n}(x)) \geq 1-\alpha.$

How can I prove this statement?

My attempt: I can rewrite the condition defining $$ \mathbb {I}_{\alpha, n}(x) = \left\{ y \in \mathbb {R}: \bar{F_n}(x)- \frac{1}{2 \sqrt{n \alpha}} \leq y \leq \bar{F_n}(x) + \frac{1}{2 \sqrt{n \alpha}} \right\} $$ equivalently as

$$ \mathbb {I}_{\alpha, n}(x) = \left\{ ny- \frac{\sqrt{n}}{2 \sqrt{ \alpha}} \leq n\bar{F_n}(x) \leq ny+ \frac{\sqrt{n}}{2 \sqrt{ \alpha}} \right\} $$

How should I proceed from here?

There are 2 answers below.

Accepted answer:

$$\begin{aligned}
\mathbb P \left(F(x) \in \mathbb I_{\alpha,n}(x)\right) &= \mathbb P\left (\bar{F_n}(x)- \frac{1}{2 \sqrt{n \alpha}} \leq F(x) \leq \bar{F_n}(x) + \frac{1}{2 \sqrt{n \alpha}} \right) \\
&= \mathbb P\left (\left|\bar{F_n}(x) - F(x) \right| \leq \frac{1}{2 \sqrt{n \alpha}} \right) \\
&= \mathbb P\left (\left|\bar{F_n}(x) - \mathbb E\left[\bar{F_n}(x)\right] \right| \leq \frac{1}{2 \sqrt{n \alpha}} \right),
\end{aligned}$$

where the last equality uses $\mathbb E\left[\bar{F_n}(x)\right] = F(x)$, since each indicator $\chi_{(-\infty, x]}(X_i)$ is Bernoulli with success probability $F(x)$.

Now, the variance of $\bar{F_n}(x)$ is at most $0.25/n$, since $n\bar{F_n}(x)$ is binomial with $n$ trials (this is true for any binomial with $n$ trials — can you figure out why?). So by Chebyshev:

$$\mathbb P\left (\left|\bar{F_n}(x) - \mathbb E\left[\bar{F_n}(x)\right] \right| \leq \frac{1}{2 \sqrt{n \alpha}} \right) \geq 1-\frac{0.25/n}{\left(1/(2\sqrt{n\alpha})\right)^2} = 1-\alpha.$$
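As an illustrative sanity check of this interval (my own sketch, not part of the original answer), a short simulation can confirm that the coverage is at least $1-\alpha$; the sample size, evaluation point, and Uniform(0, 1) distribution below are arbitrary choices made for illustration:

```python
import numpy as np

# Sketch: empirically check the Chebyshev-based confidence interval.
# All concrete values (n, alpha, x, Uniform(0, 1) samples) are assumptions
# chosen only for this demonstration.
rng = np.random.default_rng(0)
n, alpha, x = 200, 0.05, 0.3
half_width = 1.0 / (2.0 * np.sqrt(n * alpha))
true_F = x  # CDF of Uniform(0, 1) at x = 0.3

trials = 2000
covered = 0
for _ in range(trials):
    sample = rng.uniform(0.0, 1.0, size=n)
    F_n = np.mean(sample <= x)  # empirical CDF at the fixed point x
    if F_n - half_width <= true_F <= F_n + half_width:
        covered += 1

coverage = covered / trials
print(coverage)
```

Because Chebyshev is a worst-case bound, the observed coverage is typically much closer to 1 than to the guaranteed $1-\alpha = 0.95$.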

Another answer:

This problem wants you to use an exact bound that holds for all $n$, such as the Chebyshev inequality or the Chernoff–Hoeffding inequality, rather than a central limit theorem approximation. As a comment by Ian suggests, this problem can be solved by using the Chebyshev inequality alone. It turns out that the Chernoff–Hoeffding inequality will give you a tighter bound for values of $\alpha$ that are small enough, but that is not needed in this problem.

As you have already noted in comments, the Chebyshev inequality for a general random variable $X$ with mean $E[X]$ and variance $\sigma^2$ is $$ P[|X-E[X]|\geq c] \leq \frac{\sigma^2}{c^2}.$$ You can choose the parameter $c$ to set the upper bound $\sigma^2/c^2$ to any desired level. Also recall that $$ |X-E[X]|\leq c \iff X-c \leq E[X]\leq X+c \iff E[X] \in [X-c, X+c].$$
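To make the inequality concrete, here is a small hedged simulation (my addition, with arbitrary values of $n$, $p$, and $c$) comparing both sides of Chebyshev's inequality for a binomial proportion, the random variable relevant to this problem:

```python
import numpy as np

# Sketch (not from the answer): compare the two sides of Chebyshev's
# inequality P(|X - E[X]| >= c) <= sigma^2 / c^2 for X = Binomial(n, p) / n.
# The values n, p, c are illustrative assumptions.
rng = np.random.default_rng(1)
n, p, c = 100, 0.3, 0.1
sigma2 = p * (1 - p) / n  # Var(Binomial(n, p) / n)

draws = rng.binomial(n, p, size=100_000) / n
lhs = np.mean(np.abs(draws - p) >= c)  # empirical tail probability
rhs = sigma2 / c**2                    # Chebyshev upper bound
print(lhs, rhs)
```

As expected, the empirical tail probability sits well below the Chebyshev bound, which is why the bound holds for every $n$ without any limiting approximation.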

In comments, you have already computed the means and variances you need for this problem so you are almost done. I will only add that the upper bound $\sigma^2/c^2$ may depend on a certain parameter $p \in [0,1]$, so you can make the bound true regardless of that parameter by maximizing the bound over all $p \in [0,1]$.
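For that last step, a quick numerical check (my own sketch, not part of the answer) confirms that $p(1-p)$ is maximized over $p \in [0,1]$ at $p = 1/2$ with value $1/4$, which is what makes the distribution-free bound $0.25/n$ on the variance hold regardless of $p$:

```python
import numpy as np

# Sketch: maximize p * (1 - p) over a grid of p in [0, 1].
# The grid resolution is an arbitrary choice for illustration.
p = np.linspace(0.0, 1.0, 10001)
values = p * (1.0 - p)
print(p[np.argmax(values)], values.max())  # prints 0.5 0.25
```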