True confidence interval for the parameter of a known distribution


I have $X$, which takes values in $\{0, 1, 2\}$, and I'd like to know whether I can compute a 95% confidence interval for the mean of $n$ samples from this distribution.

I know $P(X=0)$, $P(X=1)$ and $P(X=2)$. I know how to compute the true mean of the sample mean of $n$ samples, $0\cdot P(X=0) + 1\cdot P(X=1) + 2\cdot P(X=2)$, but I can't figure out how to compute the true confidence interval. It should not be that hard, but I really am stuck on what to use.
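To make the question's setting concrete, here is a small sketch. It assumes the goal is an interval that contains the sample mean $\overline{X}$ of $n$ draws with probability at least 95% when $P(X=0)$, $P(X=1)$, $P(X=2)$ are all known. Strictly speaking, that is a probability interval for $\overline{X}$ rather than a confidence interval, since nothing is being estimated. The exact distribution of the sum follows from repeated convolution, and its central quantiles give the interval. The function names and the example probabilities are hypothetical.

```python
import math
from itertools import accumulate

def mean_and_sd_of_xbar(p, n):
    """True mean of X (= mean of Xbar) and standard deviation of Xbar = sigma / sqrt(n).
    p[k] = P(X = k)."""
    mu = sum(k * pk for k, pk in enumerate(p))
    var = sum(k * k * pk for k, pk in enumerate(p)) - mu ** 2
    return mu, math.sqrt(var / n)

def sum_distribution(p, n):
    """Exact pmf of S_n = X_1 + ... + X_n for i.i.d. X on {0, ..., len(p) - 1}.
    Returns d with d[s] = P(S_n = s)."""
    dist = [1.0]  # pmf of the empty sum: P(S_0 = 0) = 1
    for _ in range(n):
        new = [0.0] * (len(dist) + len(p) - 1)
        for s, ps in enumerate(dist):
            for k, pk in enumerate(p):
                new[s + k] += ps * pk  # convolve with one more copy of X
        dist = new
    return dist

def central_interval_of_mean(p, n, level=0.95):
    """Bounds [a, b] on Xbar with P(Xbar < a) <= (1 - level)/2 and
    P(Xbar > b) <= (1 - level)/2, so P(a <= Xbar <= b) >= level."""
    dist = sum_distribution(p, n)
    tail = (1 - level) / 2
    cdf = list(accumulate(dist))
    # smallest s with P(S_n <= s) > tail, hence P(S_n < s) <= tail
    lo = next(s for s, c in enumerate(cdf) if c > tail)
    # smallest s with P(S_n <= s) >= 1 - tail, hence P(S_n > s) <= tail
    hi = next(s for s, c in enumerate(cdf) if c >= 1 - tail)
    return lo / n, hi / n
```

For example, with the hypothetical values `p = [0.2, 0.5, 0.3]`, `mean_and_sd_of_xbar(p, 100)` gives $\mu = 1.1$ and a standard deviation of $0.07$ for $\overline{X}$, and `central_interval_of_mean(p, 100)` returns bounds on either side of $1.1$.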

Hopefully it's not a stupid question!!

Thanks


There are 2 best solutions below


One simple thing that one can always try, following Casella & Berger, is to build an approximate confidence interval. This has the advantage that it does not depend on assumptions about the distribution (beyond a finite variance), but it is valid only for large sample sizes. I add it in case the OP is not familiar with the procedure.

From the CLT and Slutsky's theorem we always have an asymptotic pivot:

$$T= \frac{\overline{X}-\mu}{S/\sqrt{n}}$$

where $\overline{X}$ is the sample mean and $S$ is the sample standard deviation.

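For large $n$, $T$ converges in distribution to $N(0,1)$. Therefore, calling $z_{\alpha}$, as usual, the value such that $\alpha=P(Z>z_{\alpha})$, an asymptotic confidence interval with confidence level $\alpha$ (e.g. $\alpha = 0.95$, so that $z_{\frac{1-\alpha}{2}} = z_{0.025} \approx 1.96$) for the mean of the distribution is: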

$$\overline{X}-z_{\frac{1-\alpha}{2}}\frac{S}{\sqrt{n}}<\mu<\overline{X}+z_{\frac{1-\alpha}{2}}\frac{S}{\sqrt{n}}$$
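The interval above takes only a few lines to compute. Here is a minimal sketch that assumes the observations arrive as a Python list. The name `asymptotic_mean_ci` is hypothetical, and `alpha` keeps the answer's convention that it is the confidence level.

```python
import math
from statistics import NormalDist, mean, stdev

def asymptotic_mean_ci(sample, alpha=0.95):
    """Large-sample CI for the mean: Xbar -/+ z_{(1-alpha)/2} * S / sqrt(n).

    alpha is the confidence level (answer's convention) and z_a is the
    upper-a quantile of N(0, 1), i.e. P(Z > z_a) = a. Needs len(sample) >= 2.
    """
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation, n - 1 denominator
    # P(Z > z) = (1 - alpha)/2  <=>  z = Phi^{-1}(1 - (1 - alpha)/2); ~1.96 for 0.95
    z = NormalDist().inv_cdf(1 - (1 - alpha) / 2)
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half
```

For instance, a sample with twenty 0s, fifty 1s and thirty 2s has $\overline{X} = 1.1$, and the interval is centred there with half-width $1.96\cdot S/10$.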

Of course, this is only justified for large $n$, but since $X$ is bounded (and so has finite variance) I think it also applies in your case. I am sure there are also better small-sample intervals.
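Whether the large-$n$ interval is adequate for a given $n$ can be checked by simulation. The Monte Carlo sketch below estimates how often the interval covers the true mean. It assumes a hypothetical distribution `p` on $\{0,1,2\}$, and `wald_coverage` is a name chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def wald_coverage(p, n, reps=2000, alpha=0.95, seed=0):
    """Fraction of simulated samples of size n (n >= 2) for which
    Xbar -/+ z_{(1-alpha)/2} * S / sqrt(n) contains the true mean,
    when P(X = k) = p[k]."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    mu = sum(k * pk for k, pk in enumerate(p))  # true mean
    z = NormalDist().inv_cdf(1 - (1 - alpha) / 2)
    hits = 0
    for _ in range(reps):
        sample = rng.choices(range(len(p)), weights=p, k=n)
        xbar = mean(sample)
        half = z * stdev(sample) / math.sqrt(n)
        if xbar - half < mu < xbar + half:
            hits += 1
    return hits / reps
```

If the normal approximation holds up, `wald_coverage([0.2, 0.5, 0.3], 50)` should come out near 0.95. Smaller $n$, or probabilities close to 0 or 1, can pull it below the nominal level.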


The link I provided in the comments may be hard to apply to a general distribution, since it presumes a parametric formulation.

I remembered a more straightforward approach that you may find useful. It is based on the empirical CDF (ECDF) $F_n$ of a sample of size $n$ and relies on the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality, which gives finite-sample (conservative, not merely asymptotic) confidence bands around $F_n$ for the true CDF $F$:

$$P\left(\sup_{x\in \mathbb{R}} \left\vert F_n(x) - F(x)\right\vert > \varepsilon \right)\leq 2e^{-2n\varepsilon^2} \implies CI_{1-\alpha} = F_n(x) \pm \sqrt{\frac{\ln(2/\alpha)}{2n}}:=F_n(x)\pm \varepsilon_n$$
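To compute the band, set the right-hand side $2e^{-2n\varepsilon^2}$ equal to $\alpha$ and solve for $\varepsilon$. The band can then be clipped to $[0,1]$, since $F$ is a CDF. Here is a minimal sketch that assumes the sample is a Python list. `dkw_epsilon` and `ecdf_band` are hypothetical helper names, and `alpha` is the miscoverage (e.g. 0.05 for a 95% band).

```python
import math

def dkw_epsilon(n, alpha=0.05):
    """Half-width eps_n of the DKW band: solves 2 * exp(-2 * n * eps**2) = alpha."""
    return math.sqrt(math.log(2 / alpha) / (2 * n))

def ecdf_band(sample, points, alpha=0.05):
    """For each x in points return (F_n(x), lower, upper), where the
    band F_n(x) -/+ eps_n is clipped to [0, 1] because F is a CDF."""
    n = len(sample)
    eps = dkw_epsilon(n, alpha)
    out = []
    for x in points:
        fn = sum(1 for v in sample if v <= x) / n  # empirical CDF at x
        out.append((fn, max(0.0, fn - eps), min(1.0, fn + eps)))
    return out
```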

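The CI for the mean is then simply the integral of the tail function $1-F$ taken along the lower and upper edges of the band. Since $F$ is a CDF, the band can be clipped to $[0,1]$, and since $0\le X\le 2$ we know $1-F(x)=0$ for $x\ge 2$, so we only integrate over $[0,2]$.

Let's define the "shifted" tail curve as

$$T(\epsilon):= \int_0^{2}\min\left\{1,\max\left\{0,\,1-F_n(x)-\epsilon\right\}\right\}dx=\sum_{k=0}^{1}\min\left\{1,\max\left\{0,\,1-F_n(k)-\epsilon\right\}\right\}$$

where the sum form uses the fact that $F_n$ is constant on $[0,1)$ and on $[1,2)$.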

Also, since $X\geq 0$, we have $E[X]=\int_0^{\infty} (1-F_X(x))\,dx$, and here the integrand vanishes for $x\ge 2$. We can therefore form our confidence interval for the expected value from the confidence bands:

$$CI_{1-\alpha}\left(E[X]\right) = \left[T(\varepsilon_n),T(-\varepsilon_n)\right]$$

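where $$P\left(E[X] \in CI_{1-\alpha}\left(E[X]\right)\right) \geq 1-\alpha,$$ because on the event that $F$ lies inside the band (probability at least $1-\alpha$ by DKW), $1-F$ lies between the two clipped, shifted tail curves on $[0,2]$, and integration preserves these inequalities.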
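Here is how the pieces fit together for this question's support $\{0,1,2\}$. The sketch assumes integer-valued data with a known maximum `support_max` (a hypothetical parameter, 2 here). Because $F_n$ is a step function that is constant on each $[k, k+1)$, the integral defining $T$ becomes a sum over $k = 0, \dots, \texttt{support\_max}-1$. The function name `dkw_mean_ci` is also hypothetical.

```python
import math

def dkw_mean_ci(sample, alpha=0.05, support_max=2):
    """Conservative (1 - alpha) CI [T(eps_n), T(-eps_n)] for E[X], assuming X is
    integer-valued in {0, 1, ..., support_max}, built from the clipped DKW band."""
    n = len(sample)
    eps = math.sqrt(math.log(2 / alpha) / (2 * n))  # DKW half-width eps_n

    def shifted_tail(shift):
        # T(shift): integral over [0, support_max] of 1 - F_n(x) - shift, clipped
        # to [0, 1]; 1 - F_n is constant on [k, k + 1), so this is a finite sum.
        total = 0.0
        for k in range(support_max):
            fn = sum(1 for v in sample if v <= k) / n  # empirical CDF at k
            total += min(1.0, max(0.0, 1 - fn - shift))
        return total

    return shifted_tail(eps), shifted_tail(-eps)
```

Unlike the asymptotic interval, this one has guaranteed coverage of at least $1-\alpha$ for every $n$. For moderate $n$ it is typically wider, since its half-width here is about $2\varepsilon_n$ before clipping.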