Reasoning for confidence interval


Suppose $$X_1,\dots,X_{20} \sim f_X(x;\beta)$$ where $$f_X(x;\beta) = \frac{1}{\beta} e^{-\frac{x}{\beta}},\quad x>0;\beta>0$$

It can be shown (details omitted) that $$P(0.52 \bar{X} \leq \beta \leq 1.67 \bar{X}) = 0.99, \quad \forall\beta>0 \tag{1}$$

Thus the 99% confidence interval for $\beta$ is $$(0.52 \bar{X} , 1.67 \bar{X})$$

My question is: how does $(1)$ come about?

My attempt to reason

By definition, $$P(\text{confidence interval for } \beta) = 0.99$$

There exist statistics $A = A(X_1,\dots,X_{20})$ and $B = B(X_1,\dots,X_{20})$ such that

$$P(A \leq \beta \leq B) = 0.99$$

I have found that the MLE of $\beta$ is $\hat \beta = \bar{X}$. This can be used to build the confidence interval, since the bounds must be functions of the $X_i$.

That is, there exist $a_1,a_2 \in \mathbb{R}$ such that

$$P(a_1 \bar{X} \leq \beta \leq a_2 \bar{X}) = 0.99$$

Since $\hat \beta = \bar{X}$, $$P(a_1 \leq \frac{\beta}{\hat \beta} \leq a_2) = 0.99$$

Now we assign $a_1 = x_{0.005}$ and $a_2 = x_{0.995}$, the $0.005$ and $0.995$ quantiles of $\beta/\hat\beta$, because the probability between these bounds is $0.99$:

$$P(x_{0.005} \leq \frac{\beta}{\hat \beta} \leq x_{0.995}) = 0.99$$

Now we compute the CDF of $\dfrac{\beta}{\hat \beta}$ by finding the CDF of $\hat \beta$ then applying the transform, $ Y = \frac{1}{X}$.

This is what I've established so far. Is this the right direction? I'm skeptical because the functions $A$ and $B$ are not general enough.


With this edit I am entirely replacing my answer. Instead of addressing the reasoning in the question, I'll just answer the short question that follows the words "My question is". You have an exponential distribution with expected value $\beta$. To see that that is the expected value, integrate: \begin{align} & \int_0^\infty x f(x)\,dx = \int_0^\infty \frac x \beta e^{-x/\beta} \, dx = \beta \int_0^\infty \frac x \beta e^{-x/\beta}\,\frac{dx}\beta = \beta\int_0^\infty u e^{-u}\,du \\[8pt] = {} & \underbrace{\beta \int u\,dv = \beta\left(uv -\int v\,du \right) }_{\text{integration by parts}} = \beta\left( \left.\vphantom{\frac11}u e^{-u}\right|_0^\infty - \int_0^\infty -e^{-u}\,du \right) = \beta(0+1). \end{align} The "$0$" at the end can be found by L'Hopital's rule. There is also a common-sense way to see that it is $0$. That the last integral is $1$ I'll leave as an exercise unless further questions are forthcoming.

Here's an exercise: (1) Show that $\beta$ is a scale parameter; (2) use that to show that the expected value must be $\beta$ times some constant, without finding any integrals.

From part (2) of the exercise, one sees that the point of evaluating the integral is just to show that the "constant" is $1$.

Similarly to part (2) of the exercise, one can show without finding any integrals that the standard deviation is some constant times $\beta$. But let's find the variance by brute force. First, the expected value of the square of this exponentially distributed random variable: $$ \int_0^\infty x^2 f(x)\,dx = \int_0^\infty \frac{x^2}\beta e^{-x/\beta}\,dx = \beta^2 \int_0^\infty \left(\frac x \beta \right)^2 e^{-x/\beta}\, \frac{dx}\beta = \underbrace{\beta^2 \int_0^\infty u^2 e^{-u}\,du =2\beta^2}_{\text{Integrate by parts again.}}. $$ Then $$ \text{variance}=\text{expected value of the square minus the square of the expected value} = 2\beta^2-\beta^2. $$ So the standard deviation is $\beta$.
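As a quick numerical sanity check on the two integrals above, here is a minimal sketch that approximates $\int_0^\infty u^k e^{-u}\,du$ with a midpoint rule. The function name `moment`, the truncation point, and the step count are illustrative choices, not part of the original argument.

```python
import math

def moment(k, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of u**k * exp(-u) on [0, upper].

    The upper limit 60 is an assumed truncation point; the integrand is
    negligible beyond it for k = 1 or 2.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h          # midpoint of the i-th subinterval
        total += u ** k * math.exp(-u)
    return total * h

if __name__ == "__main__":
    m1 = moment(1)                 # should be close to 1, so E(X) = beta
    m2 = moment(2)                 # should be close to 2, so E(X^2) = 2 beta^2
    print(m1, m2, m2 - m1 ** 2)    # last value: variance in units of beta^2
```

Multiplying the first value by $\beta$ and the last by $\beta^2$ gives the mean and variance derived above.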

Since $X_1,\ldots,X_{20}$ are independent, we get $\operatorname{var}(X_1+\cdots+X_{20}) = 20\beta^2$, so $\operatorname{var}\left(\dfrac{X_1+\cdots+X_{20}}{20}\right) = \dfrac 1 {20^2}\cdot 20\beta^2 = \beta^2/20$. And we don't need independence to show that $\operatorname{E}(\bar X) = \beta$.

Consequently $\bar X$ has expected value $\beta$ and standard deviation $\beta/\sqrt{20}$. And so $$ \frac{\bar X - \beta}{\beta/\sqrt{20}} \tag{$*$} $$ has expected value $0$ and variance $1$.

Probably what was done next is that the central limit theorem was invoked and $(*)$ was treated as approximately normally distributed. Hence its probability of being $<-2.5758$ is about $0.005$ and its probability of being $>2.5758$ is about $0.005$. Thus the event $$ -2.5758 < \sqrt{20}\frac{\bar X-\beta}{\beta} < 2.5758 $$ has probability about $0.99$. Multiplying the numerator and denominator by $1/\beta$ and dividing all three members by $\sqrt{20}$ we get $$ \frac{-2.5758}{\sqrt{20}} < \frac{\bar X}\beta - 1 < \frac{2.5758}{\sqrt{20}} $$ $$ 1-\frac{2.5758}{\sqrt{20}} < \frac{\bar X}\beta < 1+ \frac{2.5758}{\sqrt{20}} $$ $$ \frac 1 {1-\frac{2.5758}{\sqrt{20}}} > \frac \beta {\bar X} > \frac 1 {1+ \frac{2.5758}{\sqrt{20}}} $$ $$ 2.35834 > \frac \beta {\bar X} > 0.6345287 $$ $$ 2.35834 \bar X >\beta > 0.6345287 \bar X $$ This differs from what you had. A possibility that I haven't checked is that making the lower bound smaller and also making the upper bound smaller will leave the probability at $0.99$. It is conceivable that that would be done in order to get a shorter interval. Another possibility is that your arithmetic or mine has errors.
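The arithmetic in the chain of inequalities above can be rechecked with a few lines of Python. This is only a sketch: the function name and its parameters are made up for illustration, and it relies on the standard library's `statistics.NormalDist` for the normal quantile.

```python
import math
from statistics import NormalDist

def normal_approx_multipliers(n=20, level=0.99):
    """Return (lower, upper) multipliers of the sample mean for the
    CLT-based interval derived above: 1/(1 + z/sqrt(n)) and 1/(1 - z/sqrt(n))."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 2.5758 for level 0.99
    half = z / math.sqrt(n)
    return 1 / (1 + half), 1 / (1 - half)

if __name__ == "__main__":
    lower, upper = normal_approx_multipliers()
    print(lower, upper)   # compare with 0.6345287 and 2.35834 above
```

Note that the upper multiplier is only defined when $z/\sqrt{n} < 1$, which holds here but would fail for very small $n$.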


Exact confidence intervals for exponential data

Let $X_1, X_2, \dots, X_n$ be iid $\mathrm{Exp}(\text{mean} = \beta).$ Then $$\bar X/\beta \sim \mathrm{Gamma}(\text{shape}=n, \text{scale}=1/n),$$ so that $E(\bar X) = \beta$ and $V(\bar X) = \beta^2/n.$ Then an exact 99% confidence interval (CI) for $\beta$ can be found as follows: $$P(L < \bar X/\beta < U) = P(1/U < \beta/\bar X < 1/L) = P(\bar X/U < \beta < \bar X/L) = 0.99,$$ where $L$ and $U$ cut 0.5% of the area from the lower and upper tails, respectively, of $\mathrm{Gamma}(\text{shape}=n, \text{scale}=1/n).$ Thus $(\bar X/U, \bar X/L)$ is a 99% CI for $\beta.$
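For readers who want to see why $\bar X/\beta$ has this gamma distribution, one standard route (a sketch, not necessarily the one the answer had in mind) uses moment generating functions. Each $X_i$ has MGF $M_{X_i}(t) = (1-\beta t)^{-1}$ for $t < 1/\beta$, so by independence
$$M_{X_1+\cdots+X_n}(t) = (1-\beta t)^{-n}, \qquad M_{\bar X/\beta}(t) = M_{X_1+\cdots+X_n}\!\left(\frac{t}{n\beta}\right) = \left(1-\frac{t}{n}\right)^{-n}, \quad t<n,$$
which is the MGF of $\mathrm{Gamma}(\text{shape}=n, \text{scale}=1/n).$ In particular, the distribution of $\bar X/\beta$ does not depend on $\beta$, which is what makes it usable as a pivot.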

For example, if $n = 20$ independent exponential observations have $\bar X = 104.08$, then $L = 0.5177$ and $U =1.6691$ are quantiles .005 and .995, respectively, of $\mathrm{Gamma}(\text{shape}=20, \text{scale}=1/20)$ and a 99% CI for $\beta$ is $( 62.35, 201.06),$ sensibly rounded.

The computations can be done on a statistical calculator or statistical computer package. Results from R are as follows.

 # x holds the n = 20 observations (data not shown); mean(x) is about 104.08
 UL = qgamma(c(.995,.005), 20, scale=1/20); UL
 ## 1.6691490 0.5176634
 mean(x)/UL
 ## 62.35463 201.05571
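For readers without R, the same quantiles can be reproduced with the Python standard library alone. The sketch below assumes the shape $n$ is an integer, so that the gamma CDF has the closed Erlang form $1 - e^{-nx}\sum_{k=0}^{n-1}(nx)^k/k!$, and inverts it by bisection. The function names and the bracketing interval are illustrative choices.

```python
import math

def erlang_cdf(x, n):
    """CDF at x of Gamma(shape=n, scale=1/n) for integer n,
    i.e. the distribution of Xbar/beta for n iid exponentials."""
    if x <= 0:
        return 0.0
    lam = n * x
    term = math.exp(-lam)          # k = 0 term of the Poisson-type sum
    total = term
    for k in range(1, n):
        term *= lam / k            # builds lam**k / k! * exp(-lam) iteratively
        total += term
    return 1.0 - total

def erlang_quantile(p, n, lo=0.0, hi=10.0, tol=1e-10):
    """Invert erlang_cdf by bisection; [lo, hi] is an assumed bracket
    that comfortably contains the quantiles of interest for n = 20."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if erlang_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    n = 20
    L = erlang_quantile(0.005, n)
    U = erlang_quantile(0.995, n)
    print(L, U)            # compare with the qgamma output above
    print(1 / U, 1 / L)    # multipliers of the sample mean in the exact 99% CI
```

The last line prints the factors that multiply $\bar X$ in the exact interval $(\bar X/U, \bar X/L)$.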

Notes: (i) The Wikipedia article on 'exponential distribution' shows how to use printed chi-squared tables to get bounds based on the gamma distribution. (ii) The bounds in equation (1) of the Question are in error. They seem to be based on an incorrect assumption that $\beta/\bar X$ is gamma distributed (instead of its reciprocal).
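To make note (i) concrete, here is a brief sketch of the chi-squared form of the same interval. Since $\mathrm{Gamma}(\text{shape}=n, \text{scale}=2)$ is the chi-squared distribution with $2n$ degrees of freedom, rescaling the pivot gives
$$\frac{2n\bar X}{\beta} \sim \chi^2_{2n}, \qquad P\!\left(\frac{2n\bar X}{\chi^2_{0.995,\,2n}} < \beta < \frac{2n\bar X}{\chi^2_{0.005,\,2n}}\right) = 0.99,$$
where $\chi^2_{p,\,2n}$ denotes the quantile of order $p$. For $n = 20$ the table entries for 40 degrees of freedom are about $20.707$ and $66.766$, which are $40L$ and $40U$ for the gamma quantiles $L$ and $U$ used above.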

Approximate CIs assuming that the sample mean $\bar X$ is normal

Using the normal approximation for the mean of an exponential sample as small as $n=20$ is, quite justifiably, becoming obsolete in applied statistics, but some texts still show it as a drill in using normal tables.

If we assume that $\bar X$ for the data in our example is approximately normal, with mean estimated by $\hat \beta = \bar X$ and standard deviation estimated by $\hat \beta/\sqrt{n},$ then $$P(\bar X - 2.576\bar X /\sqrt{n}< \beta < \bar X + 2.576\bar X/\sqrt{n}) \approx 0.99,$$ and the ''99%'' CI is $(44.128, 164.030).$

Approximate CIs assuming that the data are normal

If we stretch the "robustness" of the t confidence interval beyond reason, we get the CI $(46.35, 161.81),$ based on $$P(\bar X - 2.861S /\sqrt{n}< \beta < \bar X + 2.861S/\sqrt{n}) \approx 0.99.$$ Here, the sample standard deviation $S = 90.245$ estimates the population standard deviation $\sigma = \beta$ (but less reliably than $\bar X$ estimates $\beta$), and the numbers $\pm 2.861$ are quantiles .005 and .995 of Student's t distribution with $n - 1 = 19$ degrees of freedom.

Assessment: Exact intervals are superior

For random samples of size $n = 20$ from an exponential distribution with $\beta = 100$, one can show that 99% of the gamma-based CIs include $\beta$ and their average length is about 133.

A simulation study based on 100,000 samples of size 20 shows the inferiority of the two approximate styles of intervals illustrated above. Among intervals based on the assumption that $\bar X$ is normal, only 96.5% cover $\beta$ because they are too short: their average length is about 115. Even though t-based intervals are usually longer (about 123 on average), still only 96.5% of them cover $\beta.$ [Added later: To be fair, normal-based CIs of the type $(.6345\bar X,2.358\bar X),$ as in the Answer by @Michael Hardy, have nearly 99% coverage, but at the cost of average length about 172.]
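A simulation of this kind can be sketched in plain Python as follows. This is not the code behind the figures quoted above: the function name, the seed, and the default of 20,000 replications (chosen for speed) are assumptions, and the quantiles $0.5176634$, $1.6691490$, $2.576$ and $2.861$ are simply copied from earlier in this answer.

```python
import math
import random

def simulate_coverage(beta=100.0, n=20, reps=20_000, seed=1):
    """Estimate coverage and average length of three nominal 99% intervals
    for the mean beta of an exponential sample of size n."""
    rng = random.Random(seed)
    lo_q, hi_q = 0.5176634, 1.6691490   # gamma quantiles .005 and .995 (n = 20)
    z, t = 2.576, 2.861                 # normal and t(19) quantiles of order .995
    root_n = math.sqrt(n)
    names = ("gamma", "normal", "t")
    hits = {name: 0 for name in names}
    total_len = {name: 0.0 for name in names}
    for _ in range(reps):
        # expovariate takes the rate, so rate = 1/beta gives mean beta
        xs = [rng.expovariate(1.0 / beta) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        intervals = {
            "gamma": (xbar / hi_q, xbar / lo_q),
            "normal": (xbar - z * xbar / root_n, xbar + z * xbar / root_n),
            "t": (xbar - t * s / root_n, xbar + t * s / root_n),
        }
        for name, (lo, hi) in intervals.items():
            hits[name] += lo < beta < hi     # True counts as 1
            total_len[name] += hi - lo
    coverage = {name: hits[name] / reps for name in names}
    avg_len = {name: total_len[name] / reps for name in names}
    return coverage, avg_len

if __name__ == "__main__":
    coverage, avg_len = simulate_coverage()
    for name in coverage:
        print(name, round(coverage[name], 4), round(avg_len[name], 1))
```

With fewer replications than the 100,000 used above, the estimated coverages carry somewhat more Monte Carlo noise, so agreement to about two decimal places is the most one should expect.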

Note: No one sample, including the one we used to illustrate computations above, can be typical. That is why a simulation study uses many samples.