How do I calculate a confidence interval for the parameter of an exponential distribution?


How can I calculate a confidence interval for the parameter $\alpha$ of an exponential distribution?

I think I can use a t-interval, knowing that $$mean = {1\over\alpha}$$

I found that $${1\over {\bar X + \frac{S}{\sqrt{n}}\cdot t_{\alpha/2,n - 1}}}<\alpha<{1\over {\bar X - \frac{S}{\sqrt{n}}\cdot t_{\alpha/2,n - 1}}}$$ where the $\alpha$ in the subscript of $t$ is the significance level, not the parameter.

Is this right? In general, can I use the t distribution to determine a confidence interval for an exponential distribution?

If not, is there another way to do this?


There are 2 best solutions below

2
On BEST ANSWER

An alternative method is to use a Bayesian approach, in which case the interval estimate calculated is not a "confidence interval" but a "credible interval". The idea is to treat the rate parameter $\lambda$ (the question's $\alpha$) as a random variable. For observations $$X_i \sim \operatorname{Exponential}(\lambda)$$ the conjugate prior is a gamma distribution; i.e. the choice $$\lambda \sim \operatorname{Gamma}(a,b)$$ gives a posterior distribution for $\lambda$ that is also gamma: $$\lambda \mid \boldsymbol x \sim \operatorname{Gamma}(a + n, b + n \bar x)$$ where $\boldsymbol x = (x_1, x_2, \ldots, x_n)$ is the sample (all distributions are parametrized by rate).

Specifically, given the prior parameters $a, b$ that inform your "belief" about $\lambda$, and the observed sample $\boldsymbol x$, the posterior distribution, which takes into account the data you observed, has the density function $$f(\lambda \mid \boldsymbol x) = \frac{(b + n \bar x)^{a+n} \lambda^{a+n-1} e^{-(b + n \bar x)\lambda}}{\Gamma(a + n)}.$$

Hence we can construct a $100(1-\alpha)\%$ credible set in a number of ways (here $\alpha$ is the total tail probability, not the question's parameter). One way is to find the interval for $\lambda$ such that each tail of the posterior distribution contains $\alpha/2$ probability: that is, we need to find $\lambda_L < \lambda_U$ such that $$\int_{\lambda = 0}^{\lambda_L} f(\lambda \mid \boldsymbol x) \, d\lambda = \int_{\lambda_U}^\infty f(\lambda \mid \boldsymbol x) \, d\lambda = \frac{\alpha}{2}.$$ Another is to find the interval with highest posterior density (HPD), such that $f(\lambda_L \mid \boldsymbol x) = f(\lambda_U \mid \boldsymbol x)$ and $$\int_{\lambda = \lambda_L}^{\lambda_U} f(\lambda \mid \boldsymbol x) \, d\lambda = 1-\alpha.$$

As you can see, interval estimates are not unique and can be constructed in different ways and with different approaches. The advantage of the Bayesian method is that what you obtain is the full posterior distribution, which is the real object of interest and is easily updated with new data. The downside is that you might not always know what to choose for the prior parameters.
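
As a rough sketch of the two constructions above in R: the prior parameters `a = 1`, `b = 1` and the data summary (`n = 20`, `xbar = 6.32`) are illustrative assumptions, not values given in this answer. The equal-tailed interval comes straight from gamma quantiles. The HPD interval is found here as the shortest interval holding 95% posterior probability, which coincides with the HPD interval when the posterior is unimodal (as this gamma posterior is).

```r
# Illustrative inputs (assumptions): a Gamma(a, b) prior and a sample summary.
a <- 1; b <- 1            # hypothetical prior shape and rate
n <- 20; xbar <- 6.32     # sample size and sample mean
level <- 0.95
tail_prob <- 1 - level

# Posterior is Gamma(shape = a + n, rate = b + n * xbar).
shape_post <- a + n
rate_post  <- b + n * xbar

# Equal-tailed credible interval: tail_prob / 2 posterior probability per tail.
equal_tailed <- qgamma(c(tail_prob / 2, 1 - tail_prob / 2),
                       shape = shape_post, rate = rate_post)

# HPD interval: search over the lower-tail probability p for the shortest
# interval that contains `level` posterior probability.
width <- function(p) {
  qgamma(p + level, shape = shape_post, rate = rate_post) -
    qgamma(p, shape = shape_post, rate = rate_post)
}
p_opt <- optimize(width, interval = c(0, tail_prob), tol = 1e-8)$minimum
hpd <- qgamma(c(p_opt, p_opt + level), shape = shape_post, rate = rate_post)

print(equal_tailed)
print(hpd)
```

Because the gamma posterior is right-skewed, the HPD interval should sit slightly to the left of the equal-tailed one and be a little shorter.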

0
On

A t-interval would be a very approximate procedure here. It is intended for use when the data are at least roughly normal, and the exponential distribution is very far from normal. (Such a procedure might be OK for really large samples.)

CI based on gamma distribution. Here is a better way: If $X_1, X_2, \dots, X_n$ are a random sample from $Exp(rate=\alpha)$ then $\alpha \bar X \sim Gamma(n, n)$ (shape $n$, rate $n$), a distribution that does not depend on $\alpha.$ Let $g_L$ cut off probability 2.5% from the lower tail of this distribution and $g_U$ cut off 2.5% from its upper tail. Then $$P(g_L \le \alpha \bar X \le g_U) = P(g_L/\bar X \le \alpha \le g_U/\bar X) = 0.95.$$ Thus a 95% CI for $\alpha$ is $(g_L/\bar X,\; g_U/\bar X).$
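
The distributional claim above can be justified in two steps. A short derivation, assuming the shape-and-rate parametrization of the gamma distribution and the standard scaling fact that $cY \sim Gamma(k, r/c)$ when $Y \sim Gamma(k, r)$ and $c > 0$, might run as follows:

$$\sum_{i=1}^n X_i \sim Gamma(n, \alpha) \;\Longrightarrow\; \alpha \sum_{i=1}^n X_i \sim Gamma(n, 1) \;\Longrightarrow\; \alpha \bar X = \frac{1}{n}\,\alpha \sum_{i=1}^n X_i \sim Gamma(n, n).$$

The first step uses the fact that a sum of $n$ independent $Exp(rate=\alpha)$ variables is gamma with shape $n$ and rate $\alpha$; the remaining steps only rescale.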

For example, let $n = 20$ and $\bar X = 6.32.$ Then you can use R (or other statistical software) to obtain $g_L = 0.611,$ and $g_U = 1.484,$ so that the CI is $(0.097, 0.235).$ In R, the procedure was:

 x = rexp(20, .2)                      # generate fake data, rate = .2 
 mean(x)                               # find sample mean
 ##  6.322473
 qgamma(c(.025,.975), 20, 20)/mean(x)  # 'qgamma' is quantile fcn; 95% CI
 ##  0.09661188 0.23464596

In this case the data were generated to have $\alpha = .2,$ so we know the CI covers $\alpha.$ Most statistical software packages have the ability to find quantiles (inverse CDFs) of gamma distributions. (The Wikipedia 'exponential distribution' article has an equivalent formula using the chi-squared distribution, if you must use printed tables.)
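
For readers who want the chi-squared form, here is a small sketch of the equivalence. It relies on the standard fact that $2n\alpha \bar X \sim \chi^2(2n)$, and it plugs in the sample mean reported above (rounded to the digits shown) instead of regenerating the data:

```r
n <- 20
xbar <- 6.322473   # sample mean reported above (rounded)

# Gamma-based 95% CI for the rate, as in the procedure above.
ci_gamma <- qgamma(c(0.025, 0.975), shape = n, rate = n) / xbar

# Equivalent chi-squared form: 2 * n * alpha * xbar ~ chi-squared(2n).
ci_chisq <- qchisq(c(0.025, 0.975), df = 2 * n) / (2 * n * xbar)

print(ci_gamma)
print(ci_chisq)
print(all.equal(ci_gamma, ci_chisq))   # the two forms should agree
```

Both forms should reproduce the interval of roughly $(0.097, 0.235)$ quoted above.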

Comparison with inferior t-interval. The "95%" t CI for $\mu = 1/\alpha$ is $(3.638, 9.007),$ and taking reciprocals of its endpoints gives $(0.111, 0.275)$ as the CI for $\alpha.$
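
A t-interval of this kind can be computed along the following lines. This is only a sketch: the helper name `t_interval_rate` is made up for illustration, and the data are freshly simulated with an arbitrary seed, so the numbers will not match the ones quoted above.

```r
# t-interval for the mean, inverted to give an interval for the rate.
# (Hypothetical helper, not part of base R.)
t_interval_rate <- function(x, level = 0.95) {
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  mu_ci <- mean(x) + c(-1, 1) * half_width
  # Taking reciprocals only makes sense if the lower limit for the mean is positive.
  if (mu_ci[1] <= 0) stop("lower limit for the mean is not positive")
  list(mean = mu_ci, rate = rev(1 / mu_ci))
}

set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rexp(20, rate = 0.2)   # simulated data with true rate 0.2
ci <- t_interval_rate(x)
print(ci$mean)              # interval for mu = 1/alpha
print(ci$rate)              # interval for alpha
```

The positivity check matters for small samples, where the lower t limit for the mean can fall below zero and the reciprocal interval is then not meaningful.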

But such intervals from the t distribution do not have an actual 95% confidence level because the distribution theory is incorrect. (The actual coverage probability depends on $n;$ for $n = 20,$ it is about 92% instead of 95%. Other values I approximated by simulation are: 88% for $n=5$; 93% for $n=50.$)
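
Coverage figures like these can be approximated with a simulation along the following lines. This is a sketch under stated assumptions: the seed, the number of replications `m`, and the choice to measure coverage of $\mu = 1/\alpha$ for the t-interval are mine, and the exact simulation behind the numbers above is not shown in this answer.

```r
set.seed(2024)      # arbitrary seed
alpha_true <- 0.2   # true rate (coverage does not depend on this choice)
n <- 20             # sample size
m <- 10000          # number of simulated samples (assumption)

x <- matrix(rexp(m * n, rate = alpha_true), nrow = m)  # one sample per row
xbar <- rowMeans(x)
s <- apply(x, 1, sd)

# Coverage of the nominal 95% t-interval for the mean mu = 1/alpha.
mu_true <- 1 / alpha_true
half_width <- qt(0.975, df = n - 1) * s / sqrt(n)
cover_t <- mean(xbar - half_width <= mu_true & mu_true <= xbar + half_width)

# Coverage of the gamma-based interval for alpha.
g <- qgamma(c(0.025, 0.975), shape = n, rate = n)
cover_gamma <- mean(g[1] / xbar <= alpha_true & alpha_true <= g[2] / xbar)

print(c(t = cover_t, gamma = cover_gamma))
```

Changing `n` to 5 or 50 gives a way to check the other values mentioned; the gamma-based coverage should stay close to 95% in every case, since that interval is exact.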

Addendum: From a theoretical point of view, $\bar X$ is a 'sufficient statistic' for estimating $\alpha.$ This means that, once $\bar X$ is known, the sample SD $S$ provides no additional information about $\alpha.$ The t-interval uses both $\bar X$ and $S,$ and to the extent that $S$ influences the answer, the t-interval must be an inferior method. For small $n,$ the role of $S$ is very prominent. Notice that the method with the gamma distribution requires you to compute only $\bar X$ from the data; computing and using $S$ is not only extra work, it is counterproductive extra work.