Confidence interval for mean using chi square

Among the problems I've practiced so far, I noticed that the chi-squared distribution is used only when finding a confidence interval for the variance. Other distributions are used to find confidence intervals for the mean, but can we not use chi-squared for that? If yes, can someone give the proof for finding the interval, and if not, why not?

For normal data with unknown $\mu$ and $\sigma^2,$ one has $\frac{\bar X - \mu}{S/\sqrt{n}} \sim \mathsf{T}(\nu=n-1),$ where $n$ is the sample size, $\bar X$ is the sample mean, and $S$ is the sample standard deviation. Thus Student's t distribution is used to make a CI for $\mu.$ Specifically, a 95% CI for $\mu$ is of the form $\bar X \pm t^*\frac{S}{\sqrt{n}},$ where $t^*$ cuts probability 0.025 from the upper tail of $\mathsf{T}(n-1).$
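The form of this interval follows from solving the probability statement for $\mu$. As a short sketch, using only the symbols already defined and the symmetry of the t distribution about 0:

$$0.95 = P\left(-t^* \le \frac{\bar X - \mu}{S/\sqrt{n}} \le t^*\right) = P\left(\bar X - t^*\frac{S}{\sqrt{n}} \le \mu \le \bar X + t^*\frac{S}{\sqrt{n}}\right).$$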

Also, $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(\nu = n-1),$ so the chi-squared distribution is used to make a CI for $\sigma^2.$ Specifically, a 95% CI for $\sigma^2$ is of the form $\left(\frac{(n-1)S^2}{U},\,\frac{(n-1)S^2}{L}\right),$ where $L$ and $U$ cut probability 0.025 from the lower and upper tails, respectively, of $\mathsf{Chisq}(n-1).$ A 95% CI for $\sigma$ can be found by taking square roots of the endpoints of the CI for $\sigma^2.$
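The same kind of inversion gives the variance interval. Because $\sigma^2$ sits in the denominator, taking reciprocals reverses the inequalities, which is why $U$ ends up in the lower endpoint and $L$ in the upper one:

$$0.95 = P\left(L \le \frac{(n-1)S^2}{\sigma^2} \le U\right) = P\left(\frac{(n-1)S^2}{U} \le \sigma^2 \le \frac{(n-1)S^2}{L}\right).$$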

These CIs are based on 'sufficient statistics', so they are optimal in various ways. (Roughly speaking, all of the information in the sample needed to make CIs is carried by $\bar X$ and $S$.) Other distributions might conceivably be used to make CIs for $\mu$ or for $\sigma^2,$ but that would take more effort and the results would not be as good. (Roughly speaking, to achieve the same 'coverage probability', other styles of CIs would have to be longer.)
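One way to see why chi-squared is not used separately for $\mu$, offered as a sketch: squaring the usual pivots does bring in chi-squared, but it leads back to the same intervals. If $\sigma$ were known, then with $c$ denoting the 0.95 quantile of $\mathsf{Chisq}(1)$ (notation introduced here),

$$\frac{n(\bar X-\mu)^2}{\sigma^2} \sim \mathsf{Chisq}(1), \qquad \left\{\mu : \frac{n(\bar X-\mu)^2}{\sigma^2} \le c\right\} = \left[\bar X - \sqrt{c}\,\frac{\sigma}{\sqrt{n}},\; \bar X + \sqrt{c}\,\frac{\sigma}{\sqrt{n}}\right],$$

and $\sqrt{c} \approx 1.96,$ so this is just the familiar z interval. With $\sigma$ unknown, replacing $\sigma^2$ by $S^2$ gives $\frac{n(\bar X-\mu)^2}{S^2} \sim \mathsf{F}(1, n-1),$ the square of the t statistic, and inverting it reproduces the t interval.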

However, you are correct to suspect that other kinds of CIs are sometimes appropriate.

  • For data that are not normal, there are better forms of CIs depending on the population distribution. For example, a gamma distribution is used to get a 95% CI for the population mean $\mu$ of the exponential distribution from which data were randomly sampled.

  • For data from a population with an unknown distribution, one sometimes uses nonparametric bootstrap procedures to get a CI for the population mean.
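For the exponential case in the first bullet, the interval can be sketched as follows. If $X_1, \dots, X_n$ are a random sample from an exponential distribution with mean $\mu,$ then $\sum_i X_i \sim \mathsf{Gamma}(\mathrm{shape}=n, \mathrm{rate}=1/\mu),$ so that

$$\frac{\bar X}{\mu} \sim \mathsf{Gamma}(\mathrm{shape}=n,\, \mathrm{rate}=n), \qquad 0.95 = P\left(L_g \le \frac{\bar X}{\mu} \le U_g\right) = P\left(\frac{\bar X}{U_g} \le \mu \le \frac{\bar X}{L_g}\right),$$

where $L_g$ and $U_g$ (labels introduced here for illustration) cut probability 0.025 from the lower and upper tails of $\mathsf{Gamma}(n, n).$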

Numerical examples with sampling and computations of CIs in R.

Normal: 50 observations from $\mathsf{Norm}(\mu=100, \sigma=15).$

set.seed(710)
x = round(rnorm(50, 100, 15), 2)
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  71.49   89.99   98.69   99.36  110.29  121.72 
sd(x)
[1] 13.52819

n = length(x);  pm = c(-1,1)    # n = 50, so df = n - 1 = 49
ci.mu = mean(x) + pm*qt(.975, n-1)*sd(x)/sqrt(n)
round(ci.mu, 2)
[1]  95.52 103.21         # 95% CI for pop mean
ci.vr = (n-1)*var(x)/qchisq(c(.975,.025), n-1)
round(ci.vr, 1)
[1] 127.7 284.2           # 95% CI for pop variance
ci.sd = sqrt(ci.vr)
round(ci.sd, 2)
[1] 11.30 16.86           # 95% CI for pop std devn
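
The 'coverage probability' mentioned earlier can be checked by simulation. The sketch below assumes the same population, $\mathsf{Norm}(\mu=100, \sigma=15),$ and samples of size 50; the seed, the number of replications `B`, and the variable names are arbitrary choices for illustration.

```r
# Estimate coverage of the t interval for mu and the chi-squared
# interval for sigma^2 by repeated sampling from a known normal population.
set.seed(1)                       # arbitrary seed (an assumption)
n <- 50; mu <- 100; sigma <- 15   # same population and sample size as above
B <- 10000                        # number of simulated samples (an assumption)

hit <- replicate(B, {
  x <- rnorm(n, mu, sigma)
  ci.mu <- mean(x) + c(-1, 1) * qt(0.975, n - 1) * sd(x) / sqrt(n)
  ci.vr <- (n - 1) * var(x) / qchisq(c(0.975, 0.025), n - 1)
  # TRUE when the interval captures the true parameter
  c(mu = ci.mu[1] <= mu && mu <= ci.mu[2],
    vr = ci.vr[1] <= sigma^2 && sigma^2 <= ci.vr[2])
})

cover <- rowMeans(hit)            # proportion of intervals that cover
cover
```

Both proportions should come out close to 0.95, up to simulation error.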

Exponential: 50 observations from $\mathsf{Exp}(\mathrm{rate}=0.01),$ population mean $\mu = 100.$

set.seed(1776)
y = round(rexp(50, 1/100), 2)
summary(y)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.62   27.28   63.22  103.56  174.51  391.67 
ci = mean(y)/qgamma(c(.975,.025), 50, 50)
round(ci, 1)
[1]  79.9 139.5           # 95% CI for population mean
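
Chi-squared can in fact be used for the mean in this exponential setting, because $\mathsf{Gamma}(\mathrm{shape}=n, \mathrm{rate}=n)$ is $\mathsf{Chisq}(2n)$ rescaled by $1/(2n),$ so that $2n\bar Y/\mu \sim \mathsf{Chisq}(2n).$ A minimal check in R, assuming $n = 50$ as in the sample above; `exp.mean.ci` is a hypothetical helper written for this illustration, not a built-in function.

```r
# Gamma(shape = n, rate = n) quantiles versus Chisq(2n) quantiles / (2n)
n <- 50
p <- c(0.975, 0.025)
q.gamma <- qgamma(p, shape = n, rate = n)
q.chisq <- qchisq(p, df = 2 * n) / (2 * n)
same <- isTRUE(all.equal(q.gamma, q.chisq))
same

# Hypothetical helper: chi-squared CI for the mean of exponential data,
# based on 2 * n * mean(y) / mu ~ Chisq(2n)
exp.mean.ci <- function(y, level = 0.95) {
  n <- length(y)
  a <- (1 - level) / 2
  2 * n * mean(y) / qchisq(c(1 - a, a), df = 2 * n)
}
```

Calling `exp.mean.ci(y)` on the sample `y` above should agree with the gamma-based interval.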

"Unknown" population: Actually, 50 observations simulated from a gamma distribution with mean $\mu = 50,$ but pretending we don't know the population distribution. One style of 95% nonparametric bootstrap confidence interval is illustrated. This is a basic 'quantile' bootstrap. (Perhaps a more sophisticated style of bootstrap CI would be better for this skewed distribution.)

set.seed(710)
w = round(rgamma(50, 5, 1/10), 2)
summary(w)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  14.93   39.58   48.06   49.37   61.20   83.51 
re.av = replicate(2000, mean(sample(w, 50, replace=TRUE)))
# resampling groups of 50 observations with replacement 
# from the sample and finding the mean of each 're-sample'.
quantile(re.av, c(.025,.975))
    2.5%    97.5% 
45.00109 53.80952       # 95% nonparametric bootstrap CI
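
As one example of a different bootstrap style, the sketch below computes a 'reflected' interval (sometimes called the pivotal bootstrap interval) alongside the quantile interval. It reflects the bootstrap quantiles about the observed mean. The seed and variable names are arbitrary, so the numbers will differ from the run above, and this sketch does not establish that the reflected interval behaves better for these data.

```r
# Quantile (percentile) and reflected bootstrap CIs for a population mean
set.seed(2024)                                 # arbitrary seed (an assumption)
w <- round(rgamma(50, shape = 5, rate = 1/10), 2)
a.obs <- mean(w)                               # observed sample mean

# Means of 2000 re-samples of size n, drawn with replacement from w
re.av <- replicate(2000, mean(sample(w, length(w), replace = TRUE)))
q <- quantile(re.av, c(0.025, 0.975), names = FALSE)

ci.pct <- q                     # quantile interval, as in the run above
ci.piv <- 2 * a.obs - rev(q)    # reflected interval: (2a - q.975, 2a - q.025)
rbind(ci.pct, ci.piv)
```

For a roughly symmetric bootstrap distribution the two intervals nearly coincide; they differ more when the re-sampled means are skewed.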