Constructing a confidence interval for population variance

Given the following data set, how does one construct a 99% confidence interval for the population variance? I should mention now I don't have a very good understanding of when to divide by n and when to divide by (n-1).

22.2    24.7    20.9    26.0    27.0
24.8    26.5    23.8    25.6    23.9

I've already tried a couple of approaches, but neither seems to give the correct answer. The first was to find the mean of the sample (sum divided by number of observations), giving $$\bar X = \frac{\sum X_n}{n} = 24.54$$ and then to obtain the variance of the sample as $$s^2 = \frac{\sum(X_n-\bar X)^2}{n} = 3.2924$$

Given that there are 9 degrees of freedom, and I'm looking for 99% confidence for a two-sided interval, I used a $\chi^2$ table to find $\chi^2$ values of 1.73 and 23.59. I then computed the interval from the following formula: $$\frac{(n-1)s^2}{\chi_1^2} \le \sigma^2 \le \frac{(n-1)s^2}{\chi_2^2}$$ $$1.256... \le \sigma^2 \le 17.128...$$

Upon seeing that this was wrong, I tried to obtain the mean and variance by dividing by (n-1) instead of n. This gave $$\bar X = 11.919...$$ $$s^2 = 3.292...$$ Then, using the same $\chi^2$ values as above, I got the following confidence interval: $$4.547... \le \sigma^2 \le 62.002...$$

This was also wrong. I cannot tell why both of these would be wrong, whether I should have divided by n or (n-1), or how I would obtain the correct answer. Does anyone know where I went wrong here?

There are 1 best solutions below

First, let's get the notation and definitions right. The sample mean is $\bar X = \frac 1n\sum_{i=1}^n X_i.$ If the population mean $\mu$ is unknown and estimated by $\bar X,$ then the population variance $\sigma^2$ is estimated by the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2.$ Then, assuming the data are a random sample from a normal population, $$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n(X_i - \bar X)^2}{\sigma^2} \sim \mathsf{Chisq}(df = n-1).$$

For your dataset the statistics are:

x = c(22.2, 24.7, 20.9, 26.0, 27.0, 24.8, 26.5, 23.8, 25.6, 23.9)
n = length(x);  a = mean(x);  s = sd(x)
n;  a;  s
## 10           # sample size
## 24.54        # sample mean
## 1.912648     # sample SD
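
On the question of dividing by $n$ or by $n-1$, a short check in R may help. This is only a sketch; the names `ss`, `v_n`, and `v_n1` are illustrative and not part of the original computation. It shows that R's `var` uses the $n-1$ divisor and that the numerator $(n-1)S^2$ is simply the sum of squared deviations.

```r
# Data from the question
x <- c(22.2, 24.7, 20.9, 26.0, 27.0, 24.8, 26.5, 23.8, 25.6, 23.9)
n <- length(x)

ss   <- sum((x - mean(x))^2)  # sum of squared deviations from the sample mean
v_n  <- ss / n                # divisor n: the 3.2924 used in the question
v_n1 <- ss / (n - 1)          # divisor n - 1: the sample variance S^2

ss; v_n; v_n1
all.equal(v_n1, var(x))        # var() uses the n - 1 divisor
all.equal((n - 1) * v_n1, ss)  # the CI numerator (n - 1) S^2 is just ss
```

The sum of squares is 32.924. Multiplying the $n$-divisor variance 3.2924 by $n-1 = 9$, as in the first attempt in the question, gives 29.63 instead, which is why that interval came out too narrow.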

A 95% confidence interval for the population variance $\sigma^2$ is then $$((n-1)S^2/U,\, (n-1)S^2/L),$$ where $L$ and $U$ cut 2.5% of the probability from the lower and upper tails, respectively, of $\mathsf{Chisq}(n-1).$ Computations of CIs for $\sigma^2$ and $\sigma$ in R statistical software follow:

UL = qchisq(c(.975, .025), n - 1);  UL
##  19.022768  2.700389
CI = (n-1)*s^2 / UL;  CI
##  1.730768 12.192315   95% CI for pop var
sqrt(CI)
##  1.315587 3.491750    95% CI for pop SD

Notice that $S = 1.913$ is contained in the CI for $\sigma$ as it must be, but that $S$ is not at the center of the CI, because the chi-squared distribution is skewed.
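
For reference, the interval comes from inverting the chi-squared statement about $(n-1)S^2/\sigma^2.$ With $L$ and $U$ the 0.025 and 0.975 quantiles of $\mathsf{Chisq}(n-1)$:

$$0.95 = P\left(L \le \frac{(n-1)S^2}{\sigma^2} \le U\right) = P\left(\frac{(n-1)S^2}{U} \le \sigma^2 \le \frac{(n-1)S^2}{L}\right).$$

Taking reciprocals reverses the inequalities, which is why the larger quantile $U$ gives the lower endpoint.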

I assume you can use the appropriate quantiles of $\mathsf{Chisq}(9)$ to get 99% confidence intervals.

Addendum per Comments for 99% CIs: Of course, 99% confidence intervals have to be wider than 95% CIs.

UL = qchisq(c(.995, .005), n - 1);  UL
##  23.589351  1.734933  # same as you showed in your question
CI = (n-1)*s^2 / UL;  CI
##  1.395715 18.977103   # using correct numerator, this is different
sqrt(CI)
##  1.181404 4.356272
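
To avoid repeating these steps for each confidence level, the computation can be wrapped in a small function. This is a minimal sketch: `var_ci` and its `conf` argument are hypothetical names, not functions from base R, and the interval is only valid under the normality assumption.

```r
# Hypothetical helper (not part of base R): chi-squared confidence interval
# for the variance of a normal population at confidence level `conf`.
var_ci <- function(x, conf = 0.95) {
  n     <- length(x)
  alpha <- 1 - conf
  # Upper quantile first, so that the lower endpoint comes out first
  q <- qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)
  (n - 1) * var(x) / q
}

x <- c(22.2, 24.7, 20.9, 26.0, 27.0, 24.8, 26.5, 23.8, 25.6, 23.9)
var_ci(x, conf = 0.99)        # CI for the population variance
sqrt(var_ci(x, conf = 0.99))  # CI for the population SD
```

With `conf = 0.99` this should reproduce the interval (1.396, 18.977) for $\sigma^2$ obtained above.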