How to compute a confidence interval for the variance with unknown mean from a normal $(a,\sigma ^2)$ sample?


When the mean is known, we note that $\frac{\bar{(X-a)^2}n}{\sigma ^2}$ has a chi-squared distribution with $n$ degrees of freedom. However, what can be done when $a$ is unknown? I can't just substitute the sample mean for $a$. I haven't found anything on this site or elsewhere on the Internet covering this topic.


Some confusion here:

  • The notation in the first line of your question is garbled. Either you need to have $\sum_{i=1}^n (X_i - a)^2,$ if you have $n$ observations, or you need $n=1$ if $X$ is your only observation. I suppose this is why @callculus asked you to post the entire question.

  • Also, the (first) comment of @minusonetwelfth overlooks that the population mean $\mu$ is not unknown and estimated by $\bar X$; rather, $\mu = a$ is given.

When the population mean $\mu = a$ is known, the estimate of the variance $\sigma^2$ is $V = \frac 1 n \sum_{i=1}^n (X_i - a)^2$ and $\frac{nV}{\sigma^2} \sim \mathsf{Chisq}(n).$

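Thus, by the 'pivot' method, a 95% CI for $\sigma^2$ is of the form $$\left(\frac{nV}{U}, \frac{nV}{L}\right),$$ where $L$ and $U$ cut probability $0.025$ from the lower and upper tails, respectively, of $\mathsf{Chisq}(n).$ This follows by solving the inequalities in $P\left(L < \frac{nV}{\sigma^2} < U\right) = 0.95$ for $\sigma^2.$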
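As a minimal sketch, the pivot interval can be wrapped in a small R function. The name `var_ci_known_mean` is hypothetical (it is not part of base R); it uses only `qchisq` with $n$ degrees of freedom, matching $\mathsf{Chisq}(n)$ above, and is applied here to the ten observations listed in the example that follows.

```r
# Hypothetical helper: pivot CI for sigma^2 when the population mean a is known.
var_ci_known_mean <- function(x, a, level = 0.95) {
  n <- length(x)
  V <- sum((x - a)^2) / n             # variance estimate using the known mean
  alpha <- 1 - level
  L <- qchisq(alpha / 2, df = n)      # cuts alpha/2 from the lower tail of Chisq(n)
  U <- qchisq(1 - alpha / 2, df = n)  # cuts alpha/2 from the upper tail of Chisq(n)
  c(lower = n * V / U, upper = n * V / L)
}

# Usage with the sample from the example
x <- c(84.80, 85.72, 78.11, 95.17, 107.85, 108.01, 122.61, 111.15, 111.46, 79.67)
ci <- var_ci_known_mean(x, a = 100)
ci
```

Changing `level` gives other confidence levels; only the two chi-squared cut points change.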


Example: Let x be a vector of $n=10$ observations taken at random from $\mathsf{Norm}(\mu=100, \sigma=15),$ where we are taking $\mu = 100$ to be known, and using the observations in x to give an interval estimate of $\sigma^2 = 225.$ (I'm using R statistical software.)

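```r
set.seed(516)
x = round(rnorm(10, 100, 15), 2)   # n = 10 draws from Norm(100, 15), rounded to 2 places
x
 [1]  84.80  85.72  78.11  95.17 107.85 108.01 122.61 111.15 111.46  79.67
stripchart(x, pch="|")              # one-dimensional plot of the sample
```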


```r
n = 10
V = sum((x-100)^2)/n                # estimate of sigma^2 with known mean 100
V
[1] 224.3417
qchisq(c(.025,.975), n)             # cut points of Chisq(n), n = 10 degrees of freedom
[1]  3.246973 20.483177
CI = n*V/qchisq(c(.975,.025), n)
CI
[1] 109.5249 690.9257
```

So a 95% CI for $\sigma^2$ is $(109.52, 690.93).$ The confidence interval may seem very long, but there isn't much information about the variance in only $n = 10$ observations. Still, this 95% confidence interval does cover the true value $\sigma^2 = 225.$ [So our example falls among the 95% of samples for which the confidence interval covers (contains) $\sigma^2.$]
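The "95% of the time" interpretation can be checked by repeating the experiment many times and recording how often the interval contains $\sigma^2 = 225.$ A minimal sketch; the seed and the $B = 10\,000$ repetitions are arbitrary choices, not part of the original example.

```r
set.seed(1)                         # arbitrary seed
B <- 10000; n <- 10; a <- 100; sigma <- 15
covered <- replicate(B, {
  x <- rnorm(n, a, sigma)                        # fresh sample; mean a treated as known
  V <- sum((x - a)^2) / n
  ci <- n * V / qchisq(c(0.975, 0.025), df = n)  # pivot CI based on Chisq(n)
  ci[1] <= sigma^2 && sigma^2 <= ci[2]           # TRUE if the CI contains 225
})
mean(covered)                       # proportion of intervals covering sigma^2
```

With $B = 10\,000,$ the simulation standard error of this proportion is about $\sqrt{0.95 \cdot 0.05 / 10\,000} \approx 0.002,$ so the result should be within a few thousandths of $0.95.$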

Notes: (1) It is easy to see that $Q = \frac{1}{\sigma^2}\sum_{i=1}^n(X_i - \mu)^2 \sim \mathsf{Chisq}(n).$ We can write $Q = \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^n Z_i^2,$ where the $Z_i$ are independent standard normal random variables. A chi-squared random variable with $\nu = n$ degrees of freedom is defined as the sum of squares of $n$ independent standard normal random variables.
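Note (1) can also be checked numerically. A sketch with an arbitrary seed and $n = 5$: since $\mathsf{Chisq}(n)$ has mean $n$ and variance $2n,$ simulated values of $Q$ should have mean near $5$ and variance near $10.$

```r
set.seed(2020)                      # arbitrary seed
n <- 5; mu <- 100; sigma <- 15
# Each replicate computes Q = sum of Z_i^2 with Z_i = (X_i - mu) / sigma
q <- replicate(10^5, sum(((rnorm(n, mu, sigma) - mu) / sigma)^2))
c(mean = mean(q), var = var(q))     # compare with n = 5 and 2n = 10
```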

(2) It is not so easy to show that $W=\frac{1}{\sigma^2}\sum_{i=1}^n(X_i - \bar X)^2 = \frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1),$ where $S^2$ is the sample variance. A formal proof uses either (a) an $n$-variate orthogonal transformation for which a one-dimensional marginal is related to $\bar X$ and the remaining $n-1$ dimensions are related to $S^2,$ or (b) an argument using moment generating functions.
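Result (2) is what handles the case of unknown $a$: the same pivot argument with $W$ in place of $nV/\sigma^2$ gives the 95% interval $\left(\frac{(n-1)S^2}{U}, \frac{(n-1)S^2}{L}\right),$ where $L$ and $U$ now cut probability $0.025$ from the tails of $\mathsf{Chisq}(n-1).$ A minimal sketch in R; the function name `var_ci_unknown_mean` is hypothetical, and it is applied to the ten observations from the example as if the mean were not known.

```r
# Hypothetical helper: pivot CI for sigma^2 when the mean is unknown,
# based on W = (n - 1) S^2 / sigma^2 ~ Chisq(n - 1).
var_ci_unknown_mean <- function(x, level = 0.95) {
  n <- length(x)
  S2 <- var(x)                                          # sample variance, divisor n - 1
  alpha <- 1 - level
  q <- qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)  # U first, then L
  c(lower = (n - 1) * S2 / q[1], upper = (n - 1) * S2 / q[2])
}

x <- c(84.80, 85.72, 78.11, 95.17, 107.85, 108.01, 122.61, 111.15, 111.46, 79.67)
ci <- var_ci_unknown_mean(x)
ci
```

For these data $\bar x = 98.455$ and $(n-1)S^2 \approx 2219.5,$ so the interval is roughly $(116.7, 821.9),$ which still covers $225.$ One degree of freedom is lost because $\bar X$ is estimated from the same data.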

The simulation in R below takes $B=100\,000$ samples of size $n=5$ from $\mathsf{Norm}(\mu=100, \sigma=15)$ and computes $W = (n-1)S^2/\sigma^2$ for each sample. A histogram of the $B$ values of $W$ closely matches the density function of $\mathsf{Chisq}(4)$ [solid red], but not the density of $\mathsf{Chisq}(5)$ [dashes].

```r
set.seed(2020)
w = replicate(10^5, 4*var(rnorm(5, 100, 15))/15^2)   # W = (n-1)S^2/sigma^2 with n = 5
hist(w, probability = TRUE, breaks = 50, col = "skyblue2")
curve(dchisq(x, 4), add = TRUE, col = "red", lwd = 2)
curve(dchisq(x, 5), add = TRUE, col = "brown", lwd = 2, lty = "dashed")
```

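The visual comparison can be backed up numerically. A sketch reusing the same seed and simulation: $\mathsf{Chisq}(k)$ has mean $k$ and variance $2k,$ so $W$ should have mean near $4$ and variance near $8.$ The Kolmogorov-Smirnov statistic from base R's `ks.test` measures the largest gap between the empirical CDF of $W$ and each candidate CDF.

```r
set.seed(2020)
n <- 5; sigma <- 15
w <- replicate(10^5, (n - 1) * var(rnorm(n, 100, sigma)) / sigma^2)

c(mean = mean(w), var = var(w))     # Chisq(4): mean 4, variance 8; Chisq(5): 5 and 10

# Largest vertical distance between the empirical CDF of w and each candidate CDF
d4 <- unname(ks.test(w, "pchisq", df = 4)$statistic)
d5 <- unname(ks.test(w, "pchisq", df = 5)$statistic)
c(D_chisq4 = d4, D_chisq5 = d5)     # D should be much smaller for 4 degrees of freedom
```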