Confidence intervals for a ratio of normal variances?


I was studying the F-distribution which says that given a pair of samples and their sizes and knowing the sample variances, we can compute the confidence interval of the ratio of true variances as the F statistic then follows the F distribution.

What I was wondering was does it make sense to know the ratio of the true variances and then estimate the "confidence interval" for the sample variances, i.e. that the ratio of the sample variance has a range centred at the ratio of true variances in which it lies 95% of the time for any samples of those sizes? Does such a computation make sense or have a meaning? Clearly if I put in the values then the sample variance can be associated with a range based on an F distribution, so it is possible. But does it make any sense?


Let $X_{11}, X_{12}, \dots, X_{1m}$ be a random sample of size $m$ from $\mathsf{Norm}(\mu_1, \sigma_1),$ and let $X_{21}, X_{22}, \dots, X_{2n}$ be a random sample of size $n$ from $\mathsf{Norm}(\mu_2, \sigma_2).$

Then $S_1^2 = \frac{1}{m-1}\sum_{i=1}^m (X_{1i} - \bar X_1)^2,$ where $\bar X_1 = \frac 1m\sum_{i=1}^m X_{1i},$ is an unbiased estimator of $\sigma_1^2.$ Similarly for $S_2^2$ and $\sigma_2^2.$

Then one can show that $$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} = \frac{R}{\psi} \sim \mathsf{F}(m-1,n-1),$$ where $R = S_1^2/S_2^2$ and $\psi = \sigma_1^2/\sigma_2^2.$
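To see this fact empirically, here is a quick simulation sketch (not part of the original answer, using the same parameters as the example below): draw many pairs of samples, form the pivot $R/\psi$ each time, and compare with $\mathsf{F}(m-1, n-1).$

```r
# Simulation check that R/psi follows F(m-1, n-1).
set.seed(1)
m <- 10; n <- 20
sigma1 <- 5; sigma2 <- 6                  # so psi = 25/36
psi <- sigma1^2 / sigma2^2
ratio <- replicate(10000, {
  s1 <- var(rnorm(m, 0, sigma1))          # sample variance S1^2
  s2 <- var(rnorm(n, 0, sigma2))          # sample variance S2^2
  (s1 / s2) / psi                         # pivot R/psi
})
mean(ratio)                               # should be near (n-1)/(n-3) = 19/17, the mean of F(9, 19)
ks.test(ratio, "pf", m - 1, n - 1)        # Kolmogorov-Smirnov comparison with F(9, 19)
```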

Then $P(L < R/\psi < U) = 0.95,$ where $L$ and $U$ cut probability $0.025,$ respectively, from the lower and upper tails of the distribution $\mathsf{F}(m-1, n-1).$ [Values $L$ and $U$ can be obtained from printed tables of Snedecor's F distribution, but it is often more convenient to use software such as R or a statistical calculator.]

Upon pivoting (solving the inequalities to isolate $\psi$), we have $P(R/U < \psi < R/L) = 0.95,$ so that a 95% confidence interval for $\psi = \sigma_1^2/\sigma_2^2$ is of the form $(R/U, R/L) = \left(\frac{S_1^2/S_2^2}{U},\, \frac{S_1^2/S_2^2}{L}\right).$
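The pivoted interval can be packaged in a small helper function (the function name is hypothetical, not from the original answer):

```r
# 95% (or other) CI for psi = sigma1^2/sigma2^2 by pivoting R/psi ~ F(m-1, n-1).
var_ratio_ci <- function(v1, v2, m, n, conf = 0.95) {
  alpha <- 1 - conf
  R <- v1 / v2                            # observed ratio of sample variances
  L <- qf(alpha / 2, m - 1, n - 1)        # lower-tail critical value
  U <- qf(1 - alpha / 2, m - 1, n - 1)    # upper-tail critical value
  c(lower = R / U, upper = R / L)
}
var_ratio_ci(22.36833, 49.7425, 10, 20)   # reproduces the CI below: (0.156, 1.656)
```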

Example: Suppose we have samples of sizes $m = 10$ and $n = 20$ from populations with $\sigma_1^2 = 25,$ $\sigma_2^2 = 36,$ so that $\psi = 25/36 = 0.6944.$

Then from the data with $R = 0.4497,$ a 95% CI for $\psi$ is $(0.156, 1.656),$ which does happen to cover the true value $\psi = \sigma_1^2/\sigma_2^2 = 0.6944.$

set.seed(2021)
x1 = rnorm(10, 50, 5)
x2 = rnorm(20, 60, 6)
v1 = var(x1); v2 = var(x2)
v1; v2;  v1/v2
[1] 22.36833
[1] 49.7425
[1] 0.4496825

CI = (v1/v2) / qf(c(.975,.025), 9, 19);  CI
[1] 0.1561369 1.6563326

Notes: (1) Even though $\psi < 1$ so that $\sigma_1^2 < \sigma_2^2$ and $R = 0.4497 < 1,$ the sample sizes are not large enough to yield a CI short enough to exclude $1.$

(2) The CI shown above is also produced by the R procedure var.test.

 var.test(x1, x2)

        F test to compare two variances

 data:  x1 and x2
 F = 0.44968, num df = 9, denom df = 19, p-value = 0.2197
 alternative hypothesis: true ratio of variances is not equal to 1
 95 percent confidence interval:
   0.1561369 1.6563326
sample estimates:
ratio of variances 
         0.4496825
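(3) As to the question asked above: the same pivot works in reverse. If $\psi$ is known, then $P(\psi L < R < \psi U) = 0.95,$ so $(\psi L, \psi U)$ captures the sample ratio $R$ 95% of the time (a prediction interval for $R$, not a confidence interval for $\psi$). Note that this interval is not centred at $\psi,$ because the F distribution is right-skewed. A quick simulation sketch (not part of the original output) illustrates the coverage:

```r
# With psi known, (psi*L, psi*U) should capture R = S1^2/S2^2 about 95% of the time.
set.seed(2022)
m <- 10; n <- 20
psi <- 25/36
L <- qf(0.025, m - 1, n - 1)
U <- qf(0.975, m - 1, n - 1)
hits <- replicate(10000, {
  R <- var(rnorm(m, 50, 5)) / var(rnorm(n, 60, 6))
  (psi * L < R) && (R < psi * U)
})
mean(hits)                                # close to 0.95
```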