Testing equality of variances of two populations


Suppose you are given two sets of data:

 DATA A: 2, 3, 4, 5, 6       (n=5)
 DATA B: 4, 2, 1, 2, 6, 4, 2 (n=7)

Is there any way to determine whether the standard deviation of A equals the standard deviation of B?


0
On BEST ANSWER

Your two samples are shown in the data vectors below. Their sample variances are $S_A^2 = 2.5$ and $S_B^2 = 3.0,$ which are unbiased estimates of the population variances $\sigma_A^2$ and $\sigma_B^2,$ respectively.

A = c(2, 3, 4, 5, 6);  B = c(4, 2, 1, 2, 6, 4, 2)
var(A);  var(B)
## 2.5
## 3

Then the question is whether 2.5 and 3.0 are sufficiently different to reject $H_0: \sigma_A^2 = \sigma_B^2$ against $H_a: \sigma_A^2 \ne \sigma_B^2,$ and thus declare $\sigma_A^2$ and $\sigma_B^2$ significantly different.

Testing equality of population variances.

Under $H_0,$ and assuming both samples are drawn from normal populations, the ratio $F = S_A^2/S_B^2 = 2.5/3.0 = 0.8333$ is distributed according to Snedecor's F distribution with $5-1=4$ numerator degrees of freedom (df) and $7 - 1 = 6$ denominator df. [This distribution is also called the 'variance ratio' distribution.]

For a test at the 5% level, the lower critical value $F_L^* = 0.109$ cuts 2.5% of the area under the PDF from the left-hand tail of $F(4,6),$ and the upper critical value $F_U^* = 6.227$ cuts 2.5% of the area from its right-hand tail. Thus the two parts of the rejection region are to the left of 0.109 and to the right of 6.227. Our observed value $F = 0.8333$ falls between these two critical values, and so we cannot reject $H_0.$

 qf(.025, 4, 6)
 ## 0.1087274
 qf(.975, 4, 6)
 ## 6.227161
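
The same $F(4,6)$ reference distribution also gives a P-value directly. Here is a minimal sketch, assuming the usual two-sided convention of doubling the smaller tail area; the names `F.obs`, `lower.tail` and `p.two.sided` are only illustrative.

```r
A <- c(2, 3, 4, 5, 6);  B <- c(4, 2, 1, 2, 6, 4, 2)
F.obs <- var(A) / var(B)                  # observed variance ratio, 2.5/3
# Area to the left of the observed ratio under F(4, 6)
lower.tail <- pf(F.obs, df1 = length(A) - 1, df2 = length(B) - 1)
# Two-sided P-value: double the smaller of the two tail areas
p.two.sided <- 2 * min(lower.tail, 1 - lower.tail)
p.two.sided
```

The result can be checked against the P-value that `var.test(A, B)` reports for the same data.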

In the sketch of the PDF of $F(4,6)$ below, vertical red dotted lines show the two critical values, and the vertical green line shows the observed value of the F-statistic.

[Figure: density of $F(4,6)$ with the two critical values and the observed F-statistic marked.]
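
A figure of this kind can be redrawn with base R graphics. This is only a sketch: the axis range, colours and title are guesses, not the settings used for the original image.

```r
lo <- qf(.025, 4, 6);  hi <- qf(.975, 4, 6)    # lower and upper critical values
F.obs <- 2.5 / 3                               # observed F-statistic
# Density of F(4, 6); here df() is R's F density function
curve(df(x, 4, 6), from = 0, to = 8, n = 501, lwd = 2,
      xlab = "F", ylab = "Density", main = "Density of F(4, 6)")
abline(v = c(lo, hi), col = "red", lty = "dotted")   # rejection boundaries
abline(v = F.obs, col = "darkgreen")                 # observed value
abline(h = 0, col = "gray")                          # horizontal axis
```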

Unfortunately, this F-test has rather poor power, especially for samples as small as these. That is, the two population variances may be different, but this may not be reflected in sufficiently different sample variances to give convincing 'proof' of inequality.
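
To get a feel for how low the power can be, here is a rough sketch for one assumed scenario: normal populations with $\sigma_B = 2\sigma_A$ and the sample sizes 5 and 7 used above. The scenario, the seed and all variable names are illustrative choices, not part of the original question.

```r
set.seed(2024)                        # arbitrary seed, for reproducibility only
n.A <- 5;  n.B <- 7
sd.A <- 1;  sd.B <- 2                 # assumed: sigma_B is twice sigma_A
lo <- qf(.025, n.A - 1, n.B - 1);  hi <- qf(.975, n.A - 1, n.B - 1)

# Simulated power: fraction of normal samples for which the F-test rejects
reject <- replicate(10000, {
  F.sim <- var(rnorm(n.A, sd = sd.A)) / var(rnorm(n.B, sd = sd.B))
  F.sim < lo || F.sim > hi
})
power.sim <- mean(reject)

# Exact power under normality: F / rho has an F(4, 6) distribution
rho <- sd.A^2 / sd.B^2                # true variance ratio, 1/4
power.exact <- pf(lo / rho, n.A - 1, n.B - 1) +
               pf(hi / rho, n.A - 1, n.B - 1, lower.tail = FALSE)
c(simulated = power.sim, exact = power.exact)
```

Under these assumptions the test rejects in only roughly one case in five, even though one population variance is four times the other.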

Here is what the variance test looks like in R statistical software; the P-value 0.8994 is considerably above .05 (indicating no rejection), and the 95% confidence interval for the ratio of population variances, from about 0.13 to 7.66, includes 1, so equal variances remain a 'reasonable' possibility for samples of these sizes.

 var.test(A,B)

 ##        F test to compare two variances

 ## data:  A and B 
 ## F = 0.8333, num df = 4, denom df = 6, p-value = 0.8994
 ## alternative hypothesis: true ratio of variances is not equal to 1 
 ## 95 percent confidence interval:
 ## 0.1338223 7.6644259 
 ## sample estimates: 
 ## ratio of variances 
 ##          0.8333333 
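
The confidence interval in this output can be reproduced by inverting the two-sided F-test, that is, by dividing the observed ratio by the upper and lower quantiles of $F(4,6)$. A short sketch, with illustrative variable names:

```r
A <- c(2, 3, 4, 5, 6);  B <- c(4, 2, 1, 2, 6, 4, 2)
F.obs <- var(A) / var(B)
df.A <- length(A) - 1;  df.B <- length(B) - 1
# Dividing by the upper quantile gives the lower limit, and vice versa
ci <- c(lower = F.obs / qf(.975, df.A, df.B),
        upper = F.obs / qf(.025, df.A, df.B))
ci
```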

Testing the equality of population means.

Finally, you say you want to know whether variances are equal in order to do the 'appropriate' test whether population means are equal. My guess is that you are trying to decide between a 'pooled' two-sample t test (which assumes population variances equal) and a Welch two-sample t test (which does not assume them equal).

My view is that the best statistical practice is to do the Welch test unless you have very solid prior information that population variances are equal. Simulation studies have shown that a 'hybrid test', which (1) checks for unequal variances with an F-test and (2) does the Welch test only when the F-test rejects (and the pooled test otherwise), is not an optimal procedure (even if you can figure out the actual significance level for the two tests in tandem). It is best just to do the Welch ('separate variances') test from the start. [Of course, if your text or instructor has told you to use the hybrid procedure, it is probably best to comply. But when you are done with the course, you can remember what best statistical practice is.]
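
One way to see why the pooled test is risky is to simulate both tests when the means are equal but the variances are not. The sketch below does not simulate the hybrid procedure itself; it only compares the rejection rates of the pooled and Welch tests in one assumed scenario (normal data, sample sizes 5 and 7, with the smaller sample having three times the standard deviation). The scenario, seed and names are illustrative.

```r
set.seed(101)                         # arbitrary seed, for reproducibility only
one.trial <- function() {
  x <- rnorm(5, mean = 0, sd = 3)     # smaller sample, larger SD (assumed)
  y <- rnorm(7, mean = 0, sd = 1)     # equal means, so H0 for means is true
  c(pooled = t.test(x, y, var.equal = TRUE)$p.value < .05,
    welch  = t.test(x, y)$p.value < .05)
}
# Each row of the replicate() result holds the decisions of one test
rates <- rowMeans(replicate(5000, one.trial()))
rates                                 # estimated rejection rates at the 5% level
```

A rejection rate well above 0.05 for a test means that it does not hold its nominal significance level in this scenario.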

For your data, the Welch two-sample t test fails to reject, with P-value 0.3257. Notice that in R the Welch test is the default. (If you insist on doing the pooled test, you must add the argument var.equal=TRUE after giving the two data vectors.) Also, you could see that this is not a pooled test even if the header didn't say 'Welch': a pooled test would have df = 5 + 7 - 2 = 10; for 'insurance' against the possibility of unequal variances, the Welch test uses a smaller df = 9.26 (which some software packages would round down to 9).

 t.test(A, B)

 ##        Welch Two Sample t-test

 ## data:  A and B 
 ## t = 1.0377, df = 9.26, p-value = 0.3257
 ## alternative hypothesis: true difference in means is not equal to 0 
 ## 95 percent confidence interval:
 ## -1.170565  3.170565 
 ## sample estimates:
 ## mean of x mean of y 
 ##         4         3 
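
The t statistic, df and P-value in this output can be reproduced by hand. A minimal sketch, using the Welch-Satterthwaite approximation for the degrees of freedom; variable names are illustrative.

```r
A <- c(2, 3, 4, 5, 6);  B <- c(4, 2, 1, 2, 6, 4, 2)
se2.A <- var(A) / length(A);  se2.B <- var(B) / length(B)  # squared std. errors
t.obs <- (mean(A) - mean(B)) / sqrt(se2.A + se2.B)
# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (se2.A + se2.B)^2 /
            (se2.A^2 / (length(A) - 1) + se2.B^2 / (length(B) - 1))
p.val <- 2 * pt(-abs(t.obs), df.welch)    # two-sided P-value
c(t = t.obs, df = df.welch, p = p.val)
```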
11
On

You can find the variance $s^2$ of a sample $\{x_1,...,x_n\}$ using the following formula

$s^2=\frac{1}{n-1}\sum_{i=1}^{n}{(x_i-\bar{x})^2},$ where $\bar{x}$ is the sample mean.

Data A gives a variance of $2.5$, while data B gives a variance of $3$.
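
As a check, the formula can be coded directly. In this sketch `sample.var` is a hypothetical helper name, not a built-in R function; the built-in equivalent is `var`.

```r
# Sample variance: sum of squared deviations from the mean, divided by n - 1
sample.var <- function(x) sum((x - mean(x))^2) / (length(x) - 1)
A <- c(2, 3, 4, 5, 6);  B <- c(4, 2, 1, 2, 6, 4, 2)
c(A = sample.var(A), B = sample.var(B))   # 2.5 and 3, matching var(A) and var(B)
```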