When to use Pooled Variance?


In the following examples, why is pooled variance used in the first video but not in the other? When do you use pooled variance?

  1. https://classroom.udacity.com/courses/ud257/lessons/4018018619/concepts/40043987120923

  2. The last example in this page: https://newonlinecourses.science.psu.edu/stat414/node/306/


This seems to be about two-sample tests for equality of normal means.

Use Welch by default. Ordinarily, the default procedure should be the Welch t test, which does not assume that the population variances are equal. Accordingly, the denominator of the t statistic uses the two sample variances separately, and the number $\nu$ of degrees of freedom of the null t distribution is approximated from the sample sizes and sample variances. The approximation $\nu^\prime$ always satisfies $$\min(n_1-1, n_2-1) \quad \le \quad \nu^\prime \quad \le \quad n_1+n_2-2,$$ where $n_1, n_2$ are the respective sample sizes. It takes smaller values if the sample variances differ greatly and larger values if they are nearly the same.
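As a rough illustration of these bounds, here is a minimal sketch of the usual Welch-Satterthwaite approximation for $\nu^\prime$. The function name `welch_df` and the example values are hypothetical and not part of the original answer; a balanced design with $n_1 = n_2 = 20$ is assumed so that the upper bound is easy to see.

```r
# Hypothetical helper: Welch-Satterthwaite approximate degrees of freedom,
# computed from sample standard deviations s1, s2 and sample sizes n1, n2.
welch_df <- function(s1, s2, n1, n2) {
  v1 <- s1^2 / n1   # estimated variance of the first sample mean
  v2 <- s2^2 / n2   # estimated variance of the second sample mean
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

n1 <- 20; n2 <- 20
df_equal   <- welch_df(5, 5, n1, n2)    # equal SDs: reaches n1 + n2 - 2 = 38
df_unequal <- welch_df(20, 5, n1, n2)   # very different SDs: about 21.4
c(lower = min(n1 - 1, n2 - 1), df_unequal = df_unequal,
  df_equal = df_equal, upper = n1 + n2 - 2)
```

With unequal sample sizes the value for equal sample variances need not reach the upper bound, but it always stays within the two limits.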

For moderate $n_1$ and $n_2,$ a Welch test intended to be at the 5% level has an actual significance level very nearly 5%. Also, the power of the Welch test is not noticeably smaller than that of the corresponding pooled two-sample t test. With little or no penalty attached to the Welch test, it should almost always be used instead of the pooled test.

'Hybrid test' deprecated. In particular, the 'hybrid' test has been deprecated. In this procedure one first uses an F test to assess whether the population variances are equal, then uses the pooled t test if equality is not rejected and the Welch test otherwise. The deprecation is partly because of the difficulty of knowing the true P-value of the 'hybrid' test, but more importantly because the hybrid test has poor properties. (The F test has relatively low power, so it often causes branching in the wrong direction at stage two.)

Excuses for restricted use of pooled test. In careful statistical practice, the pooled t test has almost fallen into disuse. The principal exceptions in which the pooled test is still used seem to be:

(a) When prior knowledge or experience with similar data seems to provide assurance that the two population variances are truly equal.

(b) When both sample sizes are very small, and it is thought inadvisable to lower the power even a little by using $\nu^\prime$ instead of $\nu = n_1+n_2 - 2.$

(c) In elementary statistics classes, where it is thought that the extra burden of computing $\nu^\prime$ is an intolerable distraction.

Notes: (1) In R statistical software, the Welch test is the default in the procedure t.test, so one must use the parameter var.equal=TRUE (which may be abbreviated var.eq=T) in order to do the pooled test.

(2) In a balanced design with $n_1=n_2,$ it is easy to show that the T statistics for the Welch and pooled test are numerically identical, so that the only difference between the two tests is the degrees of freedom.
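A quick numerical check of note (2), as a sketch only: the seed, sample size, and distribution parameters below are arbitrary choices for illustration, not values from the answer.

```r
set.seed(2024)                        # arbitrary seed, for reproducibility only
n <- 15                               # balanced design: n1 = n2 = n
x <- rnorm(n, mean = 100, sd = 20)
y <- rnorm(n, mean = 100, sd = 5)

welch  <- t.test(x, y)                    # default: Welch test
pooled <- t.test(x, y, var.equal = TRUE)  # pooled test

# The two t statistics agree up to floating-point error ...
same_t <- isTRUE(all.equal(unname(welch$statistic), unname(pooled$statistic)))
same_t
# ... while the degrees of freedom differ (the pooled test uses 2n - 2 = 28).
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
```

The statistics coincide because, when $n_1 = n_2 = n,$ both standard errors reduce to $\sqrt{(s_1^2 + s_2^2)/n}.$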

(3) If $n_1, n_2 > 30$ and one is testing at the 5% level of significance, then $\nu^\prime \ge 30,$ so the critical values of both tests are close to 2 and there is little practical difference between the Welch and pooled tests.

(4) Especially problematic is the use of the pooled test when $\sigma_1^2 > \sigma_2^2$ and $n_1 < n_2.$ When inequalities are large, results from the pooled test can be quite misleading. Consider the following simulation where $\sigma_1 = 20, \sigma_2 = 5,$ $n_1 = 10, n_2 =40,$ and $\mu_1 = \mu_2,$ so that $H_0$ is true.

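set.seed(1119)
# H0 is true (equal means); the larger SD goes with the smaller sample
sg1 = 20;  sg2 = 5;  n1 = 10;  n2 = 40;  mu1 = mu2 = 100
# P-values of 10^5 pooled tests; var.equal = TRUE (abbreviated var.eq=T) requests pooling
pv = replicate(10^5, t.test(rnorm(n1, mu1, sg1),
               rnorm(n2, mu2, sg2), var.equal = TRUE)$p.value)
mean(pv < .05)   # proportion of rejections at the nominal 5% level
[1] 0.29556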

So the actual significance level of the pooled test is almost 30% across the 100,000 simulated datasets, where the nominal level is 5%.

By contrast, if the simulation is re-done for Welch t tests, then the rejection rate (the actual significance level) is very nearly 5%:

...
mean(pv < .05)
[1] 0.04966
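For readers who want to run both versions side by side, here is a self-contained sketch. It assumes the only difference between the two simulations is whether pooling is requested; the helper name `reject_rate`, the seed, and the reduced number of replications (2000 instead of $10^5,$ to keep the run short) are choices made for this illustration, so the rates will only roughly match the figures quoted above.

```r
# Hypothetical helper (not from the original answer): estimated rejection rate
# of a two-sample t test at level alpha when H0 (equal means) is true.
reject_rate <- function(pooled, reps = 2000, n1 = 10, n2 = 40,
                        sg1 = 20, sg2 = 5, mu = 100, alpha = 0.05) {
  pv <- replicate(reps,
                  t.test(rnorm(n1, mu, sg1), rnorm(n2, mu, sg2),
                         var.equal = pooled)$p.value)
  mean(pv < alpha)
}

set.seed(101)                               # arbitrary seed
rate_pooled <- reject_rate(pooled = TRUE)   # pooled test: far above 5%
rate_welch  <- reject_rate(pooled = FALSE)  # Welch test: close to 5%
c(pooled = rate_pooled, welch = rate_welch)
```

Swapping the roles of the standard deviations (so that the larger variance goes with the larger sample) makes the pooled test err in the other direction, rejecting less often than 5%.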