I mostly see this formula when searching for an estimate of the standard error of the difference between two means, and it is also used in this video. $$\Delta=\sqrt{s_1^2/N_1+s_2^2/N_2}$$ But I've also seen this one (and this is the one my book uses): $$\Delta'=\sqrt{\dfrac{\left(N_1-1\right)s_1^2+\left(N_2-1\right)s_2^2}{N_1+N_2-2}\left(\dfrac{1}{N_1}+\dfrac{1}{N_2}\right)}$$ As these are two very different formulas, why are they used seemingly interchangeably?
Two different formulas for standard error of difference between two means
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
There are two different versions of the two-sample t test in common usage.
Pooled. The assumption, often unwarranted in practice, is made that the two populations have the same variance $\sigma_1^2 = \sigma_2^2.$ In that case one seeks to estimate the common population variance, using both of the sample variances, to obtain what is called a pooled estimate $s_p^2$.
If the two sample sizes are equal, $n_1 = n_2,$ then this is simply $(s_1^2 + s_2^2)/2.$ But if the sample sizes differ, then greater weight is put on the sample variance from the larger sample. The weights use the degrees of freedom $\nu_i = n_i - 1$ instead of the $n_i.$ The first factor under the radical in your $\Delta^\prime$ is $s_p^2.$ Under the assumption of equal population variances, the estimated standard error of $\bar X_1 - \bar X_2$ is your $\Delta^\prime$.
Consequently, the $T$-statistic is $T = (\bar X_1 - \bar X_2)/\Delta^\prime$. Under the null hypothesis that population means $\mu_1$ and $\mu_2$ are equal, this $T$-statistic has Student's T distribution with $n_1 + n_2 - 2$ degrees of freedom.
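As a concrete sketch, the pooled estimate and $\Delta^\prime$ can be computed in Python (the helper name `pooled_se` and the sample data are made up for illustration; `statistics.variance` gives the sample variance $s_i^2$):

```python
import math
from statistics import mean, variance

def pooled_se(x, y):
    """Estimated standard error of mean(x) - mean(y) under the
    equal-variance assumption (the Delta-prime formula)."""
    n1, n2 = len(x), len(y)
    s1_sq, s2_sq = variance(x), variance(y)  # sample variances
    # Pooled variance: degrees-of-freedom-weighted mean of s1^2 and s2^2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Made-up samples; the pooled T-statistic has n1 + n2 - 2 = 5 DF here
x = [4.1, 5.2, 6.3, 5.5]
y = [3.9, 4.8, 5.1]
t = (mean(x) - mean(y)) / pooled_se(x, y)
```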
Separate variances (Welch). The assumption of equal population variances is not made. Then the variance of $\bar X_1 - \bar X_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2,$ which is estimated by $s_1^2/n_1 + s_2^2/n_2.$ So the (estimated) standard error is $\Delta = \sqrt{s_1^2/n_1 + s_2^2/n_2},$ which is exactly your first formula (with the $N_i$ written as $n_i$). If $n_1 = n_2,$ then $\Delta = \Delta^\prime.$ But the two (estimated) standard errors will not generally be equal if the sample sizes differ.
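A quick numerical check that the two estimates coincide when $n_1 = n_2$ but not otherwise (a sketch with made-up data; `welch_se` and `pooled_se` are hypothetical helper names):

```python
import math
from statistics import variance

def welch_se(x, y):
    """Delta: sqrt(s1^2/n1 + s2^2/n2), no equal-variance assumption."""
    return math.sqrt(variance(x) / len(x) + variance(y) / len(y))

def pooled_se(x, y):
    """Delta-prime: pooled estimate assuming equal population variances."""
    n1, n2 = len(x), len(y)
    sp_sq = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return math.sqrt(sp_sq * (1 / n1 + 1 / n2))

x = [4.1, 5.2, 6.3, 5.5]
y_equal = [3.9, 4.8, 5.1, 6.0]   # same size as x: Delta == Delta-prime
y_unequal = [3.9, 4.8, 5.1]      # different size: they generally differ

assert math.isclose(welch_se(x, y_equal), pooled_se(x, y_equal))
assert not math.isclose(welch_se(x, y_unequal), pooled_se(x, y_unequal))
```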
A crucial difference between the pooled and Welch t tests is that the Welch test uses a rather complicated formula involving both sample sizes and sample variances for the degrees of freedom (DF). The Welch DF is always between the minimum of $n_1 - 1$ and $n_2 - 1$ on the one hand and $n_1 + n_2 - 2$ on the other. So if both sample sizes are moderately large, both $T$-statistics will be nearly normally distributed when $\mu_1 = \mu_2.$ The distribution of the Welch $T$-statistic is only approximately Student's t, but simulation studies have shown that the approximation is very accurate over a wide variety of sample sizes (equal or not) and population variances (equal or not).
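That DF formula is the Welch–Satterthwaite approximation. A minimal sketch (made-up data, hypothetical helper name `welch_df`) that also checks the bounds just stated:

```python
from statistics import variance

def welch_df(x, y):
    """Welch-Satterthwaite approximate degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # s_i^2 / n_i
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

x = [4.1, 5.2, 6.3, 5.5, 4.9]
y = [3.9, 4.8, 5.1]
df = welch_df(x, y)
# Always between min(n1 - 1, n2 - 1) and n1 + n2 - 2:
assert min(len(x) - 1, len(y) - 1) <= df <= len(x) + len(y) - 2
```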
The current consensus among applied statisticians is always to use the Welch t test and not worry about whether population variances are equal. Most statistical computer packages use the Welch procedure by default and the pooled procedure only if specifically requested.
In both scenarios $\sigma_{1}$ and $\sigma_{2}$ are unknown. The bottom formula uses the assumption that $\sigma_{1} = \sigma_{2}$ and estimates that shared variance by pooling the two sample variances into a weighted mean (weighted by degrees of freedom). Thus, the pooled factor on the left plays the role of both $s_{1}^{2}$ and $s_{2}^{2}$ in the top equation. This method is usually used when you have small sample sizes and the equal-variance assumption is plausible.
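A small numerical check of that substitution (made-up data): plugging the pooled $s_p^2$ in for both $s_1^2$ and $s_2^2$ in $\sqrt{s_1^2/N_1 + s_2^2/N_2}$ reproduces $\Delta^\prime$ exactly:

```python
import math
from statistics import variance

x = [4.1, 5.2, 6.3, 5.5]
y = [3.9, 4.8, 5.1]
n1, n2 = len(x), len(y)

# Pooled variance: weighted mean of the two sample variances
sp_sq = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)

delta_prime = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
# Same value, written in the shape of the top formula with sp_sq in both slots:
assert math.isclose(delta_prime, math.sqrt(sp_sq / n1 + sp_sq / n2))
```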