Why aren't these two versions of a two-sample t-test the same?


I'm looking at two versions of a two-sample t-test that appear equivalent to me, but when I crunch the numbers they don't actually seem to be equivalent. Consider the model $$\mathbf y = \beta_0 + \beta_1 \mathbf x + \boldsymbol\varepsilon$$

where $\mathbf x$ is a binary vector. For example, if $0$ indicates "female" and $1$ indicates "male", then $\beta_0$ is the mean response for females and $\beta_1$ is the mean response for males minus that for females. We assume independent Gaussian errors.
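The dummy-coding interpretation of $\beta_0$ and $\beta_1$ can be checked numerically. The sketch below is in Python with numpy rather than R, the data are made up for illustration, and `fit_dummy_regression` is a hypothetical helper name. It fits the model by least squares and compares the coefficients with the group means.

```python
import numpy as np

def fit_dummy_regression(y, x):
    """Least-squares fit of y = b0 + b1*x + error, where x is a 0/1 indicator."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # columns: intercept, indicator
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [b0_hat, b1_hat]

# Hypothetical data: x = 0 for "female", x = 1 for "male".
y = [4.1, 5.0, 3.8, 6.2, 7.1, 6.6, 5.9]
x = [0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_dummy_regression(y, x)
mean_f = np.mean([yi for yi, xi in zip(y, x) if xi == 0])
mean_m = np.mean([yi for yi, xi in zip(y, x) if xi == 1])

print("b0_hat:", b0, " female mean:", mean_f)
print("b1_hat:", b1, " male mean - female mean:", mean_m - mean_f)
```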

So, we can test whether males and females have different means as follows: $$ \frac{\widehat\beta_1} {\mathrm{se}(\widehat\beta_1)} \sim t_{n-2} $$
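For reference, the regression denominator is the usual OLS standard error. Writing $X = [\mathbf 1 \;\; \mathbf x]$ for the $n \times 2$ design matrix, $\mathrm{RSS}$ for the residual sum of squares, and $n_0$, $n_1$ for the numbers of zeros and ones in $\mathbf x$ (notation introduced here, with $n = n_0 + n_1$), the standard formula and its simplification for a binary $\mathbf x$ are:

$$ \mathrm{se}(\widehat\beta_1) = \sqrt{\widehat\sigma^2\,\big[(X^\top X)^{-1}\big]_{22}}, \qquad \widehat\sigma^2 = \frac{\mathrm{RSS}}{n-2} $$

$$ X^\top X = \begin{pmatrix} n & n_1 \\ n_1 & n_1 \end{pmatrix} \quad\Longrightarrow\quad \big[(X^\top X)^{-1}\big]_{22} = \frac{n}{n\,n_1 - n_1^2} = \frac{n}{n_0\,n_1} = \frac{1}{n_0} + \frac{1}{n_1} $$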

Similarly, we can consider the sample means of the male and female groups, $\bar y_B$ and $\bar y_G$ respectively. These are normal random variables with variances estimated by $\frac{s^2_B}{n_B}$ and $\frac{s^2_G}{n_G}$. Therefore $$ \bar y_B-\bar y_G\sim N\left(\mu_B-\mu_G,\frac{s^2_B}{n_B}+\frac{s^2_G}{n_G}\right)$$ and so, under the hypothesis that $\mu_B=\mu_G$, we can test whether males and females have the same mean using $$ \frac{\bar y_B-\bar y_G}{\sqrt{\frac{s^2_B}{n_B}+\frac{s^2_G}{n_G} } } \sim t_{n-2} $$

This tests the same hypothesis as the earlier test, so the two should be equivalent. I have already proven that the numerators are equal, that is, $\widehat\beta_1 = \bar y_B-\bar y_G$. However, I cannot prove that the denominators are equal, and in fact when I calculate the denominators for some test cases in R, I do not get the same values. Is something in my reasoning above incorrect? If not, any pointers on how to prove that the denominators of these two t-statistics are equal?
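A minimal sketch for reproducing the comparison of the two denominators, again in Python with numpy on hypothetical data (the helper names are made up for illustration). `regression_se_slope` uses the usual OLS estimate $\widehat\sigma^2 = \mathrm{RSS}/(n-2)$ and the $(2,2)$ entry of $\widehat\sigma^2 (X^\top X)^{-1}$; `separate_variance_se` computes $\sqrt{s^2_B/n_B + s^2_G/n_G}$ using the $n-1$ divisor for each group's sample variance, with $x = 1$ marking group B (males) and $x = 0$ group G (females).

```python
import numpy as np

def regression_se_slope(y, x):
    """se(beta1_hat) from OLS of y on an intercept and a 0/1 indicator x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])       # columns: intercept, indicator
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2_hat = resid @ resid / (len(y) - 2)       # RSS / (n - 2)
    cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated covariance of beta_hat
    return np.sqrt(cov_beta[1, 1])

def separate_variance_se(y, x):
    """sqrt(s_B^2 / n_B + s_G^2 / n_G), with x == 1 for group B and x == 0 for group G."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    y_b, y_g = y[x == 1], y[x == 0]
    return np.sqrt(y_b.var(ddof=1) / y_b.size + y_g.var(ddof=1) / y_g.size)

# Hypothetical data with unequal group sizes (3 with x = 0, 4 with x = 1).
y = [4.1, 5.0, 3.8, 6.2, 7.1, 6.6, 5.9]
x = [0, 0, 0, 1, 1, 1, 1]

print("regression se(beta1_hat):", regression_se_slope(y, x))
print("separate-variance se:    ", separate_variance_se(y, x))
```

Varying the group sizes and the spread within each group in `y` and `x` shows how each of the two values depends on them.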