Two sample test - distribution of pooled variance estimator


I am attending a statistics course this semester, and although it is offered by the math department, the precise assumptions underlying the main theorems are not given, let alone the proofs. That said, I would like a reference for, or at least to discuss, the proof of the following fact:

Suppose that we are given two iid normal samples $(X_{11},\ldots,X_{1n_1})$ and $(X_{21},\ldots,X_{2n_2})$ with means $\mu_i$ and common standard deviation $\sigma_1=\sigma_2=\sigma$ (the pooled-variance assumption, which is what makes the $\sigma^2$ below well defined), and set $$S^2_i:=S^2_{X_i}:=S_{X_iX_i}:=\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)^2$$ as well as $$S^2:=S_1^2+S_2^2$$ and finally $\nu:=n_1+n_2-2$. Then it is claimed that $$T:=\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{S^2/\nu}{n_1}+\frac{S^2/\nu}{n_2}}}\sim t_\nu.$$ Note that if $\bar{X}_1\perp\bar{X}_2$ then$^1$ $$Z:=\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma^2}{n_1}+\frac{\sigma^2}{n_2}}}\sim N(0,1),$$ and it is easily shown that $$T=\frac{Z}{\sqrt{W/\nu}}$$ with $W=S^2/\sigma^2$, so perhaps we can show that

  1. $W_1:=S_1^2/\sigma^2$ and $W_2:=S_2^2/\sigma^2$ are independent, i.e. $W_1\perp W_2$ and

  2. $W\perp Z$

as this would yield the desired result.$^{2}$ Is this indeed the usual strategy? Where can I look this up?
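Not a proof, of course, but the claimed distribution of $T$ is easy to sanity-check by Monte Carlo. Below is a sketch in Python/NumPy; the sample sizes, means, and the common $\sigma$ are arbitrary choices of mine, and the check compares the empirical first two moments of $T$ with those of $t_\nu$ (mean $0$, variance $\nu/(\nu-2)$):

```python
import numpy as np

# Simulate T many times under the pooled-variance model and compare
# its mean/variance with those of t_nu.
rng = np.random.default_rng(0)
n1, n2, mu1, mu2, sigma = 5, 7, 1.0, 3.0, 2.0  # arbitrary; sigma is common
nu = n1 + n2 - 2
reps = 200_000

x1 = rng.normal(mu1, sigma, size=(reps, n1))
x2 = rng.normal(mu2, sigma, size=(reps, n2))

# S^2 = S_1^2 + S_2^2: pooled sum of squared deviations from each sample mean
s2 = (((x1 - x1.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
      + ((x2 - x2.mean(axis=1, keepdims=True)) ** 2).sum(axis=1))

t = ((x1.mean(axis=1) - x2.mean(axis=1)) - (mu1 - mu2)) \
    / np.sqrt(s2 / nu * (1 / n1 + 1 / n2))

# t_nu has mean 0 and variance nu/(nu-2)
print(t.mean(), t.var(), nu / (nu - 2))
```

With $\nu = 10$ the empirical variance should land close to $10/8 = 1.25$.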


$^1$ If $(X_1,\ldots,X_n)$ is iid with $X_i\sim N(\mu,\sigma^2)$, then $\bar X\sim N(\mu,\sigma^2/n)$ and hence $$\bar X_1-\bar X_2\sim N(\mu_1-\mu_2,\sigma_1^2/n_1+\sigma_2^2/n_2)$$ (just look up "linear combination of independent normal variables").

$^2$ It is well known that $W_i\sim\chi^2(n_i-1)$ and hence $W_1+W_2\sim\chi^2(n_1+n_2-2)$ if $W_1\perp W_2$. Furthermore $Z/\sqrt{W/n}\sim t_n$ if $Z\sim N(0,1)$, $W\sim\chi^2(n)$ and $W\perp Z$.

Best answer:

The pooled variance assumption means that $\sigma_1=\sigma_2=\sigma$.

Since $(\overline X_1,S_1^2)$ is a function of the first sample and $(\overline X_2,S_2^2)$ is a function of the second sample, we have $$(\overline X_1,S_1^2)\perp \!\!\! \perp (\overline X_2,S_2^2).$$

Since each sample is normally distributed, it also holds (a standard fact) that $\overline X_1 \perp \!\!\! \perp S_1^2$ and $\overline X_2 \perp \!\!\! \perp S_2^2$; therefore $\overline X_1, \overline X_2, S_1^2, S_2^2$ are mutually independent.

You stated rightly that $\overline X_1-\overline X_2\sim N\big(\mu_1 - \mu_2,\sigma^2(1/n_1+1/n_2)\big)$, thus $$Z:=\frac{\overline X_1-\overline X_2- (\mu_1 - \mu_2)}{\sigma (1/n_1+1/n_2)^{1/2}}$$ is standard normal.

Since each sample is normally distributed, $\frac{S_1^2}{\sigma^2} \sim \chi^2_{n_1-1}$ and $\frac{S_2^2}{\sigma^2} \sim \chi^2_{n_2-1}$, thus by independence of $S_1^2$ and $S_2^2$ $$U:=\frac{S_1^2+S_2^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}.$$
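This step can also be checked numerically. Here is a hedged sketch (my own sample sizes and $\sigma$; the means are irrelevant since they drop out of the centered sums): $U$ should have the mean $\nu$ and variance $2\nu$ of a $\chi^2_\nu$ variable.

```python
import numpy as np

# Simulate U = (S_1^2 + S_2^2)/sigma^2 and compare its moments
# with those of chi^2 with nu = n1 + n2 - 2 degrees of freedom.
rng = np.random.default_rng(1)
n1, n2, sigma = 4, 6, 1.5  # arbitrary choices
nu = n1 + n2 - 2
reps = 200_000

x1 = rng.normal(0.0, sigma, size=(reps, n1))
x2 = rng.normal(5.0, sigma, size=(reps, n2))

u = ((((x1 - x1.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
      + ((x2 - x2.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)) / sigma**2)

# chi^2_nu has mean nu and variance 2*nu
print(u.mean(), u.var(), nu, 2 * nu)
```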

Since $(\overline X_1, \overline X_2)\perp \!\!\! \perp (S_1^2,S_2^2)$, $Z$ is independent of $U$, therefore $$\frac{Z}{\sqrt{U/(n_1+n_2-2)}} \sim t_{n_1+n_2-2}.$$

The unknown $\sigma$'s in the numerator and denominator cancel out and you're left with the coveted statistic.
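As a final cross-check, the statistic assembled by hand agrees with SciPy's equal-variance two-sample t-test (assuming `scipy` is available; the data below are arbitrary draws with $\mu_1=\mu_2$, so the $(\mu_1-\mu_2)$ term vanishes and $T$ is exactly the textbook test statistic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 2.0, size=8)
x2 = rng.normal(0.0, 2.0, size=11)
n1, n2 = len(x1), len(x2)
nu = n1 + n2 - 2

# T built from the definitions in the question (mu_1 - mu_2 = 0 here)
s2 = ((x1 - x1.mean()) ** 2).sum() + ((x2 - x2.mean()) ** 2).sum()
t_hand = (x1.mean() - x2.mean()) / np.sqrt(s2 / nu * (1 / n1 + 1 / n2))

# SciPy's pooled-variance (equal_var=True) two-sample t-test
t_scipy, p = stats.ttest_ind(x1, x2, equal_var=True)
print(t_hand, t_scipy)
```

The two values coincide up to floating-point error, since `ttest_ind` with `equal_var=True` implements exactly the pooled statistic derived above.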