I am doing an exercise in inference theory which involves finding a confidence interval for a difference in expectations. I have two groups $A$ and $B$ of, let's say, patients, and we measure each group's blood sugar levels. We assume these groups each correspond to a random variable, $X_A$ and $X_B$, distributed as $N(\mu_A, \sigma_A^2)$ and $N(\mu_B, \sigma_B^2)$ respectively.
What we want is to find a confidence interval $I_{\mu}$ with confidence level $1-\alpha$ for $\mu = \mu_A - \mu_B$, such that $$1-\alpha = P(h(\mu, \hat{\mu}) \in I) = P\left(h^{-1}_{\hat{\mu}}(h(\mu, \hat{\mu})) \in h^{-1}_{\hat{\mu}}(I)\right) = P(\mu \in I_{\mu}).$$
I start by estimating $\mu$ by $\hat{\mu} = \hat{\mu}_A - \hat{\mu}_B$ and $\sigma^2$ by the pooled variance, $$\hat{\sigma}^2 = \frac{\hat{\sigma}^2_A(n_A-1)+\hat{\sigma}^2_B(n_B-1)}{(n_A-1)+(n_B-1)}=\frac{\sum_{i=1}^{n_A}(x_i-\bar{x})^2 \ +\sum_{j=1}^{n_B}(y_j-\bar{y})^2}{n_A + n_B - 2}$$ where $x_1, \dots, x_{n_A}$ and $y_1, \dots, y_{n_B}$ are the $n_A$ and $n_B$ measurements from groups $A$ and $B$ respectively.
Regarding the pivot random variable, I reasoned that $\hat{\mu}$ must have a normal distribution because a linear combination of normally distributed random variables is also normally distributed and the expectation, $E(\cdot)$, as an operator is linear. How do I formalize this? I read somewhere that the pivot is $t$-distributed, but I don't (yet) know what a $t$-distribution is. When I search for the $t$-distribution, it looks like a normal distribution but with less variance. What is the connection?
All the help is much appreciated, thank you in advance,
Isak
Note that, assuming independent samples and a common variance $\sigma_A^2 = \sigma_B^2 = \sigma^2$ (which is what the pooled estimator presupposes), $$ \bar{X}_A - \bar{X}_B \sim N(\mu, \sigma^2/n_A + \sigma^2/n_B). $$ Hence, if you know the value of $\sigma$, you have (for given sample sizes $(n_A, n_B)$) $$ 0.95 = \mathbb{P}\left(Z_{0.025} < \frac{ \bar{X}_A - \bar{X}_B - \mu}{\sqrt{\sigma^2/n_A + \sigma^2/n_B} }<Z_{0.975}\right). $$ However, as you estimate $\sigma^2$, you are dividing $\bar{X}_A - \bar{X}_B - \mu$ by a random variable and not a constant. I'll leave it to you as an exercise to figure out why $$ \frac{\sum_{i=1}^{n_A}(X_{A,i}-\bar{X}_A)^2 + \sum_{j=1}^{n_B}(X_{B,j}-\bar{X}_B)^2}{\sigma^2} \sim \chi^2(n_A + n_B - 2). $$
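If you want to sanity-check the chi-squared claim numerically before proving it, here is a minimal simulation sketch in Python (standard library only). The sample sizes, means, and common $\sigma$ are arbitrary illustrative values, not taken from your exercise, and the helper name `pooled_sum_of_squares` is my own. A $\chi^2(k)$ variable has mean $k$ and variance $2k$, so the simulated moments should land near $n_A + n_B - 2$ and $2(n_A + n_B - 2)$.

```python
import random
import statistics


def pooled_sum_of_squares(xs, ys):
    """Sum of squared deviations of each sample from its own sample mean."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    return sum((x - mean_x) ** 2 for x in xs) + sum((y - mean_y) ** 2 for y in ys)


rng = random.Random(0)              # fixed seed so the run is reproducible
n_a, n_b = 6, 9                     # illustrative sample sizes (assumption)
mu_a, mu_b, sigma = 5.0, 4.2, 1.5   # illustrative parameters, common sigma (assumption)

draws = []
for _ in range(20000):
    xs = [rng.gauss(mu_a, sigma) for _ in range(n_a)]
    ys = [rng.gauss(mu_b, sigma) for _ in range(n_b)]
    draws.append(pooled_sum_of_squares(xs, ys) / sigma ** 2)

df = n_a + n_b - 2
print("degrees of freedom:", df)
print("simulated mean (should be close to df):", statistics.fmean(draws))
print("simulated variance (should be close to 2*df):", statistics.variance(draws))
```

This only checks the first two moments, so it is evidence rather than a proof; the exercise itself is still worth doing.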
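Now, one of the definitions of the $t$ distribution is $$ t_{(n)} \equiv \frac{Z}{\sqrt{\chi^2_{(n)}/n}}, $$ namely a standard normal (Gaussian) random variable divided by the square root of an independent chi-squared r.v. divided by its degrees of freedom. (For normal samples the sample means are independent of the sample variances, so the definition applies here.) Hence, just fill in the gaps, with $\hat{\sigma}$ the square root of your pooled variance, to get $$ 0.95 = \mathbb{P}\left(t_{0.025}^{(n_A + n_B - 2)} < \frac{ \bar{X}_A - \bar{X}_B - \mu}{\hat{\sigma}\sqrt{1/n_A + 1/n_B} } < t_{0.975}^{(n_A + n_B - 2)}\right). $$ Solving the two inequalities for $\mu$, and using the symmetry $t_{0.025} = -t_{0.975}$, gives the interval $$ I_{\mu} = \left(\bar{X}_A - \bar{X}_B \pm t_{0.975}^{(n_A + n_B - 2)}\, \hat{\sigma}\sqrt{1/n_A + 1/n_B}\right). $$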
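To see the connection with the normal distribution, you can simulate the pivot. Below is a rough Python sketch (standard library only) with small illustrative sample sizes $n_A = n_B = 5$. The parameter values and the helper name `pooled_pivot` are assumptions of mine. The sketch estimates the 97.5% quantile of the pivot empirically and compares it with the standard normal quantile. It also reports how often the pivot falls inside $\pm Z_{0.975}$, which comes out below 95% for small samples because estimating $\sigma$ makes the tails heavier.

```python
import math
import random
import statistics


def pooled_pivot(xs, ys, mu):
    """(mean difference - mu) divided by its estimated standard error, pooled variance."""
    n_a, n_b = len(xs), len(ys)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    ss = sum((x - mean_x) ** 2 for x in xs) + sum((y - mean_y) ** 2 for y in ys)
    sigma_hat = math.sqrt(ss / (n_a + n_b - 2))   # square root of the pooled variance
    return (mean_x - mean_y - mu) / (sigma_hat * math.sqrt(1 / n_a + 1 / n_b))


rng = random.Random(1)              # fixed seed so the run is reproducible
n_a, n_b = 5, 5                     # small samples make the effect visible (assumption)
mu_a, mu_b, sigma = 5.0, 4.2, 1.5   # illustrative parameters, common sigma (assumption)
mu = mu_a - mu_b

pivots = sorted(
    pooled_pivot(
        [rng.gauss(mu_a, sigma) for _ in range(n_a)],
        [rng.gauss(mu_b, sigma) for _ in range(n_b)],
        mu,
    )
    for _ in range(20000)
)

z = statistics.NormalDist().inv_cdf(0.975)            # standard normal 97.5% quantile
q_emp = pivots[int(0.975 * len(pivots))]              # empirical 97.5% quantile of the pivot
coverage_with_z = sum(abs(t) < z for t in pivots) / len(pivots)

print("normal quantile:", z)
print("empirical pivot quantile (larger than the normal one):", q_emp)
print("share of pivots inside +/- z (below 0.95):", coverage_with_z)
```

As $n_A + n_B$ grows, the empirical quantile moves toward the normal one, which matches the fact that $\chi^2_{(n)}/n$ concentrates around 1 for large $n$.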