Estimating the mean of two populations


Suppose that you are given two samples $F = \{ x_1, x_2,\ldots,x_n\}$ and $M=\{y_1, y_2,\ldots, y_m\}$, with $n=100$ and $m=100.$

$x_i$ (or $y_i$) represents the height of a female (or male) individual.

Sample mean of $F$: $m_F = 169.1$

Sample mean of $M$: $m_M = 170.5$

Sample standard deviation of $F$: $S_F = 5$

Sample standard deviation of $M$: $S_M = 6$

Q) Provide a 95% confidence interval for the mean of the total (male + female) population.

Do I have to calculate the covariance for this problem? How can I estimate the interval in this case? Is the equation below the one to use?

$$T=\frac{\bar{X}-\mu}{{S/\sqrt{n}}}$$


There is 1 best solution below


Write $m_F = \overline X$ and $m_M = \overline Y$ for the two sample means, and assume the two samples are independent draws from normal populations with a common variance $\sigma^2.$ Then you have $m_F\sim \operatorname N\left(\mu_F, \dfrac{\sigma^2} n \right)$ and $m_M \sim\operatorname N\left( \mu_M, \dfrac{\sigma^2} m \right),$ and therefore $$ m_F-m_M \sim\operatorname N\left( \mu_F-\mu_M, \sigma^2\left( \frac 1 n + \frac 1 m \right) \right), $$ and therefore $$ \frac{(m_F-m_M) - (\mu_F-\mu_M)}{\sigma\sqrt{\frac 1 n + \frac 1 m}} \sim \operatorname N(0,1). $$ Now let $S^2_F = \dfrac 1 {n-1} \sum_{i=1}^n (X_i-\overline X)^2$ and $S^2_M = \dfrac 1 {m-1} \sum_{i=1}^m (Y_i - \overline Y)^2.$

Then $(n-1)S_F^2/\sigma^2 \sim \chi^2_{n-1}$ and $(m-1)S_M^2/\sigma^2 \sim \chi^2_{m-1}$ and these two are independent, so their sum is distributed as $\chi^2_{n+m-2}.$

Moreover, their sum is independent of $m_F$ and $m_M.$ To see that, find $\operatorname{cov}(X_i - \overline X, \overline X)$ and use basic facts about the normal distribution and independence.
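As a sketch of that covariance computation, assuming the $X_i$ are independent with common variance $\sigma^2$ (so that $\operatorname{cov}(X_i, X_j) = 0$ for $i \ne j$): $$ \operatorname{cov}(X_i - \overline X, \overline X) = \operatorname{cov}(X_i, \overline X) - \operatorname{var}(\overline X) = \frac 1 n \sum_{j=1}^n \operatorname{cov}(X_i, X_j) - \frac{\sigma^2} n = \frac{\sigma^2} n - \frac{\sigma^2} n = 0. $$ Since $(X_1 - \overline X, \ldots, X_n - \overline X, \overline X)$ is jointly normal, zero covariance implies independence, so $S_F^2,$ which depends only on the deviations $X_i - \overline X,$ is independent of $\overline X.$ The same argument applies to the $Y_i,$ and the two samples are independent of each other.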

Therefore $$ \frac{\left( \frac{(m_F-m_M) - (\mu_F-\mu_M)}{\sigma\sqrt{\frac 1 n + \frac 1 m}} \right)}{\sqrt{\left.\frac{(n-1)S_F^2 + (m-1)S_M^2}{\sigma^2}\right/(n+m-2)}} \sim t_{n+m-2}. $$ And then the $\sigma$ cancels and we get $$ \frac{\left( \frac{(m_F-m_M) - (\mu_F-\mu_M)}{\sqrt{\frac 1 n + \frac 1 m}} \right)}{\sqrt{\left.\left((n-1)S_F^2 + (m-1)S_M^2\right) \right/(n+m-2)}} \sim t_{n+m-2}. $$ Since this quantity has a known probability of falling within a specified interval, the inequality that says it lies in that interval can be solved for $\mu_F-\mu_M$ to get a confidence interval for $\mu_F-\mu_M.$

And the corresponding hypothesis test is conventionally called a two-sample t-test.
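To make the interval concrete, here is a minimal Python sketch that plugs the numbers from the question into the pooled statistic above and solves for $\mu_F-\mu_M.$ It assumes equal population variances, as the derivation does. The function names (`t_pdf`, `t_cdf`, `t_quantile`, `pooled_ci`) are illustrative, not from any library, and the $t$ quantile is found numerically with the standard library only; in practice a statistics package would normally supply it.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: Simpson's rule on [0, x] plus 0.5 by symmetry."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pooled_ci(mean_f, mean_m, s_f, s_m, n, m, level=0.95):
    """Confidence interval for mu_F - mu_M, assuming equal variances."""
    df = n + m - 2
    pooled_var = ((n - 1) * s_f ** 2 + (m - 1) * s_m ** 2) / df
    se = math.sqrt(pooled_var * (1 / n + 1 / m))
    t_crit = t_quantile(1 - (1 - level) / 2, df)
    diff = mean_f - mean_m
    return diff - t_crit * se, diff + t_crit * se


# Values from the question: m_F = 169.1, m_M = 170.5, S_F = 5, S_M = 6.
low, high = pooled_ci(169.1, 170.5, 5, 6, 100, 100)
print(f"95% CI for mu_F - mu_M: ({low:.2f}, {high:.2f})")  # roughly (-2.94, 0.14)
```

With these numbers the pooled variance is $30.5$ on $198$ degrees of freedom, and the resulting interval contains $0.$ Note that this is an interval for the difference $\mu_F-\mu_M,$ which is what the derivation above produces.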