Problem statement: Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be two independent random samples from a Bernoulli distribution with success probability $p$. Denote the sample means $$\overline{X}_n = \frac{1}{n} \sum_{i=1}^n X_i, \qquad \overline{Y}_m = \frac{1}{m} \sum_{j=1}^m Y_j. $$ Define the estimators $$ T_{n,m,1} = \frac{\overline{X}_n + \overline{Y}_m}{2}, \qquad T_{n,m,2} = \frac{ \sum_{i=1}^n X_i + \sum_{j=1}^m Y_j}{n+m} $$ for the unknown parameter $p$.
a) Investigate which of the two estimators is preferable.
b) Derive an approximate $100 \times (1-\alpha) \%$ confidence interval for $p$, based on the estimator $T_{n,m,2}$.
My attempt: a) We first check the bias of these estimators. We have \begin{align*} b_p (T_{n,m,1}) := E_p(T_{n,m,1}) - p &= \frac{1}{2} \bigg( E_p(\overline{X}_n ) + E_p(\overline{Y}_m) \bigg) - p \\ &= \frac{1}{2} \bigg( \frac{1}{n} \sum_{i=1}^n E_p(X_i) + \frac{1}{m} \sum_{j=1}^m E_p(Y_j) \bigg) - p \\ &= \frac{1}{2} \left( p + p \right) - p = 0. \end{align*} Similarly, we have \begin{align*} b_p(T_{n,m,2}) &= \frac{1}{n+m} \bigg( E_p \left( \sum_{i=1}^n X_i \right) + E_p \left( \sum_{j=1}^m Y_j \right) \bigg) - p \\ &= \frac{1}{n+m} \left( np + mp \right) - p = 0. \end{align*} So both estimators are unbiased. We now compare the mean squared errors. Since there is no bias, \begin{align*} MSE_p (T_{n,m,1} ) = \text{Var}_p(T_{n,m,1}) = \frac{1}{4} \left( \text{Var}_p (\overline{X}_n )+ \text{Var}_p(\overline{Y}_m) \right). \end{align*} Here we used the fact that $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are independent samples, so the random variables $\overline{X}_n$ and $\overline{Y}_m$ are also independent. Hence \begin{align*} MSE_p (T_{n,m,1} ) = \frac{1}{4} \left( \frac{1}{n} \text{Var}_p (X_1) + \frac{1}{m} \text{Var}_p (Y_1) \right) &= \frac{1}{4} \left( \frac{p(1-p)}{n} + \frac{p(1-p)}{m} \right) \\ &= \frac{p(1-p)}{4} \left( \frac{1}{n} + \frac{1}{m} \right) . \end{align*} For the other estimator, a similar calculation yields \begin{align*} MSE_p(T_{n,m,2}) &= \frac{1}{(n+m)^2} \left( \sum_{i=1}^n \text{Var}_p (X_i) + \sum_{j=1}^m \text{Var}_p(Y_j) \right) \\ &= \frac{1}{(n+m)^2} \left( np(1-p) + mp(1-p) \right) = \frac{p(1-p)}{n+m}. \end{align*} We now compare these two MSEs. In the case $m=n$, it is clear that $MSE_p(T_{n,m,1}) = MSE_p(T_{n,m,2})$.
Suppose now $m \neq n$, say $n > m$. How do I find out which estimator is preferable then? I suspect $MSE_p(T_{n,m,1}) > MSE_p(T_{n,m,2})$, because $$ \frac{1}{4} \left( \frac{1}{n} + \frac{1}{m} \right) > \frac{1}{n+m}, $$ but I'm not sure how to prove this inequality.
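As a sanity check, a quick Monte Carlo simulation (a Python sketch; the choices of $n$, $m$, $p$ and the number of replications are arbitrary) agrees with the closed-form MSEs above and suggests the inequality does hold when $n \neq m$:

```python
# Monte Carlo check of the closed-form MSE formulas derived above.
# The values of n, m, p and reps are illustrative, not from the problem.
import numpy as np

rng = np.random.default_rng(0)
n, m, p, reps = 10, 40, 0.3, 200_000

X = rng.binomial(1, p, size=(reps, n))   # reps independent X-samples
Y = rng.binomial(1, p, size=(reps, m))   # reps independent Y-samples

T1 = (X.mean(axis=1) + Y.mean(axis=1)) / 2      # T_{n,m,1}
T2 = (X.sum(axis=1) + Y.sum(axis=1)) / (n + m)  # T_{n,m,2}

mse1_mc = np.mean((T1 - p) ** 2)
mse2_mc = np.mean((T2 - p) ** 2)
mse1_th = p * (1 - p) / 4 * (1 / n + 1 / m)     # p(1-p)/4 * (1/n + 1/m)
mse2_th = p * (1 - p) / (n + m)                 # p(1-p)/(n+m)
```

With these sample sizes the simulated `mse1_mc` exceeds `mse2_mc`, matching the conjectured inequality.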
b) For the confidence interval, I'm not sure. I know that $$ \sum_{i=1}^n X_i + \sum_{j=1}^m Y_j \sim Binomial(n+m,p). $$ But how do I use this information to construct a confidence interval?
Thank you in advance for any help.
Consider the expression $$\begin{align*} \frac{1}{4}\left(\frac{1}{m}+\frac{1}{n}\right) - \frac{1}{m+n} &= \frac{m+n}{4mn} - \frac{1}{m+n} \\ &= \frac{(m+n)^2 - 4mn}{4mn(m+n)} \\ &= \frac{(m-n)^2}{4mn(m+n)}. \end{align*}$$ Since $m, n > 0$, the denominator is always positive, and the numerator, being the square of a real number, is never negative; it is zero if and only if $m = n$. Therefore, the mean squared error of the first estimator is always at least as large as the MSE of the second estimator, with equality occurring exactly when the sample sizes are the same. This makes sense because the estimators are equivalent when $m = n$.
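If it helps, the identity $\frac{1}{4}\left(\frac{1}{m}+\frac{1}{n}\right) - \frac{1}{m+n} = \frac{(m-n)^2}{4mn(m+n)}$ can be spot-checked numerically (a throwaway Python sketch; the grid of sample sizes is an arbitrary choice):

```python
# Numerical spot-check of the identity
#   1/4 (1/m + 1/n) - 1/(m+n) = (m-n)^2 / (4 m n (m+n))
# over an arbitrary grid of sample sizes.
def lhs(m, n):
    return 0.25 * (1 / m + 1 / n) - 1 / (m + n)

def rhs(m, n):
    return (m - n) ** 2 / (4 * m * n * (m + n))

# Largest discrepancy over the grid; should be at floating-point level.
max_err = max(abs(lhs(m, n) - rhs(m, n))
              for m in range(1, 51) for n in range(1, 51))
```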
Regarding the derivation of an approximate $100(1-\alpha)\%$ confidence interval, what you can do is construct a Wald-type interval based on the normal approximation to the binomial. You already determined that $(m+n)\hat p \sim \operatorname{Binomial}(m+n, p)$, where $\hat p = T_{n,m,2}$, so $E_p(\hat p) = p$ and $\operatorname{Var}_p(\hat p) = \tfrac{p(1-p)}{m+n}$. The central limit theorem then justifies the approximation $$\hat p \overset{\boldsymbol .}{\sim} \operatorname{Normal}\left(\mu = p, \; \sigma^2 = \tfrac{p(1-p)}{m+n}\right),$$ and an approximate $100(1-\alpha)\%$ CI for $p$ is found by computing the $100 \alpha/2$ and $100(1 - \alpha/2)$ percentiles of this normal distribution, replacing the unknown parameter $p$ in the variance with the estimate $\hat p$. The first value is the lower bound $L$, the second is the upper bound $U$. Explicitly, with $z_{1-\alpha/2}$ the corresponding standard normal quantile, $$L, U = \hat p \mp z_{1-\alpha/2} \sqrt{\frac{\hat p (1 - \hat p)}{m+n}}.$$
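A minimal sketch of this interval in Python, using only the standard library (`statistics.NormalDist` for the normal quantile); the counts in the usage example are made-up numbers, not from the problem:

```python
# Wald-type approximate CI for p based on the pooled estimator T_{n,m,2}.
from math import sqrt
from statistics import NormalDist

def wald_ci(successes, n_total, alpha=0.05):
    """Approximate 100(1 - alpha)% CI for p, where `successes` is
    sum(X_i) + sum(Y_j) and `n_total` is n + m."""
    p_hat = successes / n_total
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{1 - alpha/2}
    half = z * sqrt(p_hat * (1 - p_hat) / n_total)   # half-width
    return p_hat - half, p_hat + half

# Example: pooled sample of n + m = 50 trials with 18 successes.
L, U = wald_ci(18, 50)
```

Keep in mind the Wald interval can behave poorly for small pooled samples or $\hat p$ near $0$ or $1$; alternatives such as the Wilson score interval are often preferred, but the above matches the construction described here.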