Introduction to mathematical statistics: finding the mean square error of estimators


I'm working on a problem from my introduction to mathematical statistics course. So far, I've done the following work:

Let $X_{1},...,X_{m}$ and $Y_{1},...,Y_{n}$ be independent samples from the Bernoulli distribution with unknown parameter $p \in [0,1]$.

i) Prove that $(\bar{X}+\bar{Y})/2$ and $(\sum_{i=1}^{m}X_{i}+\sum_{j=1}^{n}Y_{j})/(m+n)$ are unbiased estimators for $p$.

ii) Which of these two estimators is preferable (if $m \neq n$)?

Solution i) First we state the definition of an unbiased estimator: an estimator $T$ is called unbiased for the estimation of $g(\theta)$ if $E_{\theta}T=g(\theta)$ for all $\theta \in \Theta$. The bias is defined as $E_{\theta}T-g(\theta)$.

We now compute $E_{p}(\bar{X}+\bar{Y})/2$ as follows, \begin{equation} \begin{split} E_{p}(\bar{X}+\bar{Y})/2 &= \frac{1}{m}\sum_{i=1}^{m}E_{p}\frac{X_{i}}{2} + \frac{1}{n}\sum_{j=1}^{n}E_{p}\frac{Y_{j}}{2}\\ &= \frac{1}{m}m\frac{1}{2}p+\frac{1}{n}n\frac{1}{2}p\\ &= \frac{1}{2}p+\frac{1}{2}p\\ &=p, \end{split} \end{equation} which is equal to $g(p)$ and therefore this estimator is unbiased.

Now we look at the second estimator, $(\sum_{i=1}^{m}X_{i}+\sum_{j=1}^{n}Y_{j})/(m+n)$, and proceed as before. \begin{equation} \begin{split} E_{p}\left((\sum_{i=1}^{m}X_{i}+\sum_{j=1}^{n}Y_{j})/(m+n)\right) &= \frac{1}{m+n}E_{p}\left(\sum_{i=1}^{m}X_{i}+\sum_{j=1}^{n}Y_{j}\right)\\ &= \frac{1}{m+n}(mp+np)\\ &=\frac{p(m+n)}{m+n}\\ &= p, \end{split} \end{equation} which again is equal to $g(p)$ and therefore this estimator is unbiased.
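As a quick Monte Carlo sanity check of these two computations (a sketch using numpy; the values of $m$, $n$, $p$ and the replication count below are arbitrary illustrative choices):

```python
import numpy as np

# Monte Carlo sanity check of unbiasedness; m, n, p are arbitrary choices.
rng = np.random.default_rng(0)
m, n, p = 30, 70, 0.4
reps = 100_000  # number of simulated (X, Y) samples

X = rng.binomial(1, p, size=(reps, m))  # rows of m Bernoulli(p) draws
Y = rng.binomial(1, p, size=(reps, n))  # rows of n Bernoulli(p) draws

T1 = (X.mean(axis=1) + Y.mean(axis=1)) / 2      # (X-bar + Y-bar)/2
T2 = (X.sum(axis=1) + Y.sum(axis=1)) / (m + n)  # pooled estimator

print(T1.mean(), T2.mean())  # both averages should be close to p = 0.4
```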

Solution ii) In order to determine which of the two estimators is preferable, we look for the one with the smaller mean square error (MSE). The MSE is defined as $$MSE(\theta;T)=E_{\theta}||T-g(\theta)||^{2}.$$

So this is my work so far. I still don't quite understand the meaning of the estimators, let alone how to work out which one is preferable. I've just tried to work from the definitions, and I think my solution for i) works, but I'm not sure. For ii) I truly have no idea where to start. Any suggestions would be much appreciated!


Best answer:

Observe that the mean squared error can be expanded as (using linearity of expectation): $$ \mathrm{MSE}(\theta; T) = \mathbb{E}(T^2) + \mathbb{E}(g(\theta)^2) -2 \mathbb{E}(T g(\theta)) $$

In your case, $g(\theta)$ is just the population mean $p$, and additionally $\mathbb{E}(T) = p$, so we may rewrite

$$ \mathrm{MSE} = \mathbb{E}(T^2) + p^2 - 2p\mathbb{E}(T) = \mathbb{E}(T^2) - p^2 = \mathbb{E}(T^2) - [\mathbb{E}(T)]^2 = \mathrm{Var}(T). $$
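To see this identity concretely, here is a minimal numerical sketch (again with numpy; the parameter choices are arbitrary), checking that the empirical MSE of $(\bar{X}+\bar{Y})/2$ around $p$ matches its empirical variance:

```python
import numpy as np

# For an unbiased estimator, the MSE around p equals the variance.
rng = np.random.default_rng(1)
m, n, p = 30, 70, 0.4
reps = 100_000

X = rng.binomial(1, p, size=(reps, m))
Y = rng.binomial(1, p, size=(reps, n))
T = (X.mean(axis=1) + Y.mean(axis=1)) / 2  # the estimator (X-bar + Y-bar)/2

mse = np.mean((T - p) ** 2)  # empirical E[(T - p)^2]
var = T.var()                # empirical Var(T)
print(mse, var)              # the two numbers should nearly coincide
```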

It is then straightforward to calculate which of the two estimators has minimum variance, using the fact that $X_i, Y_i$ are all independent. For example, for the estimator $\frac{\bar{X} + \bar{Y}}{2}$, we get

$$ \mathrm{Var}\left(\frac{\bar{X} + \bar{Y}}{2}\right) = \mathrm{Var}\left(\frac{1}{2m} \sum_{i=1}^m X_i + \frac{1}{2n} \sum_{i=1}^n Y_i \right) = \frac{1}{4m^2} \sum_{i=1}^m \mathrm{Var}(X_i) + \frac{1}{4n^2} \sum_{i=1}^n \mathrm{Var}(Y_i) \\ = p(1 - p) \left( \frac{1}{4m} + \frac{1}{4n} \right) $$

where we've used the independence of all $X_i, Y_j$ to interchange variance with summation, the property that $\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)$ when $a$ is a constant, and the fact that all variables are identically distributed as $\mathrm{Bernoulli}(p)$ to pull the common factor $p(1 - p)$ out of the sum.

The same reasoning applied to the second estimator gives $$ \mathrm{Var}\left(\frac{\sum_{i=1}^{m}X_{i}+\sum_{j=1}^{n}Y_{j}}{m+n}\right) = \frac{(m+n)\,p(1-p)}{(m+n)^2} = \frac{p(1-p)}{m+n}. $$ Since $\frac{1}{4m}+\frac{1}{4n} = \frac{m+n}{4mn}$ and $(m+n)^2 \geq 4mn$ (equivalently $(m-n)^2 \geq 0$), with equality if and only if $m = n$, the pooled estimator has strictly smaller variance whenever $m \neq n$, and is therefore the preferable one.
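A short numeric comparison of the two closed-form variances (the values of $p$, $m$, $n$ below are arbitrary illustrative choices):

```python
# Compare the two closed-form variances for a few (m, n) pairs.
p = 0.4
for m, n in [(10, 10), (10, 40), (5, 95)]:
    v1 = p * (1 - p) * (1 / (4 * m) + 1 / (4 * n))  # Var((X-bar + Y-bar)/2)
    v2 = p * (1 - p) / (m + n)                      # Var(pooled estimator)
    print(m, n, round(v1, 5), round(v2, 5), v1 >= v2)  # equal only when m == n
```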