Bias of proportion estimator


The typical estimator for a population proportion $p$ is the sample proportion ${\hat p_1} = {X \over n}$, where $X$ is the number of successes in a random sample of size $n$. However, when both the population proportion $p$ and the sample size $n$ are small, one might easily get zero successes and hence an estimate of 0 for the proportion $p$. To remedy this, the Wilson estimator is proposed as ${\hat p_2} = {{X + 2} \over {n + 4}}$. Find the bias and mean squared error of both estimators, and show whether ${\hat p_2}$ is consistent.

I apologize if this is too simple, but I'm currently starting to study estimators.

I understand through this formula $$bias\left( {{{\hat p}_1}} \right) = {\left[ {E\left( {{{\hat p}_1}} \right) - {p_1}} \right]^2}$$ that the bias of a given estimator is found by taking the square of the difference between the expected value of said estimator and the parameter's real value. Is this correct? If so, in this situation, I would have $$bias\left( {{{\hat p}_1}} \right) = {\left[ {E\left( {{X \over n}} \right) - {X \over n}} \right]^2}$$ Now, I'm supposed to find a bias of 0 in this exercise. My question is, isn't $$E\left( {{X \over n}} \right) = {1 \over n}E\left( X \right) = {1 \over n}\bar X$$ This would have me subtract $${1 \over n}\bar X - {X \over n}$$ Where is the silly mistake in this logic? Thank you for clarifying.


There are 1 best solutions below

On BEST ANSWER

Note that $P\{\text{success}\} = p$.

Next, let $X = \sum_{i=1}^{n}Y_{i}$, where $Y_{i}=1$ if the $i$-th trial is a success and $Y_{i}=0$ otherwise, so that $E[Y_{i}] = p$.

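Next, $$ bias(\hat{p}) = E[\hat{p}] - p $$ (there is no square, and the quantity subtracted is the true parameter $p$, not $X/n$). Therefore, for the sample proportion, $$ bias(\hat{p}_1)= E\left[\frac{X}{n}\right] - p = E\left[\frac{\sum_{i=1}^{n}{Y_{i}}}{n}\right] - p = \frac{\sum_{i=1}^{n}{E[Y_{i}]}}{n} - p= \frac{np}{n} - p = 0. $$ For the Wilson estimator: $$ bias(\hat{p}_2)= E\left[\frac{X + 2}{n + 4}\right] - p = E\left[\frac{\sum_{i=1}^{n}{Y_{i}} + 2}{n +4}\right] - p = \frac{\sum_{i=1}^{n}{E[Y_{i}]} +2}{n+4} - p= \frac{np + 2}{n + 4} - p = \frac{2 - 4p}{n + 4} = \frac{2(1 - 2p)}{n + 4}. $$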

Note that as $n \to \infty$, the bias of the Wilson estimator tends to $0$, so it is asymptotically unbiased.
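As a sanity check on the two bias results, here is a minimal sketch that computes $E[\hat{p}] - p$ exactly by summing over the binomial distribution of $X$. The function names and the values $n = 10$, $p = 0.1$ are illustrative assumptions, not part of the exercise; the closed form $\frac{2(1-2p)}{n+4}$ is what $\frac{np+2}{n+4} - p$ simplifies to.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_bias(estimator, n, p):
    """E[estimator(X, n)] - p, computed by summing over all values of X."""
    expected = sum(estimator(k, n) * binom_pmf(k, n, p) for k in range(n + 1))
    return expected - p


def sample_proportion(x, n):
    return x / n


def wilson(x, n):
    return (x + 2) / (n + 4)


# Illustrative values (assumed): small n and small p, as in the question.
n, p = 10, 0.1
bias_1 = exact_bias(sample_proportion, n, p)
bias_2 = exact_bias(wilson, n, p)
closed_form = 2 * (1 - 2 * p) / (n + 4)

print("bias of X/n:        ", bias_1)
print("bias of (X+2)/(n+4):", bias_2)
print("2(1-2p)/(n+4):      ", closed_form)
```

Up to floating-point rounding, the first bias should come out as zero and the second should agree with the closed form.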

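One can prove consistency using the LLN and the properties of convergence in probability: $$ \frac{\sum_{i=1}^{n}{Y_{i}} + 2}{n +4} = \frac{\frac{\sum_{i=1}^{n}{Y_{i}}}{n} + 2/n}{1 +4/n}\overset{p}{\to} p, $$ since $\frac{1}{n}\sum_{i=1}^{n}Y_{i} \overset{p}{\to} p$ by the LLN, while $2/n \to 0$ and $4/n \to 0$.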
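The convergence can also be illustrated numerically. The following is only a rough simulation sketch, not a proof: it draws one Bernoulli sample per sample size and reports how far the Wilson estimate lands from $p$. The value $p = 0.1$, the sample sizes, the seed, and the helper name `wilson_estimate` are all assumptions chosen for illustration.

```python
import random


def wilson_estimate(n, p, rng):
    """Simulate n Bernoulli(p) trials and return (X + 2) / (n + 4)."""
    x = sum(1 for _ in range(n) if rng.random() < p)
    return (x + 2) / (n + 4)


rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.1                 # assumed true proportion

# Absolute error of the Wilson estimate for increasing sample sizes.
errors = {n: abs(wilson_estimate(n, p, rng) - p) for n in (10, 1000, 100000)}

for n, err in errors.items():
    print(f"n = {n:>6}: |p_hat_2 - p| = {err:.5f}")
```

A single run need not shrink monotonically at every step, but for large $n$ the error should be small, which is what convergence in probability describes.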