Showing $\hat{\theta} = \frac{x_1 + 2x_2 + x_3}{4}$ is not a sufficient estimator for the mean of a Bernoulli-distributed population


Suppose $x_1$, $x_2$, $x_3$ are independent observations from a Bernoulli-distributed population with parameter $\theta$.

I want to show that $$\hat{\theta} = \frac{x_1 + 2x_2 + x_3}{4}$$ is not a sufficient estimator for $\theta$.

I have derived earlier that $$L(\theta) = (1 - \theta)^3 \left(\frac{\theta}{1 - \theta}\right)^{x_1 + x_2 + x_3}$$ which allowed me to show that $\overline{x}$ was a sufficient estimator for $\theta$. I however am not sure about how to proceed to show that $\hat{\theta}$ is not sufficient for $\theta$ (while it does seem natural to me). I have tried using the "formal" definition for a statistic to be sufficient but with no concrete results.

How would one go about showing $\hat{\theta}$ is not sufficient for estimating $\theta$?


Best answer:

For example, you can use the definition directly and show that there exist some $k_1,k_2,k_3\in\{0,1\}$ and some $s\in\{0,\frac14,\frac24,\frac34,1\}$ s.t. $$ \mathbb P(x_1=k_1,x_2=k_2, x_3=k_3\mid \hat\theta = s) $$ depends on $\theta$.

Say, $$ \mathbb P\left(x_1=1,x_2=0, x_3=1\biggm| \hat\theta = \frac24\right) =\frac{\mathbb P(x_1=1,x_2=0, x_3=1)}{\mathbb P(x_1+2x_2+x_3=2)} $$ $$ =\frac{\mathbb P(x_1=1,x_2=0, x_3=1)}{\mathbb P(x_1=1,x_2=0, x_3=1)+\mathbb P(x_1=0,x_2=1, x_3=0)} $$ $$=\frac{\theta^2(1-\theta)}{\theta^2(1-\theta)+\theta(1-\theta)^2}= \theta. $$ So $\hat\theta$ is not sufficient for $\theta$.
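As a quick numerical check of this computation, the conditional probability can be evaluated directly by enumerating the eight possible samples (a Python sketch; the function name `cond_prob` is just for illustration):

```python
from itertools import product

def cond_prob(theta):
    """P(x1=1, x2=0, x3=1 | theta_hat = 2/4) under i.i.d. Bernoulli(theta)."""
    def p(sample):
        # Joint pmf of an i.i.d. Bernoulli(theta) triple.
        k = sum(sample)
        return theta**k * (1 - theta)**(3 - k)
    # All samples with x1 + 2*x2 + x3 = 2, i.e. theta_hat = 2/4.
    event = [s for s in product([0, 1], repeat=3) if s[0] + 2*s[1] + s[2] == 2]
    return p((1, 0, 1)) / sum(p(s) for s in event)

# The conditional probability varies with theta (it equals theta itself),
# so theta_hat cannot be sufficient.
print(round(cond_prob(0.3), 10))  # 0.3
print(round(cond_prob(0.7), 10))  # 0.7
```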

Another answer:

A simple and intuitive way to understand why $\hat \theta$ is not sufficient is to note that in an iid sample $(x_1, \ldots, x_n)$ drawn from a Bernoulli distribution, all of the information about $\theta$ should be contained in the number of observations that are $1$--because the order of the observations is not relevant when it comes to the information about $\theta$.

For example, loosely speaking, the sample $(1,0,1)$ for $n = 3$ contains the same amount of information about $\theta$ as the sample $(1,1,0)$, because both contain the same number of $1$s. But if this is the case, and if we can find a value of $\hat \theta$ that is generated by two samples for which the number of $1$s is not the same, then $\hat \theta$ can't be sufficient: given such a value, you'd not be able to tell how many $1$s were present in the sample.

The form of $\hat \theta$ suggests such an example; both $(0,1,0)$ and $(1,0,1)$ have $\hat \theta = 1/2$. So if I told you that I generated a sample and computed $\hat \theta = 1/2$, you would not be able to tell me whether my sample had one $1$ or two $1$s; these samples contain different information about $\theta$, so in that sense, information about $\theta$ has been lost.

Now that we have looked at the intuitive reasoning, we are better prepared for the formal proof. For $n = 3$, the joint distribution is $$\Pr[(X_1, X_2, X_3) = (x_1, x_2, x_3) \mid \theta] = \prod_{i=1}^3 \Pr[X_i = x_i \mid \theta] = \prod_{i=1}^3 \theta^{x_i} (1 - \theta)^{1 - x_i} \mathbb 1(x_i \in \{0,1\}).$$ This simplifies to $$\theta^{x_1 + x_2 + x_3}(1-\theta)^{3-(x_1 + x_2 + x_3)} \mathbb 1 (x_1 \in \{0,1\})\mathbb 1 (x_2 \in \{0,1\})\mathbb 1 (x_3 \in \{0,1\}).$$ So by the factorization theorem, $$h(\boldsymbol x) = \mathbb 1 (x_1 \in \{0,1\})\mathbb 1 (x_2 \in \{0,1\})\mathbb 1 (x_3 \in \{0,1\}), \\ g(T(\boldsymbol x) \mid \theta) = \theta^T (1 - \theta)^{3-T}, \\ T(\boldsymbol x) = T(x_1, x_2, x_3) = x_1 + x_2 + x_3.$$ This formalizes our earlier claim that the number of $1$s in the sample is a sufficient statistic.
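The factorization can be spot-checked numerically: two samples with the same number of $1$s should yield identical likelihoods at every value of $\theta$ (a small Python sketch, not part of the original argument):

```python
def likelihood(sample, theta):
    """Joint Bernoulli pmf; depends on the sample only through its sum."""
    k = sum(sample)
    return theta**k * (1 - theta)**(len(sample) - k)

# (1,0,1) and (1,1,0) both have T = 2, so their likelihoods agree for all theta.
print(all(likelihood((1, 0, 1), t) == likelihood((1, 1, 0), t)
          for t in [0.1, 0.25, 0.5, 0.9]))  # True
```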

Then to see how $\hat \theta$ is not sufficient, all we need to do is show that it is possible for two distinct samples $\boldsymbol x$, $\boldsymbol x^*$, to satisfy $$T(\boldsymbol x) \ne T(\boldsymbol x^*), \quad \hat\theta(\boldsymbol x) = \hat\theta(\boldsymbol x^*).$$ Then $\hat \theta$ cannot be sufficient, since given the quantity $\hat \theta(\boldsymbol x)$, you cannot tell whether it was generated by $\boldsymbol x$ or $\boldsymbol x^*$, yet the value of the sufficient statistic $T$ is not the same for these two possibilities, meaning that information about $\theta$ was lost with $\hat \theta$.
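Such a pair can be found by brute force over all $2^3$ samples (a Python sketch; `theta_hat` and `T` mirror the definitions above):

```python
from itertools import product

def theta_hat(x):
    """The proposed estimator (x1 + 2*x2 + x3) / 4."""
    return (x[0] + 2*x[1] + x[2]) / 4

def T(x):
    """The sufficient statistic: number of 1s in the sample."""
    return sum(x)

samples = list(product([0, 1], repeat=3))
# Pairs of samples that theta_hat cannot distinguish but T can.
collisions = [(a, b) for a in samples for b in samples
              if theta_hat(a) == theta_hat(b) and T(a) != T(b)]
print(collisions[0])  # ((0, 1, 0), (1, 0, 1))
```

This recovers exactly the pair used in the accepted answer: both samples give $\hat\theta = 1/2$, yet $T = 1$ for one and $T = 2$ for the other.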