Show $T=X_1(X_3+X_4)+X_2$ is not sufficient for $p$ where $X_1,...,X_4$ are iid Bernoulli(p)


I am attempting to show that $T=X_1(X_3+X_4)+X_2$ is not sufficient for $p$ where $X_1,...,X_4$ are iid Bernoulli(p), and the question specifies that I am to use the conditional distribution method. Here is my work:

First, note that $T=0$ iff $X_2=0$ and $X_1(X_3+X_4)=0$, that is, $\{T=0\}=\left(\{X_1=0\}\cup\{X_3=0,\,X_4=0\}\right)\cap\{X_2=0\}$. By independence and inclusion-exclusion, $\Pr[T=0]=(1-p)\left[(1-p)+(1-p)^2-(1-p)^3\right]$. Since $\{X_1=X_2=X_3=X_4=0\}\subseteq\{T=0\}$, the probability of obtaining $X_1=X_2=X_3=X_4=0$ given $T=0$ is $$ \frac{\Pr\left[\bigcap_{i=1}^4\{X_i=0\}\right]}{\Pr[T=0]}=\frac{(1-p)^4}{(1-p)^2+(1-p)^3-(1-p)^4}. $$ Hence $T$ is not sufficient for $p$, because the above expression depends on $p$.

My questions are: first, did I successfully prove that $T$ is not sufficient for $p$? Second, could I have completed this proof without using particular values of $X$ and $T$?

Best answer:

Your computation $$P[(X_1,\ldots,X_4)=(0,\ldots,0)\mid T=0] = \frac{(1-p)^4}{(1-p)^2+(1-p)^3-(1-p)^4}$$ is correct, and it depends on $p$, so (as Zoli mentions) you are done. One caveat: sometimes an expression "looks" like it depends on $p$ but really does not, such as $\frac{(p+1)^2}{p^2+2p+1}$, which equals $1$ for every $p\in[0,1]$. In your case, you can evaluate your expression at $p=0$ and $p=1/2$ and see that the results differ ($1$ and $1/5$, respectively).
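A minimal sketch of that check, assuming exact rational arithmetic via Python's `fractions` module: it enumerates all 16 outcomes, confirms that the brute-force conditional probability matches the closed form, and prints the closed form next to the look-alike expression at $p=0$ and $p=1/2$. The names `T`, `pmf`, `cond_all_zero_given_T0`, `closed_form`, and `lookalike` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

def T(x):
    """The statistic T = X1*(X3 + X4) + X2 for a 0/1 vector x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    return x1 * (x3 + x4) + x2

def pmf(x, p):
    """Joint pmf of four iid Bernoulli(p) variables evaluated at x."""
    s = sum(x)
    return p**s * (1 - p)**(4 - s)

def cond_all_zero_given_T0(p):
    """P[X = (0,0,0,0) | T = 0], summing the pmf over every x with T(x) = 0."""
    p_t0 = sum(pmf(x, p) for x in product((0, 1), repeat=4) if T(x) == 0)
    return pmf((0, 0, 0, 0), p) / p_t0

def closed_form(p):
    """The question's formula (1-p)^4 / ((1-p)^2 + (1-p)^3 - (1-p)^4)."""
    q = 1 - p
    return q**4 / (q**2 + q**3 - q**4)

def lookalike(p):
    """(p+1)^2 / (p^2 + 2p + 1): looks p-dependent but is identically 1."""
    return (p + 1)**2 / (p**2 + 2 * p + 1)

for p in (Fraction(0), Fraction(1, 2)):
    # Brute-force enumeration agrees exactly with the closed form.
    assert cond_all_zero_given_T0(p) == closed_form(p)
    print(p, closed_form(p), lookalike(p))
```

Running it prints `0 1 1` and then `1/2 1/5 1`: the question's expression changes with $p$, while the look-alike stays at $1$. The closed form is not evaluated at $p=1$, where $P[T=0]=0$ and the conditional probability is undefined.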


In more detail: Define $\mathcal{X}$ as the set of all possible values of the data vector $(X_1, ..., X_4)$:

$$\mathcal{X} = \{(x_1, x_2, x_3, x_4) : x_i \in \{0,1\} \: \forall i \in \{1, \ldots, 4\}\}$$

The probability mass function of $(X_1, \ldots, X_4)$ depends on the parameter $p$. Suppose $T$ is a random variable formed as a function of the data, so $T=f(X_1, \ldots, X_4)$ for some deterministic function $f$, and suppose $T$ takes values in some finite or countably infinite set $\mathcal{T}$. The random variable $T$ is a sufficient statistic for $p$ if, for all $(x_1, x_2, x_3, x_4) \in \mathcal{X}$ and all $t \in \mathcal{T}$ with $P[T=t]>0$ (so that the conditioning is well defined), the expression
$$ P[(X_1,\ldots,X_4)=(x_1,\ldots,x_4)\mid T=t] \quad (\mbox{Expression 1})$$ does not depend on $p$. (That expression can of course depend on the values of $(x_1, \ldots, x_4)$ and $t$, just not on $p$.)
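The definition can be made concrete by computing Expression 1 for every pair $(x, t)$ by enumeration. The sketch below assumes exact `Fraction` arithmetic and a parameter value strictly between 0 and 1, so every attainable $t$ has positive probability; `SAMPLE_SPACE`, `T`, `pmf`, and `conditional_table` are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# The sample space X: all 16 vectors (x1, x2, x3, x4) with entries in {0, 1}.
SAMPLE_SPACE = list(product((0, 1), repeat=4))

def T(x):
    """T = f(x) = x1*(x3 + x4) + x2."""
    x1, x2, x3, x4 = x
    return x1 * (x3 + x4) + x2

def pmf(x, p):
    """Joint pmf of four iid Bernoulli(p) variables at x."""
    s = sum(x)
    return p**s * (1 - p)**(4 - s)

def conditional_table(stat, p):
    """Return {(x, t): P[X = x | stat(X) = t]} for every x in SAMPLE_SPACE and
    every value t that stat takes. Assumes 0 < p < 1 so each P[stat = t] > 0."""
    marginal = {}  # t -> P[stat(X) = t]
    for x in SAMPLE_SPACE:
        t = stat(x)
        marginal[t] = marginal.get(t, Fraction(0)) + pmf(x, p)
    return {
        (x, t): (pmf(x, p) / marginal[t] if stat(x) == t else Fraction(0))
        for x in SAMPLE_SPACE
        for t in marginal
    }

table = conditional_table(T, Fraction(1, 2))
print(table[((0, 0, 0, 0), 0)])  # 1/5
print(table[((1, 1, 1, 1), 3)])  # 1
```

For each fixed $t$, the entries over all $x$ sum to $1$, as a conditional distribution should; entries with $f(x)\neq t$ are $0$.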

Because the definition quantifies over all $(x_1, \ldots, x_4)$ and all $t$, showing that a random variable $T$ is not a sufficient statistic reduces to finding at least one pair $(x_1, \ldots, x_4) \in \mathcal{X}$, $t \in \mathcal{T}$ for which Expression 1 depends on $p$. You did exactly this, so you are done.
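The "find one counterexample" logic can be sketched as a search: compare Expression 1 at two parameter values and report the first $(x, t)$ where they differ. This is an illustration under stated assumptions (exact `Fraction` arithmetic, two arbitrarily chosen values $p=1/4$ and $p=1/2$; `cond_prob` and `find_witness` are hypothetical names). A witness proves non-sufficiency, but finding none at two values of $p$ does not prove sufficiency. As a control, the search is also run on $X_1+X_2+X_3+X_4$, a standard sufficient statistic for iid Bernoulli data.

```python
from fractions import Fraction
from itertools import product

# All 16 possible data vectors (x1, x2, x3, x4).
SAMPLE_SPACE = list(product((0, 1), repeat=4))

def pmf(x, p):
    """Joint pmf of four iid Bernoulli(p) variables at x."""
    s = sum(x)
    return p**s * (1 - p)**(4 - s)

def cond_prob(stat, x, t, p):
    """Expression 1 for the statistic `stat`: P[X = x | stat(X) = t]."""
    if stat(x) != t:
        return Fraction(0)
    p_t = sum(pmf(y, p) for y in SAMPLE_SPACE if stat(y) == t)
    return pmf(x, p) / p_t

def find_witness(stat, p_a, p_b):
    """Return the first (x, t) whose Expression 1 differs between p_a and p_b,
    or None. A witness proves `stat` is not sufficient; None is only an
    absence of evidence at these two parameter values, not a proof."""
    values = sorted({stat(y) for y in SAMPLE_SPACE})
    for x in SAMPLE_SPACE:
        for t in values:
            if cond_prob(stat, x, t, p_a) != cond_prob(stat, x, t, p_b):
                return x, t
    return None

def T(x):
    """T = X1*(X3 + X4) + X2."""
    x1, x2, x3, x4 = x
    return x1 * (x3 + x4) + x2

print(find_witness(T, Fraction(1, 4), Fraction(1, 2)))    # ((0, 0, 0, 0), 0)
print(find_witness(sum, Fraction(1, 4), Fraction(1, 2)))  # None
```

For $T$ the search stops at $((0,0,0,0), 0)$, the same pair used in the question (Expression 1 is $9/19$ at $p=1/4$ and $1/5$ at $p=1/2$). For the sum, every conditional probability is $1/\binom{4}{t}$ at both values, so no witness is found.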

On your second question: the particular values $(x_1,\ldots,x_4)=(0,0,0,0)$ and $t=0$ that you used seem to be the easiest way to complete this problem. Proceeding with abstract values $(x_1, \ldots, x_4)$ and $t$ instead of particular ones is possible but more complicated: you would need to compute Expression 1 as a function of those abstract values. Overall, your approach seems to be the most direct one.
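For a sense of what the abstract route involves, here is a worked sketch (the shorthand $q=1-p$ and $s(x)=x_1+x_2+x_3+x_4$ is introduced only for this derivation). For any $x=(x_1,\ldots,x_4)\in\mathcal{X}$ and $t\in\mathcal{T}$ with $P[T=t]>0$,

$$P[(X_1,\ldots,X_4)=x\mid T=t]=\frac{\mathbf{1}\{f(x)=t\}\,p^{s(x)}q^{4-s(x)}}{\sum_{y\in\mathcal{X}:\,f(y)=t}p^{s(y)}q^{4-s(y)}}.$$

Enumerating the 16 points of $\mathcal{X}$ gives the denominators

$$P[T=0]=q^4+3pq^3+p^2q^2,\quad P[T=1]=pq^3+5p^2q^2+p^3q,\quad P[T=2]=3p^3q,\quad P[T=3]=p^4.$$

Every $x$ with $f(x)=2$ has $s(x)=3$, and only $x=(1,1,1,1)$ gives $f(x)=3$, so for $t=2$ and $t=3$ Expression 1 equals $1/3$ and $1$ respectively (for each $x$ with $f(x)=t$), regardless of $p$. For $t=0$, by contrast, $x=(0,0,0,0)$ gives $\frac{q^4}{q^4+3pq^3+p^2q^2}=\frac{q^2}{q^2+3pq+p^2}$, which agrees with the expression in the question and does depend on $p$. So not every $t$ exhibits the dependence, which is why picking a suitable particular $t$ is the efficient route.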