Let $X_1, \ldots, X_n$ be a random sample from the following pmf: $$P(X = 0) = \theta, \quad P(X = 1) = 2\theta, \quad P(X = 2) = 1 - 3\theta, \quad 0 < \theta < 1/3.$$ Find a non-trivial sufficient statistic.
I start like this: $$L(\theta) = \prod_{i : k_i = 0} \theta \prod_{i : k_i = 1} (2\theta) \prod_{i : k_i = 2} (1 - 3\theta)$$
Is this the right way to start, with the joint density function?
If $\boldsymbol x = (x_1, x_2, \ldots, x_n)$ is the sample, then the likelihood is given by $$\mathcal L(\theta \mid \boldsymbol x) = \prod_{i=1}^n \theta^{\mathbb 1(x_i = 0)} (2\theta)^{\mathbb 1(x_i = 1)} (1 - 3\theta)^{\mathbb 1(x_i = 2)},$$ where $$\mathbb 1(x_i = x) = \begin{cases}1, & x_i = x \\ 0, & x_i \ne x \end{cases}$$ is an indicator function. But since the sample size is $$n = \sum_{i=1}^n \left[ \mathbb 1(x_i = 0) + \mathbb 1(x_i = 1) + \mathbb 1(x_i = 2) \right],$$ we can write this as $$\begin{align*} \mathcal L(\theta \mid \boldsymbol x) &= \prod_{i=1}^n 2^{\mathbb 1(x_i = 1)} \theta^{\mathbb 1 (x_i = 0) + \mathbb 1 (x_i = 1)} (1 - 3\theta)^{\mathbb 1 (x_i = 2)} \\ &= 2^{\sum \mathbb 1(x_i = 1)} \theta^{\sum \left[ \mathbb 1(x_i = 0) + \mathbb 1(x_i = 1) \right]} (1 - 3\theta)^{\sum \mathbb 1(x_i = 2)} \\ &= 2^{\sum \mathbb 1 (x_i = 1)} \theta^{n - \sum_{i=1}^n \mathbb 1 (x_i = 2)} (1 - 3\theta)^{\sum \mathbb 1(x_i = 2)}. \end{align*}$$
Using the Factorization Theorem, we need to express this in the form $$h(\boldsymbol x) g(\boldsymbol T(\boldsymbol x) \mid \theta),$$ where $h$ is a function that does not depend on $\theta$, and the dependence of $g$ on the sample is only through the sufficient statistic $\boldsymbol T$. This suggests choosing $$h(\boldsymbol x) = 2^{\sum_{i=1}^n \mathbb 1 (x_i = 1)},$$ since this is the only factor that does not depend on $\theta$. Next, we can choose $$\boldsymbol T(\boldsymbol x) = T(\boldsymbol x) = \sum_{i=1}^n \mathbb 1 (x_i = 2),$$ thus $$g(T \mid \theta) = \theta^{n-T} (1-3\theta)^T = \theta^n (\theta^{-1} - 3)^T.$$ Our sufficient statistic, then, is simply the number of observations in the sample that equal $2$.
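As a quick sanity check of the factorization, here is a minimal Python sketch. The helper names `likelihood` and `count_twos` and the two small samples are made up for illustration. If $T$ is sufficient, two samples with the same number of $2$s should have a likelihood ratio that does not change with $\theta$.

```python
from fractions import Fraction

def likelihood(sample, theta):
    """Joint pmf of an i.i.d. sample under P(0)=theta, P(1)=2*theta, P(2)=1-3*theta."""
    pmf = {0: theta, 1: 2 * theta, 2: 1 - 3 * theta}
    result = Fraction(1)
    for x in sample:
        result *= pmf[x]
    return result

def count_twos(sample):
    """The statistic T(x): number of observations equal to 2."""
    return sum(1 for x in sample if x == 2)

# Two hypothetical samples with the same T but different counts of 0 and 1.
sample_a = [0, 0, 1, 2, 2]
sample_b = [1, 1, 1, 2, 2]

# Exact arithmetic at a few values of theta in (0, 1/3).
thetas = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]
ratios = [likelihood(sample_a, t) / likelihood(sample_b, t) for t in thetas]

print(count_twos(sample_a), count_twos(sample_b))
print(ratios)
```

The ratio is the same at every $\theta$ tried, and it equals $h(\boldsymbol x_a)/h(\boldsymbol x_b) = 2^{1}/2^{3} = 1/4$, which is what the factorization predicts.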
At first glance, this seems counterintuitive. After all, should we not expect the observed frequencies of $0$ and $1$, in addition to that of $2$, to provide information about $\theta$? This is not so. First, there is a redundancy: the frequencies must sum to $n$, so at most one category besides $2$ could be informative. Without loss of generality, suppose this is the frequency of $0$. But there is a second redundancy, which becomes evident when we recall that $h$ was chosen to be not $1$ but $2^{\sum \mathbb 1(x_i = 1)}$: once the number of $2$s is fixed, the way the remaining observations split between $0$ and $1$ enters the likelihood only through a factor that is free of $\theta$. Indeed, $P(X = 1)/P(X = 0) = 2$ for every $\theta$. This means that the observed frequencies of $0$ and $1$ carry no information about $\theta$ beyond what is already present in the observed frequency of $2$.
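The second redundancy can also be seen directly from the pmf. The sketch below is illustrative only, and the function name `conditional_given_not_two` is an assumption. It computes the conditional distribution of a single observation given that it is not $2$, for several values of $\theta$.

```python
from fractions import Fraction

def conditional_given_not_two(theta):
    """Conditional pmf of X given X != 2, under P(0)=theta, P(1)=2*theta."""
    pmf = {0: theta, 1: 2 * theta, 2: 1 - 3 * theta}
    p_not_two = pmf[0] + pmf[1]
    return {x: pmf[x] / p_not_two for x in (0, 1)}

# A few arbitrary values of theta in (0, 1/3).
for theta in (Fraction(1, 20), Fraction(1, 6), Fraction(3, 10)):
    print(theta, conditional_given_not_two(theta))
```

In each case the conditional probabilities are $1/3$ for $0$ and $2/3$ for $1$, so the split between $0$s and $1$s among the non-$2$ observations behaves the same way whatever $\theta$ is.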