I am going through some old questions and answers from MIT, and I came across one whose solution I cannot understand.
The question is here: http://www.ai.mit.edu/courses/6.867-f03/hw/hw1.pdf (Problem 4).
Answer is here: http://www.ai.mit.edu/courses/6.867-f03/hw/hw1-solutions.pdf (Problem 4 page 5-6).
I am familiar with maximum likelihood concepts and understand nearly everything in the solutions. What I cannot follow is the part on page 6 that begins "The maximum likelihood of the data under this model..."
Specifically, where do the numbers $(\frac{4}{7})^{8} (\frac{3}{7})^{6}$ come from?
The solution describes the PMFs of $x$ and $y \mid x$ (equations 45 to 47). The likelihood is then evaluated simply by substituting $\theta_i = \hat \theta_i$ for each $i = 1, 2$, given the sample. Note that we can create a table of counts for the sample:

$$\begin{array}{cc|cc} & & x & \\ & & 0 & 1 \\\hline y & 0 & 2 & 2 \\ & 1 & 1 & 2 \end{array}$$

What this tells us is that

$$\prod_i \hat P_i (x_i) = \hat \theta_1^4 (1 - \hat \theta_1)^3,$$

because we observed $3$ instances of $x = 0$ and $4$ instances of $x = 1$. This comes directly from equation 45. We also have, from equations 46 and 47,

$$\prod_i \hat P_i (y_i \mid x = 0) = \hat \theta_2^2 (1 - \hat \theta_2)^1, \quad \prod_i \hat P_i (y_i \mid x = 1) = (1 - \hat \theta_2)^2 \hat \theta_2^2.$$

Again, this is because when $x = 0$, we observed $2$ instances of $y = 0$ and one instance of $y = 1$; and when $x = 1$, we observed $2$ instances each of $y = 0$ and $y = 1$. The rest is substitution.
With $\hat \theta_1 = \hat \theta_2 = 4/7$, we compute the likelihood $$(4/7)^4 (3/7)^3 \cdot (4/7)^2 (3/7)^1 \cdot (3/7)^2 (4/7)^2 = (4/7)^{4+2+2} (3/7)^{3+1+2} = (4/7)^8 (3/7)^6$$ as claimed.
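If you want to check the arithmetic numerically, here is a minimal sketch in Python using exact fractions. It assumes the parameterization implied by the exponents above, namely $P(x = 1) = \theta_1$ and $P(y = x \mid x) = \theta_2$; that reading is inferred from this answer rather than quoted from the solution sheet, and the names `counts` and `likelihood` are just illustrative.

```python
from fractions import Fraction

# Counts from the table above, keyed by (x, y).
counts = {(0, 0): 2, (0, 1): 1, (1, 0): 2, (1, 1): 2}
n = sum(counts.values())

# Assumed parameterization: P(x = 1) = theta1, P(y = x | x) = theta2.
theta1_hat = Fraction(sum(c for (x, _), c in counts.items() if x == 1), n)
theta2_hat = Fraction(sum(c for (x, y), c in counts.items() if x == y), n)


def likelihood(theta1, theta2):
    """Product of P(x) * P(y | x) over all observations, using the counts."""
    result = Fraction(1)
    for (x, y), c in counts.items():
        p_x = theta1 if x == 1 else 1 - theta1
        p_y_given_x = theta2 if y == x else 1 - theta2
        result *= (p_x * p_y_given_x) ** c
    return result


max_likelihood = likelihood(theta1_hat, theta2_hat)
print(theta1_hat, theta2_hat)
print(max_likelihood == Fraction(4, 7) ** 8 * Fraction(3, 7) ** 6)
```

Under these assumptions both estimates come out to $4/7$, and the product over all seven observations matches $(4/7)^8 (3/7)^6$ exactly.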