Let's assume I have 4 observations, each modelled as a Bernoulli trial with success probability $p$. Successes are labelled 1 and failures 0. My observations $(x_1, x_2, x_3, x_4)$ are $(0,1,0,1)$, and I call this dataset $D$. I assume the observations are independent and identically distributed. The likelihood of observing this dataset given parameter $p$ is the joint pmf: $$P(D|p)=P(x_1|p)P(x_2|p)P(x_3|p)P(x_4|p)=p^2(1-p)^2$$ Here I have factored the pmf because the observations are i.i.d. draws from the underlying generative distribution. I can then estimate $p$, for example by maximum likelihood (maximizing the log-likelihood). However, we know that the probability of exactly 2 successes in 4 trials is given by the binomial distribution, which has a binomial coefficient in front of the last expression: $P(\text{2 successes})={4\choose 2}p^2(1-p)^2$.
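A minimal Python sketch of the factorization above, assuming Python 3.8+ (for `math.prod`). The helper names `bernoulli_pmf`, `sequence_likelihood`, and `log_likelihood` are hypothetical, and the grid search is only a crude stand-in for maximizing the log-likelihood analytically (setting the derivative of $2\log p + 2\log(1-p)$ to zero gives $\hat p = 1/2$).

```python
import math

# Observed dataset D = (x1, x2, x3, x4): 1 = success, 0 = failure
D = [0, 1, 0, 1]

def bernoulli_pmf(x, p):
    """Hypothetical helper: P(x | p) for one Bernoulli trial."""
    return p if x == 1 else 1 - p

def sequence_likelihood(data, p):
    """Hypothetical helper: joint probability of the exact ordered sequence (i.i.d. product)."""
    return math.prod(bernoulli_pmf(x, p) for x in data)

def log_likelihood(data, p):
    """Hypothetical helper: sum of per-observation log-probabilities."""
    return sum(math.log(bernoulli_pmf(x, p)) for x in data)

# The factorized product matches the closed form p^2 (1 - p)^2
p = 0.3
assert math.isclose(sequence_likelihood(D, p), p**2 * (1 - p)**2)

# Maximize the log-likelihood over a grid of candidate values in (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda q: log_likelihood(D, q))
print(p_hat)  # 0.5
```

Note that no binomial coefficient appears anywhere: the product is over the four observed outcomes in their given order.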
Therefore, shouldn't the joint probability, which is the probability of observing my data, be: $$P(D|p)={4\choose 2}p^2(1-p)^2$$ ?
What am I missing? I am confused as to whether my reasoning is correct and would like some help clearing this up.
Edit: I think part of the confusion comes from not knowing that the likelihood is not a probability distribution over $p$. However, even knowing this, the confusion remains. I thought the point of assuming i.i.d. observations is being able to factor the joint $P(x_1,\dots,x_n)$ as $P(x_1)\cdots P(x_n)$, i.e. a product of $n$ Bernoulli probabilities, in which case there is no binomial coefficient.
The problem is that you are looking at two different interpretations of the results, which are two different events:
- The event that your results are exactly $x_1 = 0$, $x_2 = 1$, $x_3 = 0$, $x_4 = 1$, in that order, which, as you initially calculated, has probability $p^2 (1-p)^2$; and
- The event that exactly two of the results equal 1, regardless of their order, i.e. $x_1 + x_2 + x_3 + x_4 = 2$, which has probability ${4 \choose 2}p^2(1-p)^2$ because it covers ${4 \choose 2} = 6$ different sequences of individual results (including $(0, 1, 0, 1)$ but also $(1, 1, 0, 0)$, etc.), each of which has probability $p^2(1-p)^2$.
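To make the difference between the two events concrete, here is a small Python sketch (assuming Python 3.8+ for `math.prod` and `math.comb`) that enumerates every length-4 sequence with exactly two successes and sums their probabilities. The value `p = 0.3` is arbitrary and chosen only for illustration; `sequence_probability` is a hypothetical helper.

```python
import itertools
import math

p = 0.3  # arbitrary success probability, for illustration only

def sequence_probability(seq, p):
    """Hypothetical helper: probability of one exact ordered i.i.d. Bernoulli(p) sequence."""
    return math.prod(p if x == 1 else 1 - p for x in seq)

# Second event: all length-4 sequences with exactly two successes, in any order
two_success_seqs = [s for s in itertools.product([0, 1], repeat=4) if sum(s) == 2]
print(len(two_success_seqs))  # 6

# Each of these sequences (first event type) has the same probability p^2 (1 - p)^2 ...
assert all(math.isclose(sequence_probability(s, p), p**2 * (1 - p)**2)
           for s in two_success_seqs)

# ... so the aggregate event has probability C(4, 2) p^2 (1 - p)^2
total = sum(sequence_probability(s, p) for s in two_success_seqs)
assert math.isclose(total, math.comb(4, 2) * p**2 * (1 - p)**2)
```

The binomial coefficient is simply the number of ordered sequences that the aggregate event lumps together.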
So which is correct? They both could be, depending on which event you are computing the likelihood of. If you're interested in a specific sequence of results, work with the first option; if you're interested only in aggregate results, work with the second.
In estimating $p$, what matters is the number of successes you get in a given number of trials rather than the order in which they occurred, so it is natural to use the latter version, which considers all the ways you could get 2 successes in 4 trials. Note, however, that ${4 \choose 2}$ does not depend on $p$: the two likelihoods differ only by a constant factor, so they are maximized at the same estimate, $\hat p = 2/4 = 1/2$.
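To see numerically that choosing between the ordered-sequence likelihood and the binomial likelihood does not change the estimate of $p$, here is a minimal Python sketch (assuming Python 3.8+ for `math.comb`). The function names `sequence_lik` and `count_lik` are hypothetical, and the grid search again stands in for an analytic maximization.

```python
import math

n, k = 4, 2  # 4 trials, 2 successes

def sequence_lik(p):
    """Hypothetical helper: likelihood of the specific ordered sequence (0, 1, 0, 1)."""
    return p**k * (1 - p)**(n - k)

def count_lik(p):
    """Hypothetical helper: binomial likelihood of k successes in n trials, any order."""
    return math.comb(n, k) * sequence_lik(p)

grid = [i / 1000 for i in range(1, 1000)]

# The ratio is the same constant, C(4, 2) = 6, at every p ...
ratios = {round(count_lik(p) / sequence_lik(p), 9) for p in grid}
print(ratios)  # {6.0}

# ... so both likelihoods peak at the same estimate, p_hat = k / n
p_hat_seq = max(grid, key=sequence_lik)
p_hat_count = max(grid, key=count_lik)
print(p_hat_seq, p_hat_count)  # 0.5 0.5
```

A constant factor also becomes a constant additive term in the log-likelihood, so it drops out of the derivative as well.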