Why do we write the Bernoulli likelihood as follows:
$L(p) = \prod_{i=1}^n p^{x_i}(1-p)^{(1-x_i)}$
and not like this:
$L(p) = \prod_{i=1}^n [px_i +(1-p)(1-x_i)]$
Both expressions are equivalent, since each $x_i$ is either 0 or 1.
Is it only a trick to make it easier to derive the maximum likelihood estimate, as below? Or is there a fundamental explanation?
$$ \begin{align*} L(p) &= \prod_{i=1}^n p^{x_i}(1-p)^{(1-x_i)}\\ \ell(p) &= \log{p}\sum_{i=1}^n x_i + \log{(1-p)}\sum_{i=1}^n (1-x_i)\\ \dfrac{\partial\ell(p)}{\partial p} &= \dfrac{\sum_{i=1}^n x_i}{p} - \dfrac{\sum_{i=1}^n (1-x_i)}{1-p} \overset{\text{set}}{=}0\\ \sum_{i=1}^n x_i - p\sum_{i=1}^n x_i &= p\sum_{i=1}^n (1-x_i)\\ p& = \dfrac{1}{n}\sum_{i=1}^n x_i \end{align*} $$
Yes, they are equivalent: each $x_i$ is either 0 or 1, so both factors equal $p$ when $x_i = 1$ and $1-p$ when $x_i = 0$. It's probably written like that just to make it easier to analyze when looking at the log-likelihood, but there's no "mathematical" reason. I think writing it like that also makes it clearer that you are multiplying probabilities. What I mean by that is if you collect the factors in $$\prod_{i=1}^n p^{x_i} (1-p)^{1-x_i},$$ you get $$ p^{\sum x_i} (1-p)^{n - \sum x_i},$$ whose logarithm splits directly into a $\log p$ term and a $\log(1-p)$ term. That is easier to work with than the expression you proposed, where the log of each factor $px_i + (1-p)(1-x_i)$ only simplifies after a case analysis on $x_i$. So I think it's just written like that because it makes a little more "sense" in some ways; there shouldn't be a deep mathematical reason because, as you said, they are equivalent.
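If it helps, here is a quick numerical sanity check in Python. It is only a sketch: the data `xs`, the test value `p`, and all function names are made up for illustration. It compares the exponent form, the linear form from the question, and the collected form $p^{\sum x_i}(1-p)^{n-\sum x_i}$, and then does a crude grid search to confirm that the log-likelihood peaks at the sample mean.

```python
import math

def likelihood_power(p, xs):
    """Exponent form: product of p^x * (1-p)^(1-x)."""
    return math.prod(p**x * (1 - p)**(1 - x) for x in xs)

def likelihood_linear(p, xs):
    """Linear form from the question: product of p*x + (1-p)*(1-x)."""
    return math.prod(p * x + (1 - p) * (1 - x) for x in xs)

def likelihood_collected(p, xs):
    """Collected form: p^(sum x) * (1-p)^(n - sum x)."""
    s, n = sum(xs), len(xs)
    return p**s * (1 - p)**(n - s)

def log_likelihood(p, xs):
    """Log of the collected form; valid for 0 < p < 1."""
    s, n = sum(xs), len(xs)
    return s * math.log(p) + (n - s) * math.log(1 - p)

# Made-up binary sample (assumption, not from the question).
xs = [1, 0, 1, 1, 0, 1, 0, 1]
p = 0.3

# All three forms should agree up to floating-point rounding.
a = likelihood_power(p, xs)
b = likelihood_linear(p, xs)
c = likelihood_collected(p, xs)
print(math.isclose(a, b), math.isclose(a, c))

# Crude grid search over (0, 1) for the maximizer of the log-likelihood.
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=lambda q: log_likelihood(q, xs))
sample_mean = sum(xs) / len(xs)
print(p_hat, sample_mean)
```

The grid search is just a stand-in for the calculus in the question; it only finds the maximizer to the resolution of the grid.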