This is a question from an exam:
You want to estimate the parameters of a Gaussian distribution using the maximum-likelihood method for an i.i.d. set of data. What role does the i.i.d. property play? What difficulties occur if your list of data is not i.i.d.?
I basically know how ML estimation works, but I have trouble understanding why data that is not i.i.d. would cause problems. Could anyone explain this to me? Please note that I am a computer scientist, not a mathematician, so please keep it simple for me. :)
Consider the likelihood function of the data, $L(\text{Data} \mid \theta)$. Here the data consist of $n$ i.i.d. random draws $\{x_i\}_{i=1}^n$ from a normal distribution, so

$$ L(\text{Data} \mid \theta) = L(x_1, \dots, x_n \mid \theta). $$

By the definition of independence, the joint probability of two independent events $A, B$ factorizes as $P(A,B) = P(A)P(B)$. Applying this to the draws, and using the fact that they are identically distributed (every draw follows the same normal distribution with the same $\theta$), gives

$$ L(x_1, \dots, x_n \mid \theta) = L(x_1 \mid \theta) \cdots L(x_n \mid \theta) = \mathcal{N}(x_1 \mid \theta) \cdots \mathcal{N}(x_n \mid \theta) = \prod_{i=1}^n \mathcal{N}(x_i \mid \theta). $$

Now recall that the MLE for this scenario is computed by setting

$$ \frac{\partial}{\partial \theta} \prod_{i=1}^n \mathcal{N}(x_i \mid \theta) = 0. $$

Independence lets us exploit the exponential form of the normal PDF: the product of all of these PDFs becomes a sum in the exponent, which makes the optimization problem easy to solve analytically (even if the algebra still takes some care). If the data are not i.i.d., this factorization is not available: dependent draws force you to work with the full joint density $L(x_1, \dots, x_n \mid \theta)$, and draws that are not identically distributed would each need their own distribution rather than the common $\mathcal{N}(\cdot \mid \theta)$.
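Since you are a computer scientist, the factorization may be easier to see in code. The following is only a minimal sketch, assuming $\theta = (\mu, \sigma^2)$ and synthetic data; the function names (`gaussian_log_pdf`, `iid_log_likelihood`, `gaussian_mle`) are made up for illustration. It works with the logarithm of the likelihood, which turns the product over data points into a sum, and it uses the standard closed-form Gaussian MLE (the sample mean and the variance with divisor $n$).

```python
import math
import random


def gaussian_log_pdf(x, mu, sigma2):
    """Log of the normal density N(x | mu, sigma2) at a single point."""
    return -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)


def iid_log_likelihood(data, mu, sigma2):
    """Log-likelihood under the i.i.d. assumption.

    Independence lets the joint density factorize into a product,
    so its logarithm is a plain sum over the data points.
    Identical distribution means every term uses the same (mu, sigma2).
    """
    return sum(gaussian_log_pdf(x, mu, sigma2) for x in data)


def gaussian_mle(data):
    """Closed-form maximizer of the i.i.d. Gaussian log-likelihood."""
    n = len(data)
    mu_hat = sum(data) / n
    # MLE of the variance divides by n (not n - 1).
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat


# Synthetic i.i.d. sample; the true parameters here are arbitrary choices.
random.seed(0)
data = [random.gauss(2.0, 1.5) for _ in range(1000)]

mu_hat, sigma2_hat = gaussian_mle(data)
best = iid_log_likelihood(data, mu_hat, sigma2_hat)

# Nudging either parameter away from the MLE cannot increase the log-likelihood.
print(best >= iid_log_likelihood(data, mu_hat + 0.1, sigma2_hat))
print(best >= iid_log_likelihood(data, mu_hat, sigma2_hat * 1.2))
```

If the draws were dependent, `iid_log_likelihood` would no longer be the log of the joint density, so the sum over single-point terms (and with it the simple closed-form estimate) would not be justified.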