Understanding terminology of Mixtures of Gaussians model


I'm following these lecture notes to understand the mixture of Gaussians model and the EM (Expectation Maximization) algorithm used to fit it.

I understand the complete intuition behind this algorithm, which is explained here. But I don't understand the terminology used in the lecture notes to mathematically express this algorithm.

The notes say that we want to model the joint distribution

$$p(x^{(i)}, z^{(i)}) = p(x^{(i)}| z^{(i)}) p(z^{(i)})$$

What is the intuitive meaning of $p(x^{(i)}, z^{(i)})$ here? How did they come up with this expression? I know what $z^{(i)}$ means: it indicates which of the $k$ possible Gaussian distributions $x^{(i)}$ came from. Correct me if I'm wrong.
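One way to read the factorization $p(x^{(i)}, z^{(i)}) = p(x^{(i)}| z^{(i)}) p(z^{(i)})$ is as a two-step sampling recipe: first pick a component $z$, then draw $x$ from that component's Gaussian. The sketch below illustrates that reading only. It is not taken from the lecture notes; it assumes 1-D Gaussians for simplicity, and the function name `sample_joint` and all parameter values are hypothetical.

```python
import random

def sample_joint(phi, mu, sigma, rng):
    """Draw one (x, z) pair from p(x, z) = p(x | z) p(z) for a 1-D mixture."""
    k = len(phi)
    # p(z): choose the component index z with probabilities phi[0..k-1]
    z = rng.choices(range(k), weights=phi)[0]
    # p(x | z): draw x from the Gaussian belonging to component z
    x = rng.gauss(mu[z], sigma[z])
    return x, z

rng = random.Random(0)      # fixed seed so the run is repeatable
phi = [0.3, 0.7]            # hypothetical mixing weights (sum to 1)
mu = [-2.0, 3.0]            # hypothetical component means
sigma = [0.5, 1.0]          # hypothetical component standard deviations
samples = [sample_joint(phi, mu, sigma, rng) for _ in range(1000)]
```

In a real data set only the `x` values are observed; the `z` values are hidden, which is why they are called latent.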

I also don't understand where the following log-likelihood formula comes from.

$$l(φ, μ, Σ) = \sum_{i=1}^m \log p(x^{(i)}; φ, μ, Σ)$$

which is equal to

$$\sum_{i=1}^m \log \sum_{z^{(i)}=1}^k p(x^{(i)}|z^{(i)} ; μ,Σ) p(z^{(i)}; φ)$$
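As a concrete reading of the second form, the inner sum adds up $p(x^{(i)}|z^{(i)}) p(z^{(i)})$ over all $k$ values of $z^{(i)}$ to get $p(x^{(i)})$, and the outer sum adds the logs over the $m$ data points. A minimal sketch of that computation follows, again assuming 1-D Gaussians with standard deviations in place of $Σ$; the helper names `normal_pdf` and `log_likelihood` are hypothetical, not from the notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, phi, mu, sigma):
    """Sum over data points of log p(x), where p(x) = sum_z p(x | z) p(z)."""
    total = 0.0
    for x in xs:
        # inner sum: marginalize the hidden component z out of the joint
        p_x = sum(phi[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(len(phi)))
        # outer sum: add the log of the marginal for this data point
        total += math.log(p_x)
    return total
```

With a single component (`phi = [1.0]`) this reduces to the ordinary Gaussian log-likelihood, which is a quick sanity check on the formula.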

The other thing I don't understand is in the explanation of the EM algorithm: I can't figure out where the formulas for $φ_j$, $μ_j$ and $Σ_j$ are derived from.

[Image: the update formulas for $φ_j$, $μ_j$ and $Σ_j$ from the lecture notes]

How can I derive this maximization step of the EM algorithm?
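For reference, here is a sketch of what one EM iteration computes. It assumes the formulas in question are the standard updates, where each parameter is a weighted average using the posterior probabilities $p(z^{(i)} = j \mid x^{(i)})$ as weights. It also assumes 1-D Gaussians, and the names `em_step` and `w` are hypothetical. It shows only the computation, not the derivation being asked about.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def em_step(xs, phi, mu, sigma):
    """One EM iteration for a 1-D Gaussian mixture (standard updates assumed)."""
    k, m = len(phi), len(xs)
    # E-step: w[i][j] = p(z = j | x_i) by Bayes' rule
    w = []
    for x in xs:
        joint = [phi[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(k)]
        total = sum(joint)
        w.append([p / total for p in joint])
    # M-step: weighted proportion, weighted mean, weighted variance
    new_phi, new_mu, new_sigma = [], [], []
    for j in range(k):
        n_j = sum(w[i][j] for i in range(m))   # effective number of points in component j
        mu_j = sum(w[i][j] * xs[i] for i in range(m)) / n_j
        var_j = sum(w[i][j] * (xs[i] - mu_j) ** 2 for i in range(m)) / n_j
        new_phi.append(n_j / m)
        new_mu.append(mu_j)
        new_sigma.append(math.sqrt(var_j))
    return new_phi, new_mu, new_sigma
```

If every `w[i][j]` were exactly 0 or 1, these updates would reduce to the usual per-class proportion, mean and variance estimates.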