I'm following these lecture notes to understand the mixture of Gaussians model and the EM (Expectation-Maximization) algorithm used to fit it.
I understand the intuition behind this algorithm, which is explained here, but I don't understand the terminology used in the lecture notes to express it mathematically.
The notes say that we want to model the joint distribution
$$p(x^{(i)}, z^{(i)}) = p(x^{(i)}| z^{(i)}) p(z^{(i)})$$
What is the intuitive meaning of $p(x^{(i)}, z^{(i)})$ here, and how did they come up with this expression? I think I know what $z^{(i)}$ means: it indexes which of the $k$ possible Gaussian distributions $x^{(i)}$ came from. Correct me if I'm wrong.
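To check my intuition, here is the generative story as I understand it in code: first draw $z^{(i)}$, then draw $x^{(i)}$ from the Gaussian that $z^{(i)}$ picks out (all parameters below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
phi = np.array([0.3, 0.7])    # p(z = j; phi), made-up mixing weights
mu = np.array([-2.0, 3.0])    # made-up component means (1-D case)
sigma = np.array([1.0, 0.5])  # made-up component standard deviations

# Sample from the joint p(x, z) = p(x | z) p(z):
z = rng.choice(2, p=phi)         # first draw z from Multinomial(phi)
x = rng.normal(mu[z], sigma[z])  # then draw x from the z-th Gaussian
print(z, x)
```

So as I read it, $p(x^{(i)}, z^{(i)})$ is just the probability of this two-stage draw, factored by the chain rule.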
I also don't understand where the following log-likelihood formula comes from:
$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^m \log p(x^{(i)}; \phi, \mu, \Sigma)$$
which is equal to
$$\sum_{i=1}^m \log \sum_{z^{(i)}=1}^k p(x^{(i)} \mid z^{(i)}; \mu, \Sigma)\, p(z^{(i)}; \phi)$$
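To make sure I'm at least reading the formula correctly, here is how I would compute it numerically, with made-up 1-D data and parameters (the inner sum over $z^{(i)}$ is what I understand to be marginalizing $z$ out of the joint):

```python
import numpy as np
from scipy.stats import norm

# Made-up 1-D data and parameters: k = 2 components, m = 3 points
x = np.array([-1.0, 0.5, 2.0])
phi = np.array([0.4, 0.6])    # p(z^{(i)} = j; phi), mixing weights
mu = np.array([-1.0, 2.0])    # component means
sigma = np.array([1.0, 1.5])  # component standard deviations

# l = sum_i log sum_j p(x^{(i)} | z^{(i)} = j) p(z^{(i)} = j)
ll = 0.0
for xi in x:
    # the inner sum marginalizes z^{(i)} out of the joint distribution
    ll += np.log(sum(phi[j] * norm.pdf(xi, mu[j], sigma[j]) for j in range(2)))
print(ll)
```

Is that reading correct, i.e. the marginal $p(x^{(i)})$ is obtained by summing the joint over all $k$ values of $z^{(i)}$?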
The other thing I don't understand is the explanation of the EM algorithm itself: I can't figure out how the update formulas for $\phi_j$, $\mu_j$, and $\Sigma_j$ are derived.
How can I derive the maximization (M) step of the EM algorithm?
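For context, this is my current understanding of the two steps as code, in 1-D with made-up data (`w[i, j]` is what I take to be the E-step posterior $w_j^{(i)} = p(z^{(i)} = j \mid x^{(i)})$ from the notes); I can implement the updates, I just can't derive them:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Made-up 1-D data drawn from two well-separated groups
x = np.concatenate([rng.normal(-2, 1, 50), rng.normal(3, 1, 50)])
m, k = len(x), 2

# Initial guesses for the parameters
phi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: w[i, j] = p(z^{(i)} = j | x^{(i)}; current parameters)
    dens = np.stack([phi[j] * norm.pdf(x, mu[j], sigma[j]) for j in range(k)], axis=1)
    w = dens / dens.sum(axis=1, keepdims=True)

    # M-step: the updates stated (but not derived) in the notes
    phi = w.mean(axis=0)                               # phi_j = (1/m) sum_i w_ij
    mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)  # responsibility-weighted mean
    var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / w.sum(axis=0)
    sigma = np.sqrt(var)                               # weighted variance per component

print(sorted(mu))
```

Running this, the means do converge to the two groups, so the updates clearly work; what I'm missing is the derivation that produces them from the log-likelihood above.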
