Derivation of expectation maximization for GMMs


I am referring to these lecture slides on EM estimation of GMMs. In particular, I am confused about the steps on slides 13 and 14.

If we have an $N$-component GMM (defined by parameters $\theta$), the likelihood of an observation $x_i$ is given as

$p(x_i \vert \theta) = \sum _{j=1}^N P(j \vert \theta) \cdot p(x_i \vert j, \theta)$

The first factor in each summand on the RHS is the mixture weight; the second is the likelihood under the individual $j$-th Gaussian.
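As a concrete sanity check of this formula, here is a minimal numeric sketch of the mixture likelihood for a univariate two-component case (the weights, means, and standard deviations are made-up illustration values, not from the slides):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical 2-component mixture: P(j | theta) and per-component parameters.
weights = [0.3, 0.7]
mus     = [0.0, 4.0]
sigmas  = [1.0, 2.0]

def gmm_likelihood(x):
    """p(x | theta) = sum_j P(j | theta) * p(x | j, theta)."""
    return sum(w * gauss_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

print(gmm_likelihood(1.0))  # ≈ 0.118
```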

Then they introduce a hidden variable $Q$ that describes which Gaussian generated the sample point.

There is also an indicator variable defined as

$z_{i,j} = 1$ : If $x_i$ came from Gaussian $j$

$z_{i,j} = 0$ : otherwise

I do not see how they get from here to the following:

$p(x_i, Q \vert \theta) = P(j \vert \theta)^{z_{i,j}} \cdot p(x_i \vert j, \theta)^{z_{i,j}}$

My reasoning is as follows -

$p(x_i,Q \vert \theta) = p(x_i \vert Q, \theta) \cdot p(Q \vert \theta)$

For the first factor on the RHS, if we know $Q$ then we know which Gaussian the point came from, so

$p(x_i \vert Q, \theta) = p(x_i \vert j, \theta)^{z_{i,j}}$

How should I interpret the second factor, $p(Q \vert \theta)$? And how can I show that it equals the mixture weight of the $j$-th Gaussian raised to the indicator, i.e. $P(j \vert \theta)^{z_{i,j}}$?
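One mechanical point that may help with the indicator exponents: in many EM write-ups the complete-data likelihood is written as a product over all components, $\prod_j \left[ P(j \vert \theta) \, p(x_i \vert j, \theta) \right]^{z_{i,j}}$, and because $z_{i,\cdot}$ is one-hot, every factor with $z_{i,j} = 0$ collapses to 1, leaving exactly the selected component's weight times its density. A minimal numeric check of this collapse (mixture parameters are made up for illustration):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

weights = [0.3, 0.7]            # P(j | theta), made-up values
mus, sigmas = [0.0, 4.0], [1.0, 2.0]

x = 1.0
z = [0, 1]                      # one-hot: x is assumed to come from Gaussian j = 1

# Product-over-components form: prod_j [P(j|theta) * p(x|j,theta)]^{z_j}
joint = math.prod((w * gauss_pdf(x, m, s)) ** zj
                  for w, m, s, zj in zip(weights, mus, sigmas, z))

# Since z is one-hot, the product reduces to the j = 1 term alone.
direct = weights[1] * gauss_pdf(x, mus[1], sigmas[1])
print(abs(joint - direct) < 1e-15)  # True
```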