Solving for mixture coefficients in Gaussian Mixture Model


In Chapter 9 of 'Pattern Recognition and Machine Learning', Bishop explains the Expectation-Maximization algorithm, including its application to Gaussian mixtures.

On page 436 (14 in the link) he states that maximizing

$$\ln p(X\mid \pi, \mu,\Sigma) + \lambda\bigg(\sum^K_{k=1}\pi_k-1\bigg)$$

with respect to $\pi_k$ gives

$$ 0=\sum^N_{n=1}\frac{\mathcal{N}(x_n\mid\mu_k,\Sigma_k)}{\sum_j\pi_j\mathcal{N}(x_n\mid\mu_j,\Sigma_j)} + \lambda $$

I understand how he gets to this point, but in the following he states that

"if we now multiply both sides by $\pi_k$ and sum over $k$ making use of the constraint $\sum_k \pi_k = 1$, we find $\lambda = −N$.

Using this to eliminate $\lambda$ and rearranging we obtain $\pi_k=\frac{N_k}{N}$"

I cannot follow this step; could someone illustrate how to find $\lambda=-N$?

Accepted answer:

I will try to explain in small steps, starting from your second equation. The first step is to multiply by $\pi_k$, which gives:

$$ 0= \pi_k \left( \sum^N_{n=1}\frac{\mathcal{N}(x_n\mid\mu_k,\Sigma_k)}{\sum_j\pi_j\mathcal{N}(x_n\mid\mu_j,\Sigma_j)} + \lambda \right) = \pi_k \sum^N_{n=1}\frac{\mathcal{N}(x_n\mid\mu_k,\Sigma_k)}{\sum_j\pi_j\mathcal{N}(x_n\mid\mu_j,\Sigma_j)} + \pi_k \lambda. $$

Now summing over $k$ gives:

$$ 0 = \sum_{k=1}^K \pi_k \sum^N_{n=1}\frac{\mathcal{N}(x_n\mid\mu_k,\Sigma_k)}{\sum_j\pi_j\mathcal{N}(x_n\mid\mu_j,\Sigma_j)} + \sum_{k=1}^K\pi_k \lambda. $$

You can use the fact that $\sum_k \pi_k=1$ to simplify the second term to $\lambda$. We can also rearrange the first term by noting that the denominator does not depend on $k$:

\begin{align} 0 &= \sum^N_{n=1}\sum_{k=1}^K \pi_k \frac{\mathcal{N}(x_n\mid\mu_k,\Sigma_k)}{\sum_j\pi_j\mathcal{N}(x_n\mid\mu_j,\Sigma_j)} + \lambda \\ &= \sum^N_{n=1} \frac{\sum_k\pi_k\mathcal{N}(x_n\mid\mu_k,\Sigma_k)}{\sum_j\pi_j\mathcal{N}(x_n\mid\mu_j,\Sigma_j)} + \lambda. \end{align}

As you can see, the numerator and denominator of the first term are identical apart from the name of the summation index, so each fraction equals $1$. We are therefore summing $1$ over the $N$ data points: $$ 0=\sum_{n=1}^N 1+\lambda =N+\lambda, $$ which leads to $$ \lambda = -N. $$ To finish the quoted claim: substituting $\lambda=-N$ back into your second equation and multiplying by $\pi_k$ gives $0 = N_k - N\pi_k$, where $N_k = \sum^N_{n=1}\frac{\pi_k\mathcal{N}(x_n\mid\mu_k,\Sigma_k)}{\sum_j\pi_j\mathcal{N}(x_n\mid\mu_j,\Sigma_j)}$ is the effective number of points assigned to component $k$, and hence $\pi_k = \frac{N_k}{N}$.
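The key identity above, that the responsibilities sum to $N$ over all $n$ and $k$, is easy to verify numerically. Here is a small sketch in Python; the 1-D mixture parameters and data are arbitrary choices for illustration, not anything from Bishop's example:

```python
# Numerical check: sum over n and k of pi_k * N(x_n|mu_k, sigma_k) / denom
# equals N, which is why lambda = -N in the derivation above.
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary 1-D Gaussian mixture with K = 3 components.
pi = np.array([0.5, 0.3, 0.2])        # mixing coefficients, sum to 1
mu = np.array([-2.0, 0.0, 3.0])
sigma = np.array([1.0, 0.5, 2.0])

N = 100
x = rng.normal(size=N)                # any data points will do

def gauss(x, mu, sigma):
    """Density of a 1-D Gaussian N(x | mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# dens[n, k] = N(x_n | mu_k, sigma_k)
dens = gauss(x[:, None], mu[None, :], sigma[None, :])
denom = dens @ pi                     # sum_j pi_j N(x_n | mu_j, sigma_j)

# Responsibilities gamma[n, k] sum to 1 over k for every n, so the
# double sum is exactly N.
gamma = pi[None, :] * dens / denom[:, None]
total = gamma.sum()
print(total)  # ≈ 100.0 = N
```

Each row of `gamma` sums to $1$ because its numerator entries sum to exactly the denominator, which is the index-renaming argument from the derivation.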