Derivative of Gaussian mixture model with respect to covariance matrix $\Sigma_k$


In the expectation-maximization algorithm for a Gaussian mixture model, maximizing with respect to $\Sigma_k$ we set:

$\frac{\partial L}{\partial \Sigma_k } =0$

$L = \sum_{i=1}^N \log\Big(\sum_{k=1}^K \pi_k \mathcal{N}(x_i|\mu_k,\Sigma_k)\Big)$

$\sum_{i=1}^N\frac{ \pi_k }{\sum_{j=1}^K \pi_j \mathcal{N}(x_i|\mu_j,\Sigma_j)} \frac{\partial}{\partial\Sigma_k}\left(\frac{1}{\sqrt{(2\pi)^d|\Sigma_k|}}e^{-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)}\right) = 0$

(where $d$ is the dimension of $x_i$, and the inner sum index is written $j$ to avoid clashing with the fixed component $k$).

The answer is

$\Sigma_k = \frac{1}{N_k}\sum_{n=1}^N\frac{\pi_k\mathcal{N}(x_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K}\pi_j\mathcal{N}(x_n|\mu_j,\Sigma_j)}(x_n - \mu_k)(x_n-\mu_k)^T$

$N_k = \sum_{n=1}^{N} \frac{\pi_k\mathcal{N}(x_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K}\pi_j\mathcal{N}(x_n|\mu_j,\Sigma_j)}$
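As a numerical sanity check of the update formula above, here is a minimal sketch of one E-step (responsibilities) and the M-step for $\Sigma_k$. The data, mixing weights, and current parameters are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: N = 200 points in d = 2, K = 2 components.
X = rng.normal(size=(200, 2))                      # data x_n
pi = np.array([0.4, 0.6])                          # mixing weights pi_k
mu = np.array([[0.0, 0.0], [1.0, 1.0]])            # means mu_k
Sigma = np.stack([np.eye(2), np.eye(2)])           # current covariances

def gauss_pdf(X, m, S):
    """N(x | m, S) evaluated row-wise, using the density from the question."""
    d = X.shape[1]
    diff = X - m
    quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(S), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))

# E-step: responsibilities gamma[n, k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)
dens = np.stack([pi[k] * gauss_pdf(X, mu[k], Sigma[k]) for k in range(2)], axis=1)
gamma = dens / dens.sum(axis=1, keepdims=True)

# M-step for component k = 0, exactly the formula above:
# Sigma_k = (1/N_k) sum_n gamma[n,k] (x_n - mu_k)(x_n - mu_k)^T
k = 0
Nk = gamma[:, k].sum()
diff = X - mu[k]
Sigma_k = (gamma[:, k, None, None] * np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) / Nk
```

The resulting `Sigma_k` is a weighted outer-product average, hence symmetric and positive semi-definite by construction.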

I cannot derive the answer above. I assumed $|\Sigma_k|$ is the determinant of $\Sigma_k$. Is that correct? How should we take the derivative with respect to the matrix $\Sigma_k$ when it appears in one place inside a determinant and in another as an inverse matrix?

Best answer:

In the EM algorithm we take the $\log$ of the probabilities so that the maximization step for a Gaussian mixture becomes a quadratic optimization problem with a closed-form solution.
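To make that explicit (a sketch; $\gamma_{nk} = \frac{\pi_k\mathcal{N}(x_n|\mu_k,\Sigma_k)}{\sum_{j}\pi_j\mathcal{N}(x_n|\mu_j,\Sigma_j)}$ denotes the responsibility of component $k$ for point $x_n$): the part of the expected complete-data log-likelihood involving $\Sigma_k$ is

$$\sum_{n=1}^N \gamma_{nk}\log\mathcal{N}(x_n|\mu_k,\Sigma_k) = \mathrm{const} + \frac{1}{2}\sum_{n=1}^N \gamma_{nk}\Big(\log\det\Sigma_k^{-1} - (x_n-\mu_k)^T\Sigma_k^{-1}(x_n-\mu_k)\Big),$$

which is a concave function of $\Sigma_k^{-1}$: linear in $\Sigma_k^{-1}$ plus a $\log\det$ term, so maximizing it is the closed-form problem solved below.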


Let $S = \Sigma_k^{-1}$, $y_n = x_n-\mu_k$, $a_n = \frac{\mathbb{P}(x_n \in C_k)}{\sum_m \mathbb{P}(x_m \in C_k)}$. Then you want to find the $S$ minimizing $$J(S) = -\log \det S+\sum_n a_n y_n^T S y_n, \qquad (a_n > 0,\; y_n \in \mathbb{R}^m,\; S \in \mathbb{R}^{m \times m})$$ By the adjugate-matrix formula $\det(S)\, I = S\, \mathrm{Adj}(S)$, and using that $S$ is symmetric, you'll find $$\frac{\partial }{\partial S_{i,j}} \log \det(S) = \frac{\mathrm{Adj}(S)_{i,j}}{\det S}, \qquad \frac{\partial }{\partial S_{i,j}} J(S) = -\frac{\mathrm{Adj}(S)_{i,j}}{\det S}+\sum_n a_n y_n(i)\, y_n(j), \qquad \frac{\partial }{\partial S} J(S)= -\frac{\mathrm{Adj}(S)}{\det S}+\sum_n a_n y_n y_n^T$$ $$\frac{\partial }{\partial S} J(S)=0 \qquad \implies \qquad \frac{\mathrm{Adj}(S)}{\det(S)} =S^{-1}= \sum_n a_n y_n y_n^T$$
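A quick finite-difference check of this stationarity condition, with made-up weights $a_n$ and centred samples $y_n$: we build $S^\star = \left(\sum_n a_n y_n y_n^T\right)^{-1}$ and verify numerically that the gradient of $J$ vanishes there.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weights a_n > 0 (normalized) and centred samples y_n in R^2.
a = rng.random(50)
a /= a.sum()
Y = rng.normal(size=(50, 2))

def J(S):
    # J(S) = -log det S + sum_n a_n y_n^T S y_n
    return -np.log(np.linalg.det(S)) + np.einsum('n,ni,ij,nj->', a, Y, S, Y)

# Candidate stationary point: S^{-1} = sum_n a_n y_n y_n^T
M = np.einsum('n,ni,nj->ij', a, Y, Y)
S_star = np.linalg.inv(M)

# Central finite differences for dJ/dS_{i,j}; should match
# -Adj(S)/det(S) + sum_n a_n y_n y_n^T = -S^{-1} + M = 0 at S_star.
eps = 1e-6
grad = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        E = np.zeros((2, 2))
        E[i, j] = eps
        grad[i, j] = (J(S_star + E) - J(S_star - E)) / (2 * eps)
```

At `S_star` every entry of `grad` is numerically zero, confirming that $S^{-1} = \sum_n a_n y_n y_n^T$ (i.e. $\Sigma_k = \sum_n a_n y_n y_n^T$) is the stationary point.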

So $\Sigma_k = S^{-1} = \sum_n a_n y_n y_n^T$, and it is worth keeping $S = \Sigma_k^{-1}$ around, since the inverse covariance is exactly what we need to compute the $\log$ probabilities in the next step.