Calculate the derivative for the EM algorithm

Question

250 Views Asked by Bumbble Comm At 25 Mar 2026 - 9:45

I was working on some exercises to prepare for my upcoming machine learning exam. This is a question on the EM algorithm: I want to solve for the optimal value of the parameter $\Sigma$.

Maximize the function $$f(\pi,\mu,\Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \big[\ln \pi_k + \ln \mathcal{N}(\mathbf{x}_n;\mathbf{\mu}_k,\mathbf{\Sigma}_k)\big]$$ where $\gamma(z_{nk}) \in \mathbb{R}$ is a fixed quantity and $\pi_k \in \mathbb{R}$. I am interested in the optimal value of $\Sigma_k$.

The calculation I tried:
Rewriting the function and keeping only the terms that depend on $\Sigma$, we have

$$f = -\frac{1}{2} \sum^{N} \sum^{K} \gamma(z_{nk})(x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + \text{constant}$$

Entrywise, we have $$\frac{\partial\, a^TXa}{\partial X_{ij}} = \frac{\partial \sum_p \sum_q a_{p}X_{pq}a_{q}}{\partial X_{ij}} = a_i a_j \quad\Rightarrow\quad \frac{\partial\, a^TXa}{\partial X} = a \, a^T$$ thus, $$\frac{\partial f}{\partial \Sigma_k^{-1}}=-\frac{1}{2} \sum^N \gamma(z_{nk})(x_n -\mu_k)(x_n - \mu_k)^T$$

This is a computer science course and we haven't covered how to take a derivative with respect to an inverse matrix, so I am not sure what the next step is. That is, how do I take the derivative with respect to $\Sigma_k$?

Also, I suspect I've made an error somewhere, because at this stage my result seems a bit off compared to the final solution $$\mathbf{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk}) (\mathbf{x}_n-\mathbf{\mu}_k)(\mathbf{x}_n-\mathbf{\mu}_k)^\top$$

Updates

To simplify the question, what I want to ask is how to calculate $$\frac{\partial a^T B^{-1}a}{\partial B}$$ where $a$ is a vector and $B$ is a matrix.

Second Updates (solved)

First, I'd like to thank the people who spent time resolving my concerns; it is much appreciated. I found the mistake in my earlier calculation: I had dropped the $\ln|\Sigma_k|$ term from the Gaussian density. It also turns out that we can get the optimal $\Sigma_k$ without differentiating through the matrix inverse. Here's what I did:

Note that $\mathcal{N}$ is a multivariate Gaussian density and $|\Sigma_k| = \frac{1}{|\Sigma_k^{-1}|}$, so the function can be written as: $$f = -\frac{1}{2} \sum^{N} \sum^{K} \gamma(z_{nk}) \Big[(x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) - \ln |\Sigma_k^{-1}| \Big] + \text{constant}$$

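We use the following identities: $$\nabla_X |X| = |X| (X^{-1})^T$$ $$\nabla_X a^TXa = aa^T$$ Differentiating with respect to $\Sigma_k^{-1}$ (only the terms with index $k$ survive) and setting the gradient to zero gives $$\nabla_{\Sigma_k^{-1}}f = -\frac{1}{2} \sum^N \gamma(z_{nk}) \Big[ (x_n - \mu_k) (x_n - \mu_k)^T - \frac{1}{|\Sigma_k^{-1}|} |\Sigma_k^{-1}| \Sigma^T_k \Big] = 0$$ Because $\Sigma_k \mapsto \Sigma_k^{-1}$ is a differentiable bijection on invertible matrices, a stationary point in $\Sigma_k^{-1}$ is also a stationary point in $\Sigma_k$. Solving for $\Sigma_k$ (which is symmetric), we have $$\mathbf{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk}) (\mathbf{x}_n-\mathbf{\mu}_k)(\mathbf{x}_n-\mathbf{\mu}_k)^\top$$ where $N_k = \sum^N \gamma(z_{nk})$.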
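The closed form can be sanity-checked numerically. The sketch below is an illustration only, not part of the original derivation: the data points, responsibilities, mean, and the helper names `objective` and `m_step_sigma` are made up for this example. It uses a single component in two dimensions and only the Python standard library. It computes $\Sigma_k$ from the formula and checks that a few small symmetric perturbations all lower the $\Sigma_k$-dependent part of the objective.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def objective(sigma, xs, mu, gammas):
    """Sigma-dependent part of the objective for one component:
    -1/2 * sum_n gamma_n * [(x_n - mu)^T Sigma^{-1} (x_n - mu) + ln|Sigma|]."""
    si = inv2(sigma)
    logdet = math.log(det2(sigma))
    total = 0.0
    for x, g in zip(xs, gammas):
        d = [x[0] - mu[0], x[1] - mu[1]]
        quad = sum(d[i] * si[i][j] * d[j] for i in range(2) for j in range(2))
        total += -0.5 * g * (quad + logdet)
    return total

def m_step_sigma(xs, mu, gammas):
    """Closed form: Sigma = (1/N_k) * sum_n gamma_n (x_n - mu)(x_n - mu)^T."""
    n_k = sum(gammas)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, g in zip(xs, gammas):
        d = [x[0] - mu[0], x[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += g * d[i] * d[j] / n_k
    return s

# Toy values, chosen arbitrarily for illustration.
xs = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.5]]
gammas = [0.9, 0.6, 0.3, 0.8]
mu = [1.0, 1.5]

sigma_star = m_step_sigma(xs, mu, gammas)
best = objective(sigma_star, xs, mu, gammas)

# Small symmetric perturbations that keep the matrix positive definite here.
perturbations = [
    [[0.1, 0.0], [0.0, 0.0]],
    [[0.0, 0.05], [0.05, 0.0]],
    [[-0.1, 0.0], [0.0, 0.1]],
    [[0.0, 0.0], [0.0, -0.1]],
]
is_max = all(
    best > objective(
        [[sigma_star[i][j] + p[i][j] for j in range(2)] for i in range(2)],
        xs, mu, gammas,
    )
    for p in perturbations
)
print(sigma_star, is_max)
```

A check like this only probes a handful of directions, so it supports the formula rather than proving it.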

Bumbble Comm On 04 Jun 2017 - 12:17

Let function $f : \mbox{GL}_n (\mathbb R) \to \mathbb R$ be defined by

$$f (\mathrm X) := \mathrm a^{\top} \mathrm X^{-1} \mathrm a$$

where $\mathrm a \in \mathbb R^n$ is given. Hence,

$$\begin{array}{rl} f (\mathrm X + h \mathrm V) &= \mathrm a^{\top} (\mathrm X + h \mathrm V)^{-1} \mathrm a\\ &= \mathrm a^{\top} (\mathrm I_n + h \mathrm X^{-1} \mathrm V)^{-1} \mathrm X^{-1} \mathrm a\\ &\approx \mathrm a^{\top} (\mathrm I_n - h \mathrm X^{-1} \mathrm V) \mathrm X^{-1} \mathrm a\\ &= f (\mathrm X) - h \, \mathrm a^{\top} \mathrm X^{-1} \mathrm V \mathrm X^{-1} \mathrm a\\ &= f (\mathrm X) - h \, \mbox{tr} \left( \mathrm X^{-1} \mathrm a \mathrm a^{\top} \mathrm X^{-1} \mathrm V \right)\\ &= f (\mathrm X) + h \left\langle \color{blue}{-\mathrm X^{-\top} \mathrm a \mathrm a^{\top} \mathrm X^{-\top}} , \mathrm V \right\rangle \end{array}$$

Thus, the gradient of $f$ with respect to $\rm X$ is $\color{blue}{-\mathrm X^{-\top} \mathrm a \mathrm a^{\top} \mathrm X^{-\top}}$.
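A central finite-difference check is one way to gain confidence in this gradient. The following is a minimal sketch, assuming a $2 \times 2$ matrix so that the inverse can be written out by hand with the standard library only; the matrix `X`, the vector `a`, and the helper names are arbitrary choices for illustration. `X` is deliberately non-symmetric so that a misplaced transpose would show up as a mismatch.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def f(x, a):
    """f(X) = a^T X^{-1} a for a 2x2 matrix X and a length-2 vector a."""
    xi = inv2(x)
    return sum(a[i] * xi[i][j] * a[j] for i in range(2) for j in range(2))

def analytic_grad(x, a):
    """Claimed gradient: -X^{-T} a a^T X^{-T}."""
    xi = inv2(x)
    # u = X^{-T} a, so u_i = sum_j (X^{-1})_{ji} a_j
    u = [sum(xi[j][i] * a[j] for j in range(2)) for i in range(2)]
    # w^T = a^T X^{-T}, so w_j = sum_i a_i (X^{-1})_{ji}
    w = [sum(a[i] * xi[j][i] for i in range(2)) for j in range(2)]
    return [[-u[i] * w[j] for j in range(2)] for i in range(2)]

def numeric_grad(x, a, h=1e-6):
    """Central differences, perturbing one entry of X at a time."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            xp = [row[:] for row in x]
            xm = [row[:] for row in x]
            xp[i][j] += h
            xm[i][j] -= h
            g[i][j] = (f(xp, a) - f(xm, a)) / (2 * h)
    return g

X = [[2.0, 0.5], [-0.3, 1.5]]  # arbitrary invertible, non-symmetric matrix
a = [1.0, -2.0]                # arbitrary vector

ga = analytic_grad(X, a)
gn = numeric_grad(X, a)
max_err = max(abs(ga[i][j] - gn[i][j]) for i in range(2) for j in range(2))
print(max_err)
```

The printed value is the largest entrywise gap between the two gradients; it should be tiny relative to the gradient entries themselves.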

**Bumbble Comm** · Accepted Answer

To address your simplified question, first note that $$BB^{-1}=I$$ Taking differentials gives $$\begin{aligned} dB\,B^{-1} + B\,dB^{-1} &= 0 \\ B\,dB^{-1} &= -dB\,B^{-1} \\ dB^{-1} &= -B^{-1}\,dB\,B^{-1} \end{aligned}$$ Now take the differential of the function and substitute the above result: $$\begin{aligned} f &= a^TB^{-1}a \\ &= aa^T:B^{-1} \\[1ex] df &= aa^T:dB^{-1} \\ &= -aa^T:B^{-1}\,dB\,B^{-1} \\ &= -B^{-T}aa^TB^{-T}:dB \\[1ex] \frac{\partial f}{\partial B} &= -B^{-T}aa^TB^{-T} \end{aligned}$$ where a colon represents the inner/Frobenius product, i.e. $$X:Y={\rm tr}\big(X^TY\big)$$
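For completeness, here is a sketch of how this result plugs back into the original EM problem. It assumes the shorthand $d_n = x_n - \mu_k$ (introduced here, not in the question), treats the entries of $\Sigma_k$ as independent variables, and uses the standard identity $\partial \ln|\Sigma_k| / \partial \Sigma_k = \Sigma_k^{-T}$. Keeping only the terms that depend on $\Sigma_k$:

$$\begin{aligned} \frac{\partial}{\partial \Sigma_k}\Big[-\frac{1}{2}\sum_{n=1}^N \gamma(z_{nk})\big(\ln|\Sigma_k| + d_n^T\Sigma_k^{-1}d_n\big)\Big] &= -\frac{1}{2}\sum_{n=1}^N \gamma(z_{nk})\big(\Sigma_k^{-T} - \Sigma_k^{-T}d_nd_n^T\Sigma_k^{-T}\big) = 0 \\ \sum_{n=1}^N \gamma(z_{nk})\big(\Sigma_k^{T} - d_nd_n^T\big) &= 0 \\ \Sigma_k &= \frac{1}{N_k}\sum_{n=1}^N \gamma(z_{nk})\,d_nd_n^T \end{aligned}$$

The second line follows by multiplying the first on both the left and the right by $\Sigma_k^T$; the last line uses $N_k = \sum_{n=1}^N \gamma(z_{nk})$ and the fact that the right-hand side is symmetric, so the transpose can be dropped. This agrees with the solution quoted in the question.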
