I was working on some exercises to prepare for my upcoming machine learning exam. This is a question on the EM algorithm. I want to solve for the optimal value of the parameter $\Sigma$.
Maximize the function: $$fun_{\pi,\mu,\Sigma} = \sum^{N} \sum^{K} \gamma(z_{nk}) [\ln \pi_k + \ln \mathcal{N}(\mathbf{x}_n;\mathbf{\mu}_k,\mathbf{\Sigma}_k)]$$ Here $\gamma(z_{nk}) \in \mathbb{R}$ is held fixed and $\pi_k \in \mathbb{R}$. I am interested in the optimal value of $\Sigma_k$.
The calculation I tried:
Since we are interested in optimizing $\Sigma$, I rewrote the function as
$$f = -\frac{1}{2} \sum^{N} \sum^{K} \gamma(z_{nk})(x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + constant$$
We have $$\frac{\partial a^TXa}{\partial X_{ij}} = \frac{\partial \sum_p \sum_q a_p X_{pq} a_q}{\partial X_{ij}} = a_i a_j \Rightarrow \frac{\partial a^TXa}{\partial X} = a \ a^T$$ thus, $$\frac{\partial f}{\partial \Sigma_k^{-1}}=-\frac{1}{2} \sum^N \gamma(z_{nk})(x_n -\mu_k)(x_n - \mu_k)^T$$
This is a computer science course and we haven't covered taking derivatives with respect to an inverse matrix, so I am not sure what the next step is, i.e. how to take the derivative with respect to $\Sigma_k$.
Also, I suspect I've made an error somewhere, as at this stage my result seems a bit off compared to the final solution $$\mathbf{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk}) (\mathbf{x}_n-\mathbf{\mu}_k)(\mathbf{x}_n-\mathbf{\mu}_k)^\top$$
Updates
To simplify the question, what I want to ask is how to calculate $$\frac{\partial a^T B^{-1}a}{\partial B}$$ where $a$ is a vector and $B$ is a matrix.
Second Updates (solved)
First, I'd like to thank everyone who spent time addressing my question; it is much appreciated. I found the mistake in my earlier calculation, and it turns out we can get the optimal $\Sigma_k$ without differentiating through the matrix inverse. Here's what I did:
Note that $\mathcal{N}$ represents a multivariate Gaussian distribution and $|\Sigma_k| = \frac{1}{|\Sigma_k^{-1}|}$, so the function can be written as: $$f = -\frac{1}{2} \sum^{N} \sum^{K} \gamma(z_{nk}) \Big[(x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) - \ln |\Sigma_k^{-1}| \Big] + constant$$
Using the identities $$\nabla_X |X| = |X| (X^{-1})^T$$ $$\nabla_X a^TXa = aa^T$$ and differentiating with respect to $\Sigma_k^{-1}$ (not $\Sigma_k$), we get $$\nabla_{\Sigma_k^{-1}}f = -\frac{1}{2} \sum^N \gamma(z_{nk}) \Big[ (x_n - \mu_k) (x_n - \mu_k)^T - \frac{1}{|\Sigma_k^{-1}|} |\Sigma_k^{-1}| \Sigma^T_k \Big] = 0$$ Solving for $\Sigma_k$ (using $\Sigma_k^T = \Sigma_k$), we have $$\mathbf{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk}) (\mathbf{x}_n-\mathbf{\mu}_k)(\mathbf{x}_n-\mathbf{\mu}_k)^\top$$ where $N_k = \sum^N \gamma(z_{nk})$
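As a sanity check on this closed form, here is a minimal NumPy sketch. It assumes synthetic data: the names `X`, `gamma`, `mu`, `objective_k`, and `sigma_closed_form` are illustrative and not part of the exercise. It evaluates the $\Sigma_k$-dependent part of $f$ at the closed-form $\Sigma_k$ and at randomly perturbed symmetric positive-definite matrices, and checks that no perturbation does better.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 200, 3, 2

# Synthetic inputs (assumptions for illustration only).
X = rng.normal(size=(N, D))                # data points x_n
gamma = rng.random(size=(N, K))            # responsibilities gamma(z_nk)
gamma /= gamma.sum(axis=1, keepdims=True)  # each row sums to 1
mu = rng.normal(size=(K, D))               # fixed means mu_k


def objective_k(Sigma, k):
    """Sigma_k-dependent part of f: -1/2 sum_n gamma_nk [maha_n + ln|Sigma|]."""
    diff = X - mu[k]
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(Sigma), diff)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * np.sum(gamma[:, k] * (maha + logdet))


def sigma_closed_form(k):
    """(1/N_k) sum_n gamma_nk (x_n - mu_k)(x_n - mu_k)^T."""
    diff = X - mu[k]
    N_k = gamma[:, k].sum()
    return (gamma[:, k, None] * diff).T @ diff / N_k


k = 0
Sigma_star = sigma_closed_form(k)
best = objective_k(Sigma_star, k)

# Compare against small symmetric perturbations that stay positive definite.
worse_everywhere = True
for _ in range(100):
    P = rng.normal(size=(D, D))
    candidate = Sigma_star + 0.05 * (P + P.T)
    if np.all(np.linalg.eigvalsh(candidate) > 0):
        worse_everywhere = worse_everywhere and bool(objective_k(candidate, k) < best)

print("closed form beats all perturbations:", worse_everywhere)
```

This only probes a neighbourhood of the solution numerically; it is a check of the formula, not a proof.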
To address your simplified question, first note that $$BB^{-1}=I$$ Taking differentials $$\eqalign{ dB\,B^{-1} + B\,dB^{-1} &= 0 \cr B\,dB^{-1} &= -dB\,B^{-1} \cr dB^{-1} &= -B^{-1}\,dB\,B^{-1} \cr\cr }$$ Now take the differential of the function and substitute the above result $$\eqalign{ f &= a^TB^{-1}a \cr &= aa^T:B^{-1} \cr\cr df &= aa^T:dB^{-1} \cr &= -aa^T:B^{-1}\,dB\,B^{-1} \cr &= -B^{-T}aa^TB^{-T}:dB \cr\cr \frac{\partial f}{\partial B} &= -B^{-T}aa^TB^{-T} \cr }$$ where a colon represents the inner/Frobenius product, i.e. $$X:Y={\rm tr}\big(X^TY\big)$$ The rearrangement in the second-to-last step follows from the cyclic property of the trace, which gives $A:XYZ = X^TAZ^T:Y$.
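The result can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated, well-conditioned, non-symmetric $B$; the helper names `quad_inv`, `analytic_grad`, and `numeric_grad` are made up for this illustration. It compares $-B^{-T}aa^TB^{-T}$ against a central finite-difference approximation of $\partial f/\partial B_{ij}$.

```python
import numpy as np


def quad_inv(B, a):
    """f(B) = a^T B^{-1} a."""
    return float(a @ np.linalg.solve(B, a))


def analytic_grad(B, a):
    """Claimed gradient: -B^{-T} a a^T B^{-T}."""
    B_inv_T = np.linalg.inv(B).T
    return -B_inv_T @ np.outer(a, a) @ B_inv_T


def numeric_grad(B, a, eps=1e-6):
    """Central finite differences, one entry B_ij at a time."""
    grad = np.zeros_like(B)
    for i in range(B.shape[0]):
        for j in range(B.shape[1]):
            E = np.zeros_like(B)
            E[i, j] = eps
            grad[i, j] = (quad_inv(B + E, a) - quad_inv(B - E, a)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n = 4
# Shift by n*I so the random test matrix is comfortably invertible.
B = 0.5 * rng.normal(size=(n, n)) + n * np.eye(n)
a = rng.normal(size=n)

max_err = np.max(np.abs(analytic_grad(B, a) - numeric_grad(B, a)))
print("max abs difference:", max_err)
```

Note that this treats all entries of $B$ as independent. If $B$ is constrained to be symmetric, as a covariance matrix is, the transposes drop out and the expression reduces to $-B^{-1}aa^TB^{-1}$.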