I have the following function $\mathcal{L}(W)$ and I want to find the gradient with respect to $W$, but I'm struggling with the matrix operations and derivations.
$$\mathcal{L}(W) := -\frac{n}{2}\left\{ d\ln(2\pi) + \ln|C| + \mbox{Tr}(C^{-1}S) \right\}$$
where $C := WW^T + \sigma^2I$. You can consider $S$, which is positive definite, as constant.
The gradient should be
$$\nabla_W \mathcal{L}(W) = n \left( C^{-1} S C^{-1} W - C^{-1} W \right)$$
but I cannot understand the steps that lead to it.
If anyone is interested, this is the log-likelihood of probabilistic PCA, and you can find more information here.
Use a colon as a convenient product notation for the trace, i.e. $\;A:B = {\rm Tr}(A^TB)$.
Define the variables $$\eqalign{ C &= WW^T +\sigma^2I &\implies dC = W\,dW^T+dW\,W^T \\ I &= C^{-1}C &\implies dC^{-1} = -C^{-1}\,dC\,C^{-1} \\ {\cal J} &= -\frac{nd}{2}\log(2\pi) &\implies d{\cal J} = 0 \\ }$$ Write the objective function in terms of these new variables.
Then calculate the differential and gradient. $$\eqalign{ {\cal L} &= {\cal J} - \frac{n}{2}\Big(S:C^{-1} + \log(\det(C))\Big) \\ d{\cal L} &= 0-\frac{n}{2}\Big(S:dC^{-1} + C^{-1}:dC\Big) \\ &= \frac{n}{2}\Big(C^{-1}SC^{-1} - C^{-1}\Big):dC \\ &= \frac{n}{2}\Big(C^{-1}SC^{-1} - C^{-1}\Big):(W\,dW^T+dW\,W^T) \\ &= n\Big(C^{-1}SC^{-1} - C^{-1}\Big):dW\,W^T \\ &= n\Big(C^{-1}SC^{-1} - C^{-1}\Big)W:dW \\ \frac{\partial {\cal L}}{\partial W} &= n(C^{-1}SC^{-1} - C^{-1})W \\ }$$
Terms in a colon product can be rearranged in accordance with the properties of the trace, e.g. $${(A:BC) \;=\; (B^TA:C) \;=\; (AC^T:B)}$$ which was used to rearrange some of the lines in the above derivation.
The step that merges $W\,dW^T$ and $dW\,W^T$ into a single term also uses $A:B = A^T:B^T$ together with the symmetry of $C^{-1}SC^{-1} - C^{-1}$ (both $C$ and $S$ are symmetric), so the two terms contribute equally.
NB: The well-known gradient of $\log(\det(X))$, i.e. $d\log(\det(X)) = X^{-T}:dX$, was used without any derivation. Since $C$ is symmetric, it gives the $C^{-1}:dC$ term above.
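A result like this can be sanity-checked numerically by comparing the closed form against central finite differences. The following is only a minimal sketch using NumPy. The function names, the dimensions, the value of $\sigma^2$ and the randomly generated $S$ are all illustrative assumptions and are not part of the original problem.

```python
import numpy as np

def log_likelihood(W, S, sigma2, n):
    """L(W) = -n/2 * (d ln(2 pi) + ln|C| + Tr(C^{-1} S)), with C = W W^T + sigma^2 I."""
    d = W.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)
    _, logdet = np.linalg.slogdet(C)  # C is positive definite, so the sign is +1
    trace_term = np.trace(np.linalg.solve(C, S))  # Tr(C^{-1} S)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + trace_term)

def analytic_gradient(W, S, sigma2, n):
    """Closed form from the derivation: n (C^{-1} S C^{-1} - C^{-1}) W."""
    d = W.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)
    Cinv = np.linalg.inv(C)
    return n * (Cinv @ S @ Cinv - Cinv) @ W

def numerical_gradient(f, W, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

# Illustrative test data (assumed, not from the question)
rng = np.random.default_rng(0)
d, q, n, sigma2 = 5, 2, 100, 0.5
W = rng.standard_normal((d, q))
A = rng.standard_normal((d, d))
S = A @ A.T + d * np.eye(d)  # an arbitrary symmetric positive definite matrix

G_analytic = analytic_gradient(W, S, sigma2, n)
G_numeric = numerical_gradient(lambda M: log_likelihood(M, S, sigma2, n), W)
max_abs_diff = np.max(np.abs(G_analytic - G_numeric))
print("max abs difference:", max_abs_diff)
```

If the two gradients agree up to finite-difference error, that supports both the formula and its overall sign. A sign mistake would show up as a difference of roughly twice the gradient's magnitude.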