I'm reading a paper, probabilistic CCA, in which the authors state derivatives without showing derivations. I would like step-by-step derivations to convince myself. Consider a $d$-dimensional multivariate Gaussian random variable:
$$ \textbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) $$
In probabilistic CCA, we define $\Sigma = W W^{\top} + \Psi$, where $W \in \mathbb{R}^{d \times q}$ and $\Psi \in \mathbb{R}^{d \times d}$. I'd like to compute the derivatives of the negative log-likelihood w.r.t. $\boldsymbol{\mu}$, $W$, and $\Psi$.
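For concreteness, here is a minimal numeric sketch (names and dimensions are my own, assuming numpy, and taking $\Psi$ diagonal for simplicity) of building the low-rank-plus-noise covariance $\Sigma = W W^{\top} + \Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 5, 2
W = rng.normal(size=(d, q))                    # d x q loading matrix
Psi = np.diag(rng.uniform(0.5, 1.0, size=d))   # Psi taken diagonal SPD for this sketch
Sigma = W @ W.T + Psi                          # rank-q part plus Psi

# Sigma is symmetric positive definite, hence a valid covariance matrix
print(np.linalg.eigvalsh(Sigma).min())         # smallest eigenvalue is positive
```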
The stationary point for $\boldsymbol{\mu}$ is just the empirical mean (shown below*) or $\hat{\boldsymbol{\mu}}$. Plugging in the minimum for the parameter $\boldsymbol{\mu}$ into the negative log-likelihood, we get:
$$ \frac{\partial \mathcal{L}}{\partial W} = \frac{\partial}{\partial W} \Big\{ \overbrace{ \frac{1}{2} \sum_{i=1}^{n}(\textbf{x}_i - \hat{\boldsymbol{\mu}})^{\top} \Sigma^{-1} (\textbf{x}_i - \hat{\boldsymbol{\mu}}) }^{A} + \overbrace{\frac{n}{2} \ln |\Sigma|}^{B} + \overbrace{\text{const}}^{C} \Big\} $$
Clearly, $\frac{\partial C}{\partial W} = 0$. But I'm not sure how to differentiate $A$ and $B$, particularly since $\Sigma = W W^{\top} + \Psi$.
*Derivative w.r.t. $\boldsymbol{\mu}$
The negative log-likelihood is:
$$ \mathcal{L} = \frac{1}{2} \sum_{i=1}^{n}(\textbf{x}_i - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) + \frac{n}{2} \ln |\Sigma| + \text{const} $$
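As a sanity check on this expression, here is a small numeric version of the negative log-likelihood (my own sketch, assuming numpy; for a Gaussian, $\text{const} = \frac{nd}{2}\ln(2\pi)$):

```python
import numpy as np

def nll(X, mu, Sigma):
    """Gaussian negative log-likelihood; rows of X are the samples x_i."""
    n, d = X.shape
    Z = X - mu                                          # centered data
    Sinv = np.linalg.inv(Sigma)
    # 1/2 * sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
    quad = 0.5 * np.einsum('ij,jk,ik->', Z, Sinv, Z)
    _, logdet = np.linalg.slogdet(Sigma)
    return quad + 0.5 * n * logdet + 0.5 * n * d * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
print(nll(X, X.mean(axis=0), np.eye(3)))
```

Evaluating it at the empirical mean versus a shifted mean (with $\Sigma$ fixed) shows the mean gives the smaller value, consistent with the stationary-point claim above.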
The derivative of the two rightmost terms with respect to $\boldsymbol{\mu}$ is $0$, meaning we just need to compute:
$$ \frac{\partial}{\partial \boldsymbol{\mu}} \Big\{ \frac{1}{2} \sum_{i=1}^{n}(\textbf{x}_i - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) \Big\} = 0 $$
By the linearity of differentiation, we have:
$$ \frac{1}{2} \sum_{i=1}^{n} \frac{\partial}{\partial \boldsymbol{\mu}} \Big\{ (\textbf{x}_i - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) \Big\} = 0 $$
Using Equation ($86$) from the Matrix Cookbook, which states that $\frac{\partial}{\partial \boldsymbol{\mu}} (\textbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\textbf{x} - \boldsymbol{\mu}) = -2 \Sigma^{-1} (\textbf{x} - \boldsymbol{\mu})$ for symmetric $\Sigma^{-1}$, we get:
$$ \frac{1}{2} \sum_{i=1}^{n} \Big\{ -2 \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) \Big\} = 0 $$
Finally, solving for $\boldsymbol{\mu}$, we get:
$$ \begin{align} 0 &= \frac{1}{2} \sum_{i=1}^{n} \Big\{ -2 \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) \Big\} \\ &= - \sum_{i=1}^{n} \Big\{ \Sigma^{-1} \textbf{x}_i - \Sigma^{-1} \boldsymbol{\mu} \Big\} \\ &= - \sum_{i=1}^{n} \Big\{ \Sigma^{-1} \textbf{x}_i \Big\} + n \Sigma^{-1} \boldsymbol{\mu} \\ - n \Sigma^{-1} \boldsymbol{\mu} &= - \Sigma^{-1} \sum_{i=1}^{n} \textbf{x}_i \\ \boldsymbol{\mu} &= \frac{1}{n} \sum_{i=1}^{n} \textbf{x}_i \end{align} $$
And we're done.
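A quick numeric confirmation of this result (my own sketch, assuming numpy): the gradient $\frac{\partial \mathcal{L}}{\partial \boldsymbol{\mu}} = -\Sigma^{-1} \sum_{i}(\textbf{x}_i - \boldsymbol{\mu})$ vanishes at the empirical mean.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 100
X = rng.normal(size=(n, d))                    # rows are samples x_i
Sigma = np.eye(d) + 0.1 * np.ones((d, d))      # any SPD matrix works here

def grad_mu(X, mu, Sigma):
    # -Sigma^{-1} * sum_i (x_i - mu)
    return -np.linalg.solve(Sigma, (X - mu).sum(axis=0))

mu_hat = X.mean(axis=0)
print(np.linalg.norm(grad_mu(X, mu_hat, Sigma)))   # ~0 up to floating point
```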
All those Greek letters are a pain to type, so let's use these variables $$\eqalign{ S = \Sigma,\,\,\,P = \Psi,\,\,\,L={\mathcal L},\,\,\,Z = (X-\mu 1) \cr }$$ where $X$ is the matrix whose columns are the $\textbf{x}_i$ vectors, and $(\mu 1)$ is the matrix each of whose columns equals $\boldsymbol{\mu}$.
Further, let's use a colon to denote the trace/Frobenius product $$A:B = {\rm tr}(A^TB)$$ Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients. $$\eqalign{ L &= \tfrac{n}{2}\log(\det(S)) + \tfrac{1}{2}ZZ^T:S^{-1} + K \cr dL &= \tfrac{n}{2}{\rm tr\,}(d\log(S)) + \tfrac{1}{2}ZZ^T:dS^{-1} + 0 \cr &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):dS \cr &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):d(WW^T+P) \cr &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):(dW\,W^T+ W\,dW^T+dP) \cr }$$ Setting $dW=0$ yields the gradient wrt $P$ $$\eqalign{ dL &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):dP \cr \frac{\partial L}{\partial P} &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big)\cr }$$ While setting $dP=0$ recovers the gradient wrt $W$ $$\eqalign{ dL &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):(dW\,W^T+ W\,dW^T) \cr &= \Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big)W:dW \cr \frac{\partial L}{\partial W} &= \Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big)W \cr }$$ In several of the steps, we've made use of the fact that $S$ is symmetric.
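These two gradients are easy to verify numerically against central finite differences (my own sketch, assuming numpy; note $\frac{\partial L}{\partial W} = 2\,\frac{\partial L}{\partial P}\,W$):

```python
import numpy as np

rng = np.random.default_rng(2)
d, q, n = 3, 2, 40
X = rng.normal(size=(d, n))                    # columns are samples x_i
Z = X - X.mean(axis=1, keepdims=True)          # Z = X - mu 1, centered columns
W = rng.normal(size=(d, q))
P = np.diag(rng.uniform(0.5, 1.5, size=d))     # SPD stand-in for Psi

def L(W, P):
    S = W @ W.T + P
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * n * logdet + 0.5 * np.trace(Z.T @ np.linalg.inv(S) @ Z)

def grads(W, P):
    Sinv = np.linalg.inv(W @ W.T + P)
    # G = (1/2)(n S^{-1} - S^{-1} Z Z^T S^{-1}) = dL/dP
    G = 0.5 * (n * Sinv - Sinv @ Z @ Z.T @ Sinv)
    return 2.0 * G @ W, G                      # dL/dW, dL/dP

gW, gP = grads(W, P)
eps = 1e-6
E = np.zeros_like(W); E[0, 1] = eps
fdW = (L(W + E, P) - L(W - E, P)) / (2 * eps)
print(fdW, gW[0, 1])                           # should match to several digits
```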