I'm reading a paper, probabilistic CCA, in which the authors state derivatives without showing derivations. I would like step-by-step derivations to convince myself. Consider a $d$-dimensional multivariate Gaussian random variable:
$$ \textbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) $$
In probabilistic CCA, we define $\Sigma = W W^{\top} + \Psi$, where $W \in \mathbb{R}^{d \times q}$ and $\Psi \in \mathbb{R}^{d \times d}$. I'd like to compute the derivatives of the negative log-likelihood w.r.t. $\boldsymbol{\mu}$, $W$, and $\Psi$.
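For concreteness, here is a minimal numeric sketch (names and dimensions are my own, assuming numpy, and taking $\Psi$ diagonal for simplicity) of building the low-rank-plus-noise covariance $\Sigma = W W^{\top} + \Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 5, 2
W = rng.normal(size=(d, q))                    # d x q loading matrix
Psi = np.diag(rng.uniform(0.5, 1.0, size=d))   # Psi taken diagonal SPD for this sketch
Sigma = W @ W.T + Psi                          # rank-q part plus Psi

# Sigma is symmetric positive definite, hence a valid covariance matrix
print(np.linalg.eigvalsh(Sigma).min())         # smallest eigenvalue is positive
```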
The stationary point for $\boldsymbol{\mu}$ is just the empirical mean (shown below*) or $\hat{\boldsymbol{\mu}}$. Plugging in the minimum for the parameter $\boldsymbol{\mu}$ into the negative log-likelihood, we get:
$$ \frac{\partial \mathcal{L}}{\partial W} = \frac{\partial}{\partial W} \Big\{ \overbrace{ \frac{1}{2} \sum_{i=1}^{n}(\textbf{x}_i - \hat{\boldsymbol{\mu}})^{\top} \Sigma^{-1} (\textbf{x}_i - \hat{\boldsymbol{\mu}}) }^{A} + \overbrace{\frac{n}{2} \ln |\Sigma|}^{B} + \overbrace{\text{const}}^{C} \Big\} $$
Clearly, $\frac{\partial C}{\partial W} = 0$. But I'm not sure how to differentiate $A$ and $B$, particularly since $\Sigma = W W^{\top} + \Psi$.
*Derivative w.r.t. $\boldsymbol{\mu}$
The negative log-likelihood is:
$$ \mathcal{L} = \frac{1}{2} \sum_{i=1}^{n}(\textbf{x}_i - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) + \frac{n}{2} \ln |\Sigma| + \text{const} $$
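As a sanity check on this expression, here is a small numeric version of the negative log-likelihood (my own sketch, assuming numpy; for a Gaussian, $\text{const} = \frac{nd}{2}\ln(2\pi)$):

```python
import numpy as np

def nll(X, mu, Sigma):
    """Gaussian negative log-likelihood; rows of X are the samples x_i."""
    n, d = X.shape
    Z = X - mu                                          # centered data
    Sinv = np.linalg.inv(Sigma)
    # 1/2 * sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
    quad = 0.5 * np.einsum('ij,jk,ik->', Z, Sinv, Z)
    _, logdet = np.linalg.slogdet(Sigma)
    return quad + 0.5 * n * logdet + 0.5 * n * d * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
print(nll(X, X.mean(axis=0), np.eye(3)))
```

Evaluating it at the empirical mean versus a shifted mean (with $\Sigma$ fixed) shows the mean gives the smaller value, consistent with the stationary-point claim above.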
The derivative of the two rightmost terms with respect to $\boldsymbol{\mu}$ is $0$, meaning we just need to compute:
$$ \frac{\partial}{\partial \boldsymbol{\mu}} \Big\{ \frac{1}{2} \sum_{i=1}^{n}(\textbf{x}_i - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) \Big\} = 0 $$
By the linearity of differentiation, we have:
$$ \frac{1}{2} \sum_{i=1}^{n} \frac{\partial}{\partial \boldsymbol{\mu}} \Big\{ (\textbf{x}_i - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) \Big\} = 0 $$
Using Equation ($86$) from the Matrix Cookbook, which states that $\frac{\partial}{\partial \boldsymbol{\mu}} (\textbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\textbf{x} - \boldsymbol{\mu}) = -2 \Sigma^{-1} (\textbf{x} - \boldsymbol{\mu})$ for symmetric $\Sigma^{-1}$, we get:
$$ \frac{1}{2} \sum_{i=1}^{n} \Big\{ -2 \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) \Big\} = 0 $$
Finally, solving for $\boldsymbol{\mu}$, we get:
$$ \begin{align} 0 &= \frac{1}{2} \sum_{i=1}^{n} \Big\{ -2 \Sigma^{-1} (\textbf{x}_i - \boldsymbol{\mu}) \Big\} \\ &= - \sum_{i=1}^{n} \Big\{ \Sigma^{-1} \textbf{x}_i - \Sigma^{-1} \boldsymbol{\mu} \Big\} \\ &= - \sum_{i=1}^{n} \Big\{ \Sigma^{-1} \textbf{x}_i \Big\} + n \Sigma^{-1} \boldsymbol{\mu} \\ - n \Sigma^{-1} \boldsymbol{\mu} &= - \Sigma^{-1} \sum_{i=1}^{n} \textbf{x}_i \\ \boldsymbol{\mu} &= \frac{1}{n} \sum_{i=1}^{n} \textbf{x}_i \end{align} $$
And we're done.
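A quick numeric confirmation of this result (my own sketch, assuming numpy): the gradient $\frac{\partial \mathcal{L}}{\partial \boldsymbol{\mu}} = -\Sigma^{-1} \sum_{i}(\textbf{x}_i - \boldsymbol{\mu})$ vanishes at the empirical mean.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 100
X = rng.normal(size=(n, d))                    # rows are samples x_i
Sigma = np.eye(d) + 0.1 * np.ones((d, d))      # any SPD matrix works here

def grad_mu(X, mu, Sigma):
    # -Sigma^{-1} * sum_i (x_i - mu)
    return -np.linalg.solve(Sigma, (X - mu).sum(axis=0))

mu_hat = X.mean(axis=0)
print(np.linalg.norm(grad_mu(X, mu_hat, Sigma)))   # ~0 up to floating point
```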
All those Greek letters are a pain to type, so let's use these variables $$\eqalign{ S = \Sigma,\,\,\,P = \Psi,\,\,\,L={\mathcal L},\,\,\,Z = (X-\mu 1) \cr }$$ where $X$ is the matrix whose columns are the $\textbf{x}_i$ vectors, and $(\mu 1)$ is the matrix each of whose columns equals $\boldsymbol{\mu}$.
Further, let's use a colon to denote the trace/Frobenius product $$A:B = {\rm tr}(A^TB)$$ Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients. $$\eqalign{ L &= \tfrac{n}{2}\log(\det(S)) + \tfrac{1}{2}ZZ^T:S^{-1} + K \cr dL &= \tfrac{n}{2}{\rm tr\,}(d\log(S)) + \tfrac{1}{2}ZZ^T:dS^{-1} + 0 \cr &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):dS \cr &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):d(WW^T+P) \cr &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):(dW\,W^T+ W\,dW^T+dP) \cr }$$ Setting $dW=0$ yields the gradient wrt $P$ $$\eqalign{ dL &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):dP \cr \frac{\partial L}{\partial P} &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big)\cr }$$ While setting $dP=0$ recovers the gradient wrt $W$ $$\eqalign{ dL &= \frac{1}{2}\Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big):(dW\,W^T+ W\,dW^T) \cr &= \Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big)W:dW \cr \frac{\partial L}{\partial W} &= \Big(nS^{-1} - S^{-1}ZZ^TS^{-1}\Big)W \cr }$$ In several of the steps, we've made use of the fact that $S$ is symmetric.
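These two gradients are easy to verify numerically against central finite differences (my own sketch, assuming numpy; note $\frac{\partial L}{\partial W} = 2\,\frac{\partial L}{\partial P}\,W$):

```python
import numpy as np

rng = np.random.default_rng(2)
d, q, n = 3, 2, 40
X = rng.normal(size=(d, n))                    # columns are samples x_i
Z = X - X.mean(axis=1, keepdims=True)          # Z = X - mu 1, centered columns
W = rng.normal(size=(d, q))
P = np.diag(rng.uniform(0.5, 1.5, size=d))     # SPD stand-in for Psi

def L(W, P):
    S = W @ W.T + P
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * n * logdet + 0.5 * np.trace(Z.T @ np.linalg.inv(S) @ Z)

def grads(W, P):
    Sinv = np.linalg.inv(W @ W.T + P)
    # G = (1/2)(n S^{-1} - S^{-1} Z Z^T S^{-1}) = dL/dP
    G = 0.5 * (n * Sinv - Sinv @ Z @ Z.T @ Sinv)
    return 2.0 * G @ W, G                      # dL/dW, dL/dP

gW, gP = grads(W, P)
eps = 1e-6
E = np.zeros_like(W); E[0, 1] = eps
fdW = (L(W + E, P) - L(W - E, P)) / (2 * eps)
print(fdW, gW[0, 1])                           # should match to several digits
```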