Understanding some linear algebra for KL derivation


I'm having some trouble understanding certain steps of this proof, even after trying to consult the Matrix Cookbook.

For two multivariate Gaussians $P_1, P_2$ on $\mathbb{R}^n$:

$KLD(P_1 || P_2) = E_{P_1}[\log P_1 - \log P_2]$

$= \frac{1}{2} E_{P_1}[-\log \det\Sigma_1 - (x - \mu _1)^T\Sigma_{1}^{-1}(x - \mu_1) + \log\det\Sigma_2 + (x - \mu _2)^T\Sigma_{2}^{-1}(x - \mu_2)]$

$= \frac{1}{2}\log \frac{\det\Sigma_2}{\det\Sigma_1} + \frac{1}{2}E_{P_1}[- (x - \mu _1)^T\Sigma_{1}^{-1}(x - \mu_1) + (x - \mu _2)^T\Sigma_{2}^{-1}(x - \mu_2)]$

$= \frac{1}{2}\log \frac{\det\Sigma_2}{\det\Sigma_1} + \frac{1}{2}E_{P_1}[-tr(\Sigma_{1}^{-1}(x - \mu_1)(x - \mu_1)^T) + tr(\Sigma_{2}^{-1}(x - \mu_2)(x - \mu_2)^T)]$

$= \frac{1}{2}\log \frac{\det\Sigma_2}{\det\Sigma_1} + \frac{1}{2}E_{P_1}[-tr(\Sigma_{1}^{-1}\Sigma_{1}) + tr(\Sigma_{2}^{-1}(xx^T - 2x\mu^{T}_{2} + \mu_2\mu_{2}^T))]$

Why does $(x-\mu_1)(x-\mu_1)^T = \Sigma_1$?

$= \frac{1}{2}\log \frac{\det\Sigma_2}{\det\Sigma_1} - \frac{1}{2}n + \frac{1}{2} tr(\Sigma_{2}^{-1}(\Sigma_1 + \mu_1\mu_{1}^T - 2\mu_2\mu^{T}_{1} + \mu_2\mu_{2}^T))$

What rule gets rid of the EV?

$= \frac{1}{2}\left(\log \frac{\det\Sigma_2}{\det\Sigma_1} - n + tr(\Sigma_{2}^{-1}\Sigma_1) + tr(\mu_{1}^T\Sigma_{2}^{-1}\mu_1 - 2\mu_{1}^T\Sigma_{2}^{-1}\mu_2 + \mu_{2}^T\Sigma_{2}^{-1}\mu_2)\right)$

$= \frac{1}{2}\left(\log \frac{\det\Sigma_2}{\det\Sigma_1} - n + tr(\Sigma_{2}^{-1}\Sigma_1) + (\mu_{2}-\mu_1)^T\Sigma_{2}^{-1}(\mu_{2}-\mu_1)\right)$

How do you reduce that last term (what is the rule)?

Thanks

Accepted answer:

"Why does $(x-\mu_1)(x-\mu_1)^T = \Sigma_1$?"

It doesn't; what you do have is $$ E_{P_1}[(X_1-\mu_1)(X_1-\mu_1)^T] = \Sigma_1, $$ which is the definition of the covariance matrix $\Sigma_1$. This gives you the step $$ \begin{align} E_{P_1}\left[\operatorname{Tr}\left(\Sigma_{1}^{-1}(X_1-\mu_1)(X_1-\mu_1)^T \right)\right] &= \operatorname{Tr}\left(\Sigma_{1}^{-1}E_{P_1}\left[(X_1-\mu_1)(X_1-\mu_1)^T \right]\right) = \operatorname{Tr}(\Sigma_{1}^{-1}\Sigma_1). \end{align} $$
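As a quick numerical sanity check (a sketch with made-up 2-D values; `mu1`, `Sigma1`, and the sample size are purely illustrative), the empirical mean of $(x-\mu_1)(x-\mu_1)^T$ over draws from $P_1$ approaches $\Sigma_1$, and the Monte Carlo average of the trace approaches $\operatorname{Tr}(\Sigma_1^{-1}\Sigma_1)=n$:

```python
import numpy as np

# Illustrative 2-D example (values are made up): the empirical average of
# (x - mu1)(x - mu1)^T over draws from P_1 estimates the covariance Sigma1,
# so E[Tr(Sigma1^{-1} (x - mu1)(x - mu1)^T)] estimates Tr(Sigma1^{-1} Sigma1) = n.
rng = np.random.default_rng(0)
n = 2
mu1 = np.array([1.0, -2.0])
Sigma1 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])

X = rng.multivariate_normal(mu1, Sigma1, size=200_000)  # samples from P_1
D = X - mu1                                             # centered samples
emp_cov = D.T @ D / len(X)        # empirical E[(x - mu1)(x - mu1)^T]

inv1 = np.linalg.inv(Sigma1)
# Expectation and trace commute, so this Monte Carlo average is ~ n
mean_trace = np.mean(np.einsum('ni,ij,nj->n', D, inv1, D))
print(emp_cov)       # close to Sigma1
print(mean_trace)    # close to n = 2
```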

"What rule gets rid of the EV?"

The rule is simply taking the expected value and using the fact that the expectation and trace operators can be interchanged. Also recall that $$ \Sigma_{1} = E_{P_1}[X_1 X_1^T] - \mu_1\mu_1^T, $$ so that $$ \begin{align} E_{P_1}\left[\operatorname{Tr}\left(\Sigma_{2}^{-1}(X_1X_1^T-2X_1\mu_2^T+\mu_2\mu_2^T)\right) \right] &= \operatorname{Tr}\left(\Sigma_{2}^{-1}E_{P_1}\left[X_1X_1^T - 2X_1\mu_2^T + \mu_2\mu_2^T\right]\right) \\ &=\operatorname{Tr}\left(\Sigma_{2}^{-1}\left[\Sigma_1 + \mu_1\mu_1^T - 2\mu_1\mu_2^T + \mu_2\mu_2^T\right]\right), \end{align} $$ where I have repeatedly used the linearity of expectation.
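This term-by-term use of linearity can also be checked numerically (again a sketch; all parameter values below are made up for illustration):

```python
import numpy as np

# Illustrative check (values made up): E[X X^T] = Sigma1 + mu1 mu1^T, and the
# Monte Carlo average of Tr(Sigma2^{-1}(X X^T - 2 X mu2^T + mu2 mu2^T))
# matches Tr(Sigma2^{-1}(Sigma1 + mu1 mu1^T - 2 mu1 mu2^T + mu2 mu2^T)).
rng = np.random.default_rng(1)
mu1 = np.array([0.5, 1.5])
mu2 = np.array([-1.0, 2.0])
Sigma1 = np.array([[1.0, 0.3], [0.3, 2.0]])
Sigma2 = np.array([[1.5, -0.2], [-0.2, 1.0]])
inv2 = np.linalg.inv(Sigma2)

X = rng.multivariate_normal(mu1, Sigma1, size=200_000)

second_moment = X.T @ X / len(X)   # ~ Sigma1 + mu1 mu1^T

# Per-sample value of Tr(inv2 (x x^T - 2 x mu2^T + mu2 mu2^T)); since inv2
# is symmetric this equals x^T inv2 x - 2 x^T inv2 mu2 + mu2^T inv2 mu2.
per_sample = (np.einsum('ni,ij,nj->n', X, inv2, X)
              - 2 * X @ inv2 @ mu2
              + mu2 @ inv2 @ mu2)
lhs = per_sample.mean()
rhs = np.trace(inv2 @ (Sigma1 + np.outer(mu1, mu1)
                       - 2 * np.outer(mu1, mu2) + np.outer(mu2, mu2)))
print(lhs, rhs)   # the two agree up to Monte Carlo error
```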

"How do you reduce the last term?"

The rule used at the end is the trace trick: the trace is invariant under cyclic permutations, and the trace of a scalar is the scalar itself. Together with the symmetry of $\Sigma_2^{-1}$, this allows us to write, for instance, $$ \begin{align*} \operatorname{Tr}(\Sigma_{2}^{-1}\mu_1\mu_2^T) &= \operatorname{Tr}(\mu_2^T\Sigma_{2}^{-1}\mu_1) \\ &= \mu_2^T\Sigma_{2}^{-1}\mu_1\\ &= \mu_1^T\Sigma_{2}^{-1}\mu_2 \\ &=\operatorname{Tr}(\mu_1^T\Sigma_{2}^{-1}\mu_2). \end{align*} $$ Combine this with the quadratic expansion $$ (\mu_2-\mu_1)^T\Sigma_{2}^{-1}(\mu_2-\mu_1) = \mu_2^T\Sigma_2^{-1}\mu_2 - 2\mu_2^T\Sigma_{2}^{-1}\mu_1 + \mu_1^T\Sigma_{2}^{-1}\mu_1. $$
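A small numeric illustration of both identities (the vectors and matrix here are random, purely for demonstration; note the step $\mu_2^T\Sigma_2^{-1}\mu_1 = \mu_1^T\Sigma_2^{-1}\mu_2$ needs $\Sigma_2^{-1}$ to be symmetric):

```python
import numpy as np

# Random 3-D illustration of the trace trick and the quadratic expansion.
rng = np.random.default_rng(2)
mu1, mu2 = rng.normal(size=3), rng.normal(size=3)
A = rng.normal(size=(3, 3))
inv2 = A @ A.T + np.eye(3)     # symmetric positive definite, plays "Sigma_2^{-1}"

t1 = np.trace(inv2 @ np.outer(mu1, mu2))  # Tr(Sigma2^{-1} mu1 mu2^T)
t2 = mu2 @ inv2 @ mu1                     # mu2^T Sigma2^{-1} mu1 (a scalar = its own trace)
t3 = mu1 @ inv2 @ mu2                     # mu1^T Sigma2^{-1} mu2 (uses symmetry of inv2)
print(t1, t2, t3)     # all three coincide

d = mu2 - mu1
lhs = d @ inv2 @ d
rhs = mu2 @ inv2 @ mu2 - 2 * (mu2 @ inv2 @ mu1) + mu1 @ inv2 @ mu1
print(lhs, rhs)       # quadratic expansion: both sides equal
```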

That should be all you need to follow the steps involved.
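Putting everything together, the final closed form can be sanity-checked against a direct Monte Carlo estimate of $E_{P_1}[\log p_1 - \log p_2]$. This is a sketch with illustrative parameters; the $2\pi$ normalization terms cancel in the difference but are kept in the log-density for completeness:

```python
import numpy as np

# Sketch: compare the closed-form KL from the last line of the derivation
# with a Monte Carlo estimate of E_{P1}[log p1(x) - log p2(x)].
# All parameter values here are illustrative.
rng = np.random.default_rng(3)
n = 2
mu1, mu2 = np.array([0.0, 1.0]), np.array([1.0, -1.0])
Sigma1 = np.array([[1.0, 0.4], [0.4, 1.5]])
Sigma2 = np.array([[2.0, 0.0], [0.0, 0.5]])
inv2 = np.linalg.inv(Sigma2)

# Closed form: (1/2)(log(det S2/det S1) - n + tr(S2^{-1} S1) + (m2-m1)^T S2^{-1} (m2-m1))
closed = 0.5 * (np.log(np.linalg.det(Sigma2) / np.linalg.det(Sigma1)) - n
                + np.trace(inv2 @ Sigma1)
                + (mu2 - mu1) @ inv2 @ (mu2 - mu1))

def log_density(X, mu, Sigma):
    """Multivariate normal log-density evaluated at each row of X."""
    D = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ni,ij,nj->n', D, np.linalg.inv(Sigma), D)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + quad)

X = rng.multivariate_normal(mu1, Sigma1, size=200_000)
mc = np.mean(log_density(X, mu1, Sigma1) - log_density(X, mu2, Sigma2))
print(closed, mc)   # the two estimates should nearly agree
```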