Showing that if the KL divergence between two multivariate Normal distributions is zero then their covariances and means are equal


We have two $k$-dimensional multivariate normal distributions $\mathcal{N}_0(\mu_0,\Sigma_0)$ and $\mathcal{N}_1(\mu_1,\Sigma_1)$ with means $\mu_0$, $\mu_1$ and covariance matrices $\Sigma_0$, $\Sigma_1$. The KL divergence between the two distributions, $KL(\mathcal{N}_0||\mathcal{N}_1)$, is (a standard result, given e.g. on Wikipedia):

$$KL(\mathcal{N}_0||\mathcal{N}_1)=\frac{1}{2}\left(\operatorname{tr}(\Sigma_1^{-1}\Sigma_0)+(\mu_1 - \mu_0)^T\Sigma_1^{-1}(\mu_1 - \mu_0)-k+\ln\frac{\det\Sigma_1}{\det\Sigma_0}\right)$$
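For a concrete check, the closed-form expression can be sketched in NumPy (the helper name `kl_mvn` is just for illustration, not from any library):

```python
import numpy as np

def kl_mvn(mu0, S0, mu1, S1):
    """KL(N0 || N1) between multivariate normals, via the closed-form expression above."""
    k = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    # slogdet is numerically safer than log(det(...)) for the log-det ratio
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - k + logdet1 - logdet0)

mu = np.array([1.0, -2.0])
S = np.array([[2.0, 0.3],
              [0.3, 1.0]])
print(kl_mvn(mu, S, mu, S))  # 0.0 for identical distributions
```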

It is well known that the KL divergence is nonnegative and that $KL(p||q)=0$ implies $p=q$ (Gibbs' inequality).

Now, obviously $\mathcal{N}_0=\mathcal{N}_1$ means that $\mu_1 = \mu_0$ and $\Sigma_1 = \Sigma_0$, and it is easy to confirm that the KL expression above is indeed zero in this case.

However, can we go the other way around and use only the KL expression above to get that $\mu_1 = \mu_0$ and $\Sigma_1 = \Sigma_0$?

I.e. how do we show that $$KL(\mathcal{N}_0||\mathcal{N}_1)=\frac{1}{2}\left(\operatorname{tr}(\Sigma_1^{-1}\Sigma_0)+(\mu_1 - \mu_0)^T\Sigma_1^{-1}(\mu_1 - \mu_0)-k+\ln\frac{\det\Sigma_1}{\det\Sigma_0}\right) = 0 \\ \implies \mu_1 = \mu_0,\ \Sigma_1 = \Sigma_0$$

and that this is the only solution?

It's a classic result (usually proved via the log-sum inequality, or via Jensen's inequality) that, in general, $KL(p||q) \ge 0$ and, as a corollary, that $KL(p||q) = 0 \iff p=q$ (more precisely, $p=q$ almost everywhere).

In your case, the latter is equivalent to having the same mean and covariance matrix.


Ok, I'll bite. Let's prove that $$\operatorname{tr}(\Sigma_1^{-1}\Sigma_0)+\ln\frac{\det\Sigma_1}{\det\Sigma_0}\ge k \tag{1}$$

with equality only for $\Sigma_1 = \Sigma_0$.

Letting $C=\Sigma_1^{-1}\Sigma_0$, and noting that $C$ is similar to the symmetric positive definite matrix $\Sigma_1^{-1/2}\Sigma_0\Sigma_1^{-1/2}$ (so $C$ is diagonalizable with real, positive eigenvalues, even though $C$ itself need not be symmetric), we can write the LHS as

$$ \operatorname{tr}(C) + \ln(\det C^{-1}) = \operatorname{tr}(C) - \ln (\det C)=\sum_i \lambda_i - \ln \prod_i \lambda_i= \sum_i (\lambda_i - \ln \lambda_i) \tag{2}$$

where $\lambda_i \in (0,+\infty)$ are the eigenvalues of $C$.
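Identity $(2)$ is easy to sanity-check numerically; a sketch using arbitrary random SPD matrices (built as $AA^T + I$, seed chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
# build two random symmetric positive definite matrices
A0 = rng.standard_normal((3, 3)); S0 = A0 @ A0.T + np.eye(3)
A1 = rng.standard_normal((3, 3)); S1 = A1 @ A1.T + np.eye(3)

C = np.linalg.inv(S1) @ S0
lam = np.linalg.eigvals(C)  # real and positive, although C itself is generally not symmetric

lhs = np.trace(C) - np.log(np.linalg.det(C))
rhs = np.sum(lam.real - np.log(lam.real))
print(np.allclose(lhs, rhs))  # True: trace/log-det equals the eigenvalue sum
```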

But $x - \ln x \ge 1$ for all $x>0$, with equality only when $x=1$: the function $f(x)=x-\ln x$ has $f'(x)=1-1/x$, so it is strictly decreasing on $(0,1)$ and strictly increasing on $(1,\infty)$, with minimum $f(1)=1$.

Then $$ \operatorname{tr}(C) + \ln(\det C^{-1}) \ge k \tag{3}$$

with equality only if all eigenvalues equal $1$; since $C$ is diagonalizable, this forces $C=I$, i.e. $\Sigma_1 = \Sigma_0$.

Hence $(1)$ is proved.

To complete the proof, write $w = \mu_1 - \mu_0$. Since $\Sigma_1^{-1}$ is positive definite, $w^T \Sigma_1^{-1} w \ge 0$, with equality only when $w = 0$. The KL expression is therefore a sum of two nonnegative terms, $\frac{1}{2}\left(\operatorname{tr}(\Sigma_1^{-1}\Sigma_0)+\ln\frac{\det\Sigma_1}{\det\Sigma_0}-k\right)$ and $\frac{1}{2}w^T\Sigma_1^{-1}w$, so it vanishes only when both vanish. Hence the desired property follows: $KL(\mathcal{N}_0||\mathcal{N}_1) =0 \iff \Sigma_1 = \Sigma_0,\ \mu_1 = \mu_0$
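The two nonnegative pieces can also be checked separately on arbitrary test data (a sketch; matrices and the vector standing in for $\mu_1-\mu_0$ are random, seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
A0 = rng.standard_normal((k, k)); S0 = A0 @ A0.T + np.eye(k)
A1 = rng.standard_normal((k, k)); S1 = A1 @ A1.T + np.eye(k)
w = rng.standard_normal(k)  # stands in for mu_1 - mu_0

S1_inv = np.linalg.inv(S1)
# the trace/log-det piece from (1), shifted so that it is >= 0, equality iff S0 == S1
trace_piece = np.trace(S1_inv @ S0) + np.log(np.linalg.det(S1) / np.linalg.det(S0)) - k
# the quadratic piece, >= 0 with equality iff w == 0
quad_piece = w @ S1_inv @ w
print(trace_piece > 0, quad_piece > 0)  # both strictly positive for these unequal inputs

# equality case: with S1 = S0 the trace/log-det piece vanishes exactly
eq_piece = np.trace(np.linalg.inv(S0) @ S0) - k
print(np.isclose(eq_piece, 0.0))
```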

Notice that all of this requires that the covariance matrices are strictly positive definite, i.e. that the densities are not degenerate.