Derivation of the von Neumann divergence $d_{vN}$


In slide 16 of the talk "Kernel Learning with Bregman Matrix Divergences" [1], it is shown that for symmetric positive definite matrices $X$ and $Y$ with eigendecompositions $X = V \Lambda V^T$ and $Y = U \Theta U^T$, the von Neumann divergence can be written as:

$$D_{vN}(X, Y) = \operatorname{tr}(X \log X - X \log Y - X + Y)$$
$$= \sum_i \lambda_i \log \lambda_i - \sum_{i,j} (v_i^T u_j)^2 \lambda_i \log \theta_j - \sum_i (\lambda_i - \theta_i)$$

What I don't understand is how the author reaches the second line from the first one.

Remark: The author assumes the eigenvalues of both matrices are sorted; see [2], Theorem 6, page 11.
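For what it's worth, the equality of the two expressions can be sanity-checked numerically. Below is a small NumPy sketch (the matrix sizes and random test data are mine, not from the slides) comparing the trace form against the eigenvalue expansion with the squared overlaps $(v_i^T u_j)^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    # Random symmetric positive definite test matrix (illustrative only).
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

n = 4
X, Y = random_spd(n), random_spd(n)

lam, V = np.linalg.eigh(X)    # X = V diag(lam) V^T
theta, U = np.linalg.eigh(Y)  # Y = U diag(theta) U^T

# Matrix form: tr(X log X - X log Y - X + Y), with the matrix logs
# built from the eigendecompositions.
logX = V @ np.diag(np.log(lam)) @ V.T
logY = U @ np.diag(np.log(theta)) @ U.T
d_matrix = np.trace(X @ logX - X @ logY - X + Y)

# Eigenvalue form: sum_i lam_i log lam_i
#                  - sum_{i,j} (v_i^T u_j)^2 lam_i log theta_j
#                  - sum_i (lam_i - theta_i).
overlap = (V.T @ U) ** 2  # overlap[i, j] = (v_i^T u_j)^2
d_eigen = (np.sum(lam * np.log(lam))
           - np.sum(overlap * np.outer(lam, np.log(theta)))
           - np.sum(lam - theta))

print(np.isclose(d_matrix, d_eigen))  # True
```

The key step the check exercises is $\operatorname{tr}(X \log Y) = \sum_{i,j} \lambda_i \log \theta_j \, (v_i^T u_j)(u_j^T v_i)$, which is where the squared inner products come from.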

[1] https://www.cs.utexas.edu/users/inderjit/Talks/stanford_kernel.pdf

[2] https://www.jmlr.org/papers/volume10/kulis09a/kulis09a.pdf