How to show the derivative of $f(M)=\text{Tr}(M\log (M) -M)$ is $\log (M)$?

219 Views Asked by At

Let $M$ be a positive definite matrix in $\mathbb{S}_+^n$. Let $\log$ be natural matrix logarithm which $\log(M)$ is defined as $\log(M)=\sum_{i=1}^{n}\log(\lambda_i)v_iv_i^T$ where $(\lambda_i,v_i)$ are eigenpairs of $M$.

This function is called negative Von Neumann entropy or negative Quantum entropy.

How can we show the derivative of $f(M)=\text{Tr}(M\log M -M)$ is $\log M?$

4

There are 4 best solutions below

8
On BEST ANSWER

Since $M\in\mathbb{S}_+^n$, there must exist a orthogonal matrix $U$ and a diagonal matrix $\Lambda=\text{diag}\left\{\lambda_1,\lambda_2,...,\lambda_n\right\}$, with each $\lambda_j>0$, such that $$ M=U\Lambda U^{\top}. $$ Hence, using the definition of $\log M$, \begin{align} M\log M-M&=\left(U\Lambda U^{\top}\right)\left(U\log\Lambda\,U^{\top}\right)-U\Lambda U^{\top}\\ &=U\left(\Lambda\log\Lambda-\Lambda\right)U^{\top}. \end{align} Consequently, \begin{align} f(M)&=\text{tr}\left(M\log M-M\right)\\ &=\text{tr}\left(U\left(\Lambda\log\Lambda-\Lambda\right)U^{\top}\right)\\ &=\text{tr}\left(\left(\Lambda\log\Lambda-\Lambda\right)U^{\top}U\right)\\ &=\text{tr}\left(\Lambda\log\Lambda-\Lambda\right)\\ &=\sum_{j=1}^n\lambda_j\left(\log\lambda_j-1\right). \end{align} Therefore, it follows that $$ {\rm d}f(M)=\sum_{j=1}^n\log\lambda_j\,{\rm d}\lambda_j=\text{tr}\left(\log\Lambda\,{\rm d}\Lambda\right). $$

On the other hand, \begin{align} \log M\,{\rm d}M&=\left(U\log\Lambda\,U^{\top}\right){\rm d}\left(U\Lambda U^{\top}\right)\\ &=\left(U\log\Lambda\,U^{\top}\right)\left({\rm d}U\Lambda U^{\top}+U{\rm d}\Lambda\,U^{\top}+U\Lambda\,{\rm d}U^{\top}\right)\\ &=U\log\Lambda\,U^{\top}\,{\rm d}U\Lambda U^{\top}+U\log\Lambda\,{\rm d}\Lambda U^{\top}+U\log\Lambda\,\Lambda\,{\rm d}U^{\top}. \end{align} Take the trace of both sides, and the first and the last term on the right-hand side cancel out (to be explained soon). Thus we obtain \begin{align} \text{tr}\left(\log M\,{\rm d}M\right)&=\text{tr}\left(U\log\Lambda\,{\rm d}\Lambda U^{\top}\right)\\ &=\text{tr}\left(\log\Lambda\,{\rm d}\Lambda U^{\top}U\right)\\ &=\text{tr}\left(\log\Lambda\,{\rm d}\Lambda\right)\\ &={\rm d}f(M). \end{align} This result implies that, in the form, $$ \frac{\partial}{\partial M}f(M)=\log M. $$


In this appendix, let us explain the cancel-out of the two traces. In fact, \begin{align} \text{tr}\left(U\log\Lambda\,U^{\top}\,{\rm d}U\Lambda U^{\top}\right)&=\text{tr}\left({\rm d}U\Lambda U^{\top}\,U\log\Lambda\,U^{\top}\right)\\ &=\text{tr}\left({\rm d}U\Lambda\log\Lambda\,U^{\top}\right)\\ &=\text{tr}\left({\rm d}U\log\Lambda\,\Lambda\,U^{\top}\right)\\ &=\text{tr}\left(\log\Lambda\,\Lambda\,U^{\top}{\rm d}U\right)\\ &=-\text{tr}\left(\log\Lambda\,\Lambda\,{\rm d}U^{\top}\,U\right)\\ &=-\text{tr}\left(U\log\Lambda\,\Lambda\,{\rm d}U^{\top}\right), \end{align} where we have repeatedly used the identity $$ \text{tr}\left(AB\right)=\text{tr}\left(BA\right) $$ for square matrices $A$ and $B$, the orthogonality $$ U^{\top}U=UU^{\top}=I_n, $$ and its differentiation $$ \mathbf{0}={\rm d}\left(U^{\top}U\right)={\rm d}U^{\top}\,U+U^{\top}\,{\rm d}U. $$

0
On

Well. Observe \begin{align} \operatorname{Tr}(M\log M-M) = \operatorname{Tr}(M\log M)-\operatorname{Tr}(M). \end{align} Using the Gateaux variation, observe that \begin{align} \operatorname{Tr}\left(N\frac{d\operatorname{Tr}M}{dM} \right)=\frac{d}{dt}\operatorname{Tr}(M+tN)\big|_{t=0} = \frac{d}{dt} \sum_{i=1}^n(M_{ii}+tN_{ii})\bigg|_{t=0} = \operatorname{Tr}(N) \end{align} for any $N$. Hence $\frac{d\operatorname{Tr}M}{dM}=I$.

Likewise, we see that \begin{align} \operatorname{Tr}\left(N\frac{d\operatorname{Tr}[M\log M]}{dM} \right) = \frac{d}{dt}\operatorname{Tr}[(M+tN)\log(M+tN)]\bigg|_{t=0}. \end{align} Observe \begin{align} \operatorname{Tr}[(M+tN)\log(M+tN)]=&\ \operatorname{Tr}[M\log M]+\operatorname{Tr}[M\log(I+tM^{-1}N)]\\ &\ + t\operatorname{Tr}[N\log M]+t\operatorname{Tr}[N\log(I+tM^{-1}N)]\\ =&\ \operatorname{Tr}[M\log M]+t\operatorname{Tr}[N(I+\log M)]+\mathcal{O}(t^2) \end{align} then it follows \begin{align} \frac{d}{dt}\operatorname{Tr}[(M+tN)\log(M+tN)]\bigg|_{t=0} = \operatorname{Tr}[N(I+\log M)] \end{align} for any $N$. Hence it follows $\frac{d\operatorname{Tr}[M\log M]}{dM}=I+\log M$. Thus we have the desired result.

Note, I used the fact that $M$ is positive definite when I claim that $M$ is invertible and $\log M$ is well-defined.

2
On

Note that for small $H$ we have $\log (I+H) = \sum_{k=1}^\infty (-1)^{k+1} {1 \over k} H^k$.

Also note that for $A,B$ positive definite we have $\operatorname{tr} (\log (AB)) = \operatorname{tr} (\log A) + \operatorname{tr} ( \log B)$.

Going straight for the Fréchet derivative, we have: \begin{eqnarray} f(M+H) -f(M)&=& \operatorname{tr} [ (M+H) \log(M+H)-M \log M -H] \\ &=& \operatorname{tr} [ (M+H) \log(M(I+M^{-1}H))-M \log M -H] \\ &=& \operatorname{tr} [ (M+H) (\log M + \log(I+M^{-1}H))-M \log M -H] \\ &=& \operatorname{tr} [ H \log M+(M+H)\log(I+M^{-1}H)) -H] \\ &=& \operatorname{tr} [ H \log M+(M+H)\sum_{k=1}^\infty (-1)^{k+1} {1 \over k} (M^{-1}H)^k-H] \\ &=& \operatorname{tr} [ H \log M+ g(H)] \\ \end{eqnarray} where $\|g(H)\| \le K \|H\|^2$. Hence $Df(M)H = \operatorname{tr} (\log M H) = \langle \log M , H \rangle$.

Addendum: @greg suggested a simpler approach in the comments in the question, here is a proof for an analytic $g$ defined on a simply connected open domain $D$ and $f(M) = \operatorname{tr} g(M)$.

Let $\gamma$ be a curve contained in $D$ and which encircles the spectrum of $M$ once (that is, $I(\gamma,\lambda) = 1$ for all eigenvalues $\lambda$ of $M$). Let $\phi_M(z) = (zI-M)^{-1}$. Let $g(A) = {1 \over 2 \pi i} \int_\gamma g(z) \phi_M(z)dz$.

A straightforward computation shows that $Dg(A)H = {1 \over 2 \pi i} \int_\gamma g(z) \phi_M(z)H\phi_M(z)dz$, and so $D f(A)H = \operatorname{tr} (Dg(A)H) = \operatorname{tr}[{1 \over 2 \pi i} \int_\gamma g(z) \phi_M^2(z)dzH]$. Integration by parts shows that ${1 \over 2 \pi i} \int_\gamma g(z) \phi_M^2(z)dz = {1 \over 2 \pi i} \int_\gamma g'(z) \phi_M(z)dz = g'(M)$, and so $Df(M)H = \operatorname{tr}(g'(M)H)$.

In particular, the gradient is $(g'(M))^T = g'(M^T)$.

0
On

We will utilize the fact that ${\rm tr}(A - B) = {\rm tr}(A) - {\rm tr}(B)$ and denote ${\rm tr}(AB) = \langle A^T, B \rangle := A^T : B$. Also, utilize the cyclic property of trace and denote the identity matrix by $I$.

Now compute differential and then gradient. \begin{align} df &= d\:{\rm tr}\left(M \log(M) - M\right)\\ &= d\:\left[{\rm tr}\left(M \log(M)\right) - {\rm tr}(M)\right]\\ &= d\:\left[M^T:\log(M) - I:M\right]\\ &= d\:\left[M^T:\log(M)\right] - I:dM\\ &= \left[dM^T:\log(M) + M^T:M^{-1}dM\right] - I:dM\\ &= \left[\log(M):dM^T + M^{-T}M^T:dM\right] - I:dM\\ &= \left[\log(M)^T:dM + I:dM\right] - I:dM\\ &= \left[\log(M)^T:dM \right] \ . \end{align}

The gradient is \begin{align} \frac{\partial f}{\partial M} = \log(M)^T = \log(M) \hspace{5mm} \{\textrm{ since }M^T = M\}. \end{align}