Derivative of trace of a matrix function [$\operatorname{Tr}(X\log(Y))$] w.r.t. a scalar


$\DeclareMathOperator{\Tr}{Tr}$ I'm trying to find a closed form for $\frac{\partial}{\partial \theta}\Tr(X\log(Y))$, where $X(\theta)$ and $Y(\theta)$ are Hermitian positive-definite matrices with trace 1 (i.e. full-rank density matrices), parametrized by a scalar $\theta$, that in general do not commute.

If it were $\frac{\partial}{\partial \theta}\Tr(X\log(X))$, I could write down the Taylor expansion of the log and, using the cyclic property of the trace, rearrange all the terms coming from differentiating $X^n$ as if I were doing single-variable calculus; this gives $\Tr(X'\log(X) + X')$, where $X'\equiv\frac{\partial X}{\partial \theta}$. Since $\Tr(X)=1$ forces $\Tr(X')=0$, this reduces to $\Tr(X'\log(X))$.
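As a sanity check, this identity can be verified numerically. Below is a sketch in Python (assuming SciPy is available); the Hermitian family $X(\theta)=e^{H(\theta)}/\Tr e^{H(\theta)}$ and the numbers in it are illustrative, not part of the problem:

```python
import numpy as np
from scipy.linalg import expm, logm

# Illustrative trace-1 Hermitian PD family: X(theta) = exp(H(theta)) / Tr exp(H(theta)),
# with H(theta) Hermitian (the specific matrices below are arbitrary choices).
A = np.array([[0.3, 0.1 + 0.2j], [0.1 - 0.2j, -0.4]])   # Hermitian
B = np.array([[1.0, 0.5], [0.5, 0.2]])                   # real symmetric

def X(theta):
    P = expm(A * theta + B)
    return P / np.trace(P).real

theta0, h = 0.7, 1e-6

# Central finite differences for X' and for d/dtheta Tr(X log X)
dX = (X(theta0 + h) - X(theta0 - h)) / (2 * h)
f = lambda t: np.trace(X(t) @ logm(X(t))).real
fd = (f(theta0 + h) - f(theta0 - h)) / (2 * h)

# Closed form: Tr(X' log X); the Tr(X') term drops out since Tr X(theta) = 1
closed = np.trace(dX @ logm(X(theta0))).real
print(abs(fd - closed))  # difference at finite-difference accuracy, near zero
```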

However, if I were to apply the same approach to $\frac{\partial}{\partial \theta}\Tr(X\log(Y))$, the Taylor expansion gives me a sum of terms like $\Tr(X(Y-1)^n)$ (here $1$ is the identity matrix of the same dimension as $X$ & $Y$):

$$ \Tr(X\log(Y)) = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\Tr(X(Y-1)^n) $$
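Since the eigenvalues of a full-rank density matrix lie in $(0,1]$, the spectral radius of $Y-1$ is below 1 and this series converges; a quick numerical check (sketch in Python with SciPy, using illustrative $2\times 2$ density matrices):

```python
import numpy as np
from scipy.linalg import logm

# Illustrative 2x2 density matrices (positive definite, trace 1); the
# eigenvalues of Y lie in (0, 1), so the series in powers of (Y - I) converges.
X = np.array([[0.7, 0.2], [0.2, 0.3]])
Y = np.array([[0.6, 0.1], [0.1, 0.4]])

exact = np.trace(X @ logm(Y)).real

# Partial sums of  sum_n (-1)^{n+1}/n * Tr(X (Y - I)^n)
I = np.eye(2)
partial, power = 0.0, I
for n in range(1, 200):
    power = power @ (Y - I)
    partial += (-1) ** (n + 1) / n * np.trace(X @ power).real

print(abs(partial - exact))  # should agree to high precision
```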

When differentiated, the trace in the $n$-th term produces $$ \Tr(X'(Y-1)^n)+\Tr(XY'(Y-1)^{n-1})+\dots+\Tr(X(Y-1)^k\,Y'\,(Y-1)^{n-k-1})+\dots+\Tr(X(Y-1)^{n-1}Y') $$

The first term can be separated out to give $\Tr(X'\log(Y))$. However, the remaining terms cannot be rearranged in a nice way, since $X$ breaks the cyclic symmetry we had in the previous case. This is where I'm stuck: is there a different approach with which I can manipulate this expression to obtain a closed-form derivative? Thanks in advance.

Context: I'm trying to differentiate von Neumann relative entropy in quantum mechanics/information.

There is 1 answer below.
$ \def\l{\left(} \def\r{\right)} \def\p{\partial} \def\m#1{ \left[\begin{array}{c}#1\end{array}\right] } $I think you've summarized the issue perfectly. While I don't know of a nice closed-form solution, the following approach might be useful.

First, let's use a dot to denote derivatives with respect to $\theta$.
Given a matrix $Y\in{\mathbb R}^{n\times n}$ and an analytic function $f$, let $F=f(Y)$.
Then, using block-triangular matrices, one can write $$\eqalign{ f\l\m{Y&{\dot Y}\\0&Y}\r &= \m{F&{\dot F}\\0&F} \\ }$$ where the $(1,2)$-block ${\dot F}$ is the Fréchet derivative of $f$ at $Y$ in the direction ${\dot Y}$. Further, by defining block-matrix analogs of the ${\mathbb R}^{2}$ basis vectors $$ E_1 = \l e_1\otimes I_n\r = \m{I_n\\0_n} \qquad E_2 = \l e_2\otimes I_n\r = \m{0_n\\I_n} $$ one can extract the $(1,2)$-block $${\dot F} = E_1^T\m{F&{\dot F}\\0&F}\,E_2$$ Therefore
$$\eqalign{ \Omega &= {\rm Tr}(XF) \\ {\dot\Omega} &= {\rm Tr}\l{\dot X}F \,+\, X{\dot F}\r \\ &= {\rm Tr}\l{\dot X}\,\log\l Y\r+XE_1^T\,\log\l\m{Y&{\dot Y}\\0&Y}\r\,E_2\r \\ &= {\rm Tr}\bigg(\big(XE_1^T+\dot XE_2^T\big)\,\log\!\big(E_1 YE_1^T + E_2 YE_2^T + E_1\dot YE_2^T\big)\,E_2\bigg) \\ }$$ Note that this approach merely defers the issue of non-commuting matrices to the algorithm which calculates the matrix function. However, current algorithms (e.g. in Matlab or Julia) are fairly robust and reliable.

For computational efficiency, one need only evaluate the logarithm of the block-triangular matrix once; both $F$ and $\dot F$ can then be extracted via block operations.
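The recipe above can be sketched numerically (Python with SciPy's `logm`; the density-matrix families `X(t)`, `Y(t)` below are illustrative assumptions, and $\dot X$, $\dot Y$ are taken from central finite differences):

```python
import numpy as np
from scipy.linalg import expm, logm

# Illustrative Hermitian PD, trace-1 families (arbitrary generators)
A = np.array([[0.3, 0.1 + 0.2j], [0.1 - 0.2j, -0.4]])
B = np.array([[-0.2, 0.3j], [-0.3j, 0.5]])

def density(H):                      # H Hermitian -> trace-1 PD matrix
    P = expm(H)
    return P / np.trace(P).real

X = lambda t: density(A * t)
Y = lambda t: density(B * t + np.diag([0.4, -0.1]))

t0, h, n = 0.9, 1e-6, 2
Xd = (X(t0 + h) - X(t0 - h)) / (2 * h)   # X-dot
Yd = (Y(t0 + h) - Y(t0 - h)) / (2 * h)   # Y-dot

# One logarithm of the block-triangular matrix [[Y, Ydot], [0, Y]] ...
M = np.block([[Y(t0), Yd], [np.zeros((n, n)), Y(t0)]])
L = logm(M)
F, Fdot = L[:n, :n], L[:n, n:]           # ... yields F = log Y and its Fréchet derivative

# d/dt Tr(X log Y) = Tr(Xdot log Y) + Tr(X Fdot)
closed = np.trace(Xd @ F + X(t0) @ Fdot).real

# Compare against a direct finite difference of Tr(X log Y)
g = lambda t: np.trace(X(t) @ logm(Y(t))).real
fd = (g(t0 + h) - g(t0 - h)) / (2 * h)
print(abs(fd - closed))  # agreement at finite-difference accuracy
```

Note that only a single `logm` call on the $2n\times 2n$ block matrix is needed; both $\log Y$ and its directional derivative fall out as blocks.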