Gradient of trace of squared matrix logarithm


I have a simple question that has confused me for a while. Consider

$$f(X) = \text{tr} \left( [ \log(X) ]^2 \right)$$

where $X$ is an $m \times m$ symmetric positive definite (SPD) matrix and $\log(X)$ is the matrix logarithm of matrix $X$. What is $\frac{\partial f}{\partial X}$?

Using the chain rule, I have

$df = \text{tr}(2ZdZ)$,

where $Z=\log (X)$. I think we should have $dZ = X^{-1}dX$, as for a scalar function, so we will have

$\frac{\partial f}{\partial X} = 2\log(X)X^{-1}$,

but I haven't found any related reference.

Any comment or hint will be appreciated!


There are 2 best solutions below

3
On

Careful: most of the standard calculus formulas for differentiation require things to commute. Matrices don't. So $d(Z^2)$ is not $2 Z \; dZ$, it's $Z \; dZ + (dZ)\; Z$. And I don't think there is a closed-form formula for $d(\log X)$.
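As a side note on how this interacts with the $df$ in the question: inside a trace the ordering issue disappears, because the trace is invariant under cyclic permutations. A short worked step:

$$d\,\text{tr}(Z^2) = \text{tr}\left(Z\,dZ + (dZ)\,Z\right) = \text{tr}(Z\,dZ) + \text{tr}((dZ)\,Z) = 2\,\text{tr}(Z\,dZ).$$

So $df = \text{tr}(2Z\,dZ)$ still holds, even though $d(Z^2) \neq 2Z\,dZ$ as a matrix identity. The remaining difficulty is the differential of the logarithm itself.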

However, if $X$ is symmetric, we can assume WLOG that it is diagonal (since $f(QXQ^T) = f(X)$ for any orthogonal $Q$). Then you can easily compute $f(X + dX)$ for $dX$ with a single matrix element.
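The diagonal computation can be checked numerically. The following is a minimal sketch using NumPy, not part of the original answer. The names `f` and `single_element_derivative`, the test values in `x`, and the step `eps` are all illustrative assumptions. It also assumes that an off-diagonal perturbation is applied symmetrically to entries $(i,j)$ and $(j,i)$, so that the perturbed matrix stays symmetric and an eigenvalue routine for symmetric matrices applies.

```python
import numpy as np

def f(X):
    """tr(log(X)^2) for a symmetric positive definite X, via its eigenvalues."""
    w = np.linalg.eigvalsh(X)
    return float(np.sum(np.log(w) ** 2))

def single_element_derivative(x_diag, i, j, eps=1e-6):
    """Central difference of f at X = diag(x_diag) along the symmetric
    perturbation touching only entries (i, j) and (j, i)."""
    X = np.diag(np.asarray(x_diag, dtype=float))
    E = np.zeros_like(X)
    E[i, j] = 1.0
    E[j, i] = 1.0  # the same entry when i == j
    return (f(X + eps * E) - f(X - eps * E)) / (2 * eps)

x = [0.5, 2.0, 3.0]  # arbitrary positive diagonal entries

# Diagonal perturbations: f reduces to sum_i log(x_i)^2, so the
# derivative with respect to x_i should be 2*log(x_i)/x_i.
diag_fd = [single_element_derivative(x, i, i) for i in range(3)]
diag_exact = [2 * np.log(xi) / xi for xi in x]

# Off-diagonal perturbation: the eigenvalues change only at second
# order, so the first-order derivative should vanish.
offdiag_fd = single_element_derivative(x, 0, 1)

print(diag_fd, diag_exact, offdiag_fd)
```

On a diagonal $X$ these values are the entries of $2\log(X)X^{-1}$, which is diagonal with entries $2\log(x_i)/x_i$.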

0
On

Let $\phi:X\in U\mapsto \log(X)$, where $U$ is the set of $n\times n$ complex matrices that have no eigenvalues in $\mathbb{R}^-=(-\infty,0]$ (we use the principal $\log$). Its derivative is (cf. Higham, Functions of Matrices)

$D\phi_X:H\in M_n\mapsto \int_0^1(t(X-I)+I)^{-1}H(t(X-I)+I)^{-1}dt$.
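A numerical sanity check of this integral formula may help. This is a rough sketch under stated assumptions, not a reference implementation. It restricts to real SPD $X$ and symmetric $H$ so that the logarithm can be computed from an eigendecomposition. The helper names `logm_spd` and `dlog_integral`, the random test matrices, the number of quadrature nodes, and the step `eps` are all illustrative choices.

```python
import numpy as np

def logm_spd(X):
    """Matrix logarithm of a symmetric positive definite matrix via eigh."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T  # V diag(log w) V^T

def dlog_integral(X, H, nodes=60):
    """Approximate int_0^1 (t(X-I)+I)^{-1} H (t(X-I)+I)^{-1} dt with
    Gauss-Legendre quadrature mapped from [-1, 1] to [0, 1]."""
    I = np.eye(X.shape[0])
    s, wts = np.polynomial.legendre.leggauss(nodes)
    total = np.zeros_like(X)
    for sk, wk in zip(s, wts):
        t = 0.5 * (sk + 1.0)
        A = np.linalg.inv(t * (X - I) + I)
        total += 0.5 * wk * (A @ H @ A)
    return total

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B @ B.T + np.eye(4)   # SPD test matrix (arbitrary choice)
H = rng.standard_normal((4, 4))
H = 0.5 * (H + H.T)       # symmetric direction keeps X + eps*H symmetric

# Central finite difference of log at X in direction H.
eps = 1e-5
fd = (logm_spd(X + eps * H) - logm_spd(X - eps * H)) / (2 * eps)

err = np.max(np.abs(fd - dlog_integral(X, H)))
print(err)
```

The two approximations of $D\phi_X(H)$ should agree up to finite-difference and quadrature error.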

$\textbf{Proposition 1}$. Let $f:X\in U\mapsto tr(\log(X))$.

Then its derivative is $Df_X(H)=tr(X^{-1}H)$ and its gradient is $\nabla(f)(X)=X^{-T}$.

$\textbf{Proof}$. $Df_X(H)=tr(D\phi_X(H))=\int_0^1tr((t(X-I)+I)^{-1}H(t(X-I)+I)^{-1})dt$

$=\int_0^1tr((t(X-I)+I)^{-2}H)dt=tr(\int_0^1(t(X-I)+I)^{-2}dt\,H)=tr(X^{-1}H)$.
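The last equality rests on $\int_0^1(t(X-I)+I)^{-2}dt=X^{-1}$. One way to see it is as a scalar computation applied to the eigenvalues of $X$. For a scalar $x\notin(-\infty,0]$ with $x\neq 1$,

$$\int_0^1\frac{dt}{(t(x-1)+1)^2}=\left[-\frac{1}{(x-1)(t(x-1)+1)}\right]_0^1=\frac{1}{x-1}\left(1-\frac{1}{x}\right)=\frac{1}{x},$$

and for $x=1$ the integrand is identically $1$, which again gives $1/x$. Since the integrand is a function of the single matrix $X$, the same identity holds with $x$ replaced by $X$.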

$\textbf{Proposition 2}$. Let $g:X\in U\mapsto tr((\log(X))^2)$.

Then its derivative is $Dg_X(H)=2tr(\log(X)X^{-1}H)$ and its gradient is

$\nabla(g)(X)=2X^{-T}\log(X^T)$.

$\textbf{Proof}$. By the product rule and the cyclicity of the trace,

$Dg_X(H)=tr(\log(X)D\phi_X(H)+D\phi_X(H)\log(X))=2tr(\log(X)D\phi_X(H))$

$=2\int_0^1tr(\log(X)(t(X-I)+I)^{-1}H(t(X-I)+I)^{-1})dt$.

The key is that $\log(X)$ and $(t(X-I)+I)^{-1}$ commute (both are polynomials in $X$); then (as above)

$Dg_X(H)=2tr(\log(X)\int_0^1(t(X-I)+I)^{-2}dtH)=2tr(\log(X)X^{-1}H)$.
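For a real SPD matrix $X$, as in the question, $X^T=X$ and the gradient $2X^{-T}\log(X^T)$ reduces to $2\log(X)X^{-1}$, the expression proposed in the question. Below is a minimal finite-difference check of $Dg_X(H)=2tr(\log(X)X^{-1}H)$, restricted to SPD $X$ and symmetric $H$. The helper names `spd_fun` and `g`, the random test data, and the step `eps` are assumptions made for illustration.

```python
import numpy as np

def spd_fun(X, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * fn(w)) @ V.T

def g(X):
    """tr(log(X)^2) for a symmetric positive definite X."""
    w = np.linalg.eigvalsh(X)
    return float(np.sum(np.log(w) ** 2))

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
X = B @ B.T + np.eye(5)   # SPD test matrix (arbitrary choice)
H = rng.standard_normal((5, 5))
H = 0.5 * (H + H.T)       # symmetric direction

# 2 log(X) X^{-1}: both factors share eigenvectors, so the product
# is the matrix function w -> 2 log(w) / w.
G = 2.0 * spd_fun(X, lambda w: np.log(w) / w)
analytic = float(np.trace(G @ H))

# Central finite difference of g at X in direction H.
eps = 1e-5
numeric = (g(X + eps * H) - g(X - eps * H)) / (2 * eps)

print(analytic, numeric)
```

This only exercises symmetric directions on SPD inputs, so it does not test the formula on the full domain $U$.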

$\textbf{Remark}$. The derivative of the function $tr(A\log(X))$ (where $A\in M_n$ is fixed) is much more complicated!