Derivative of the trace of a matrix logarithm

Let

$f(X) = \text{tr}(\log(X)\cdot A)$,

where $\log(X)$ is the matrix logarithm of $X$, and both $X$ and $A$ are $m\times m$ symmetric positive definite (SPD) matrices. I was wondering what $\frac{\partial f}{\partial X}$ is.

My solution:

Let $Z= \log(X)$, and I assume (I am not quite sure) that $dZ = X^{-1}dX$. Then we have

$df = \text{tr}(X^{-1}dXA) = \text{tr}(AX^{-1}dX)$,

which gives

$\frac{\partial f}{\partial X} = X^{-1}A$.

Is that correct?

Addition

What if

$f(X) = \text{tr}([\log(X)]^2A)$?

Using a similar method, let $Z= [\log(X)]^2$, and I assume (I am still not quite sure) that $dZ = 2ZX^{-1}dX$. Then we have

$df = \text{tr}(2ZX^{-1}dXA) = 2\text{tr}(AZX^{-1}dX)$,

which gives

$\frac{\partial f}{\partial X} = 2X^{-1}ZA$.

There are 1 best solutions below

$ \def\h{\odot} \def\o{{\tt1}} \def\bR#1{\big(#1\big)} \def\BR#1{\Big[#1\Big]} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\tr#1{\op{tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $

Here is a numerical counter-example to your first formula using random, non-commuting, $2\times 2$ SPD matrices.

$$ A = \m{ 29 & 57 \\ 57 & 117 \\ },\,\,\,\, X = \m{ 20 & 44 \\ 44 & 100 \\ },\,\,\,\, dX = \m{ 4 & 5 \\ 5 & 18 \\ }\times 10^{-4} $$

Let's estimate $df$ using your formula versus a direct calculation, where $\Delta$ is the relative discrepancy between the two values.

$$\eqalign{ f(X) &= \tr{\log(X)\cdot A} \\ f(X+dX)-f(X) &= 0.002849328 \\ \tr{AX^{-1}dX} &= 0.000975000 \\ \Delta &= 65.8\% \cr }$$

To follow up on @lynn's comment, let's see what happens if the matrices commute. The simplest way to ensure that is to set $A=X$ (holding $A$ fixed while $X$ is perturbed) and repeat the calculation.

$$\eqalign{ f(X+dX)-f(X) &= 0.00219992 \\ \tr{AX^{-1}dX}= \tr{dX} &= 0.00220000 \\ \Delta &= 0.004\% \\ }$$
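For anyone who wants to reproduce these numbers, here is a minimal NumPy sketch. The helper names `spd_log` and `f` are my own, and the logarithm is computed through an eigendecomposition, which assumes the input is SPD. It is only a sanity check, not a general-purpose matrix logarithm.

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of an SPD matrix via X = Q diag(w) Q^T."""
    w, Q = np.linalg.eigh(X)        # eigenvalues w > 0 when X is SPD
    return (Q * np.log(w)) @ Q.T    # Q diag(log w) Q^T

def f(X, A):
    """f(X) = tr(log(X) A)."""
    return np.trace(spd_log(X) @ A)

A  = np.array([[29.0, 57.0], [57.0, 117.0]])
X  = np.array([[20.0, 44.0], [44.0, 100.0]])
dX = np.array([[4.0, 5.0], [5.0, 18.0]]) * 1e-4

# Non-commuting case: actual change in f versus the proposed formula.
direct = f(X + dX, A) - f(X, A)
guess = np.trace(A @ np.linalg.inv(X) @ dX)
rel_gap = abs(direct - guess) / abs(direct)

# Commuting case: A is frozen at the unperturbed X.
direct_c = f(X + dX, X) - f(X, X)
guess_c = np.trace(dX)
rel_gap_c = abs(direct_c - guess_c) / abs(direct_c)

print(direct, guess, rel_gap)
print(direct_c, guess_c, rel_gap_c)
```

The first relative gap should be large (about two thirds), and the second should be tiny, which matches the table above.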

Update

Here is a simple non-numerical example of what goes wrong when the matrices don't commute.

$$\eqalign{ f &= \tr{X^3A} \\ df &= \tr{X^2\,dX\,A + X\,dX\,XA + dX\,X^2A} \\ \grad fX &= X^2A + XAX + AX^2 \\ }$$

A rather ugly and complicated result for such a simple function. However if $(A,X)$ commute, then you can combine terms to obtain

$$\grad fX = 3X^2A$$
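The three-term gradient can be checked numerically. The sketch below reuses the matrices from the numerical example and compares a central difference of $f$ against both the three-term gradient and the commuting shortcut $3X^2A$. The variable names are mine, and the Frobenius product $G:dX$ is computed as `np.sum(G * dX)`.

```python
import numpy as np

A  = np.array([[29.0, 57.0], [57.0, 117.0]])
X  = np.array([[20.0, 44.0], [44.0, 100.0]])
dX = np.array([[4.0, 5.0], [5.0, 18.0]]) * 1e-4

def f(X):
    """f(X) = tr(X^3 A)."""
    return np.trace(X @ X @ X @ A)

# Gradient with all three orderings kept, and the commuting shortcut.
G_full = X @ X @ A + X @ A @ X + A @ X @ X
G_naive = 3 * X @ X @ A

# Central difference approximates the first-order change df.
central = (f(X + dX) - f(X - dX)) / 2
pred_full = np.sum(G_full * dX)     # G_full : dX
pred_naive = np.sum(G_naive * dX)   # G_naive : dX

print(central, pred_full, pred_naive)
```

Only `pred_full` should agree with the central difference. The shortcut is visibly off because $A$ and $X$ do not commute here.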

Now imagine expanding a matrix function as a Taylor series, and then taking its derivative term-by-term. Each term $X^k$ will explode into $k$ distinct terms and you'll end up with a horrible mess. But you could do it.
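To make the term count concrete, here is the general power term written out in plain LaTeX, assuming symmetric $X$ and $A$ as in the question. The $k=3$ case reproduces the three-term gradient above.

$$\begin{aligned} f_k &= \operatorname{tr}\left(X^kA\right) \\ df_k &= \sum_{j=0}^{k-1} \operatorname{tr}\left(X^{j}\,dX\,X^{k-1-j}A\right) \\ \frac{\partial f_k}{\partial X} &= \sum_{j=0}^{k-1} X^{k-1-j}\,A\,X^{j} \end{aligned}$$

If $A$ and $X$ commute, every summand equals $X^{k-1}A$ and the sum collapses to $kX^{k-1}A$.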

However, for the $\log$ function, you can't even write down a Taylor series about zero, because the function is singular there.

Update 2

For SPD matrices, the $\sf Daleckii$-$\sf Krein\;Theorem$ yields a closed-form solution.

First, calculate the eigenvalue decomposition

$$\eqalign{ \def\b{\beta} X &= QBQ^T,\qquad I=Q^TQ,\;B=\Diag{\b_k} \\ }$$

Applying a generic function $h(x)$ and its derivative $h'(x)$ to $X$ yields

$$\eqalign{ H_x &= h(X),\quad &H_b = h(B) \qiq &H_x = Q\,H_b\,Q^T \\ H_x' &= h'(X),\quad &H_b' = h'(B) \qiq &H_x' = Q\,H_b'\,Q^T \\ }$$

Since $B$ is diagonal, its functions are very easy to evaluate: just apply the scalar function to each diagonal entry.

According to the DK Theorem the differential of this function is

$$\eqalign{ Z &= \zeta(BJ-JB), \quad R = \fracLR{H_bJ-JH_b+ZH_b'}{BJ-JB+Z} \\ dH_x &= Q\BR{R\h\LR{Q^TdX\,Q}}Q^T \\ }$$

where $J$ is the all-ones matrix, $\LR{F\h G}$ denotes elementwise multiplication, $\fracLR FG$ denotes elementwise division, and $\zeta$ is an elementwise zero-indicator function, i.e.

$$\eqalign{ \zeta\!\LR{\m{-2 & \c0 & 3\\9 & -5 & \c0}} = \m{0 & \c\o & 0\\0 & 0 & \c\o} }$$

In the current problem

$$\eqalign{ h(x) &= \log(x), \qquad h'(x) = x^{-1} \\ }$$

The final piece of notation that we need is the Frobenius $(:)$ product

$$\eqalign{ F:G &= \sum_{i=1}^m\sum_{j=1}^n F_{ij}G_{ij} \;=\; \tr{F^TG} \\ G:G &= \frob{G}^2 \\ F:G &= G:F \;=\; G^T:F^T \\ \LR{PQ}:G &= P:\LR{GQ^T} \;=\; Q:\LR{P^TG} \\ \LR{E\h F}:G &= E:\LR{F\h G} \\ }$$

Putting this all together

$$\eqalign{ f &= A:H_x \\ df &= A:dH_x \\ &= A:\LR{Q\BR{R\h\LR{Q^TdX\,Q}}Q^T} \\ &= \LR{Q\BR{R\h\LR{Q^TAQ}}Q^T}:dX \\ \grad fX &= Q\BR{R\h\LR{Q^TAQ}}Q^T \\ }$$
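As a rough check of the closed form, here is a NumPy sketch that builds $R$ exactly as defined above and compares $\frac{\partial f}{\partial X}:dX$ with a direct difference on the earlier counter-example. The function name `dk_gradient` and its signature are my own choices. The zero indicator uses an exact `== 0` test, mirroring $\zeta$; nearly equal eigenvalues would probably need a tolerance instead.

```python
import numpy as np

def dk_gradient(X, A, h, dh):
    """Gradient of f(X) = tr(h(X) A) for symmetric X and A (hypothetical helper)."""
    b, Q = np.linalg.eigh(X)                  # X = Q diag(b) Q^T
    hb, dhb = h(b), dh(b)                     # diagonals of H_b and H_b'
    db = b[:, None] - b[None, :]              # BJ - JB, entries b_i - b_j
    Z = (db == 0).astype(float)               # zeta(BJ - JB)
    num = hb[:, None] - hb[None, :] + Z * dhb[None, :]  # H_b J - J H_b + Z H_b'
    R = num / (db + Z)                        # elementwise division
    return Q @ (R * (Q.T @ A @ Q)) @ Q.T      # Q [R o (Q^T A Q)] Q^T

def f(X, A):
    """f(X) = tr(log(X) A), with log computed by eigendecomposition (SPD only)."""
    w, Q = np.linalg.eigh(X)
    return np.trace((Q * np.log(w)) @ Q.T @ A)

A  = np.array([[29.0, 57.0], [57.0, 117.0]])
X  = np.array([[20.0, 44.0], [44.0, 100.0]])
dX = np.array([[4.0, 5.0], [5.0, 18.0]]) * 1e-4

G = dk_gradient(X, A, np.log, lambda x: 1.0 / x)
predicted = np.sum(G * dX)                    # Frobenius product G : dX
direct = f(X + dX, A) - f(X, A)

# Commuting sanity check: with A = X the gradient should be the identity.
G_commuting = dk_gradient(X, X, np.log, lambda x: 1.0 / x)

print(predicted, direct)
```

Because `h` and `dh` are passed in, the same sketch should also cover the question's addition by supplying $h(x)=\log^2(x)$ and $h'(x)=2\log(x)/x$, though I have only checked the plain $\log$ case here.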