Derivative of matrix involving trace and log


I'm stuck on this problem. Let $X\in\mathbb{R}^{n\times n}$, compute the following matrix derivatives $$\frac{\partial}{\partial X}\mathrm{tr}(\log(XA)\log(XA)^\top),$$ $$\frac{\partial}{\partial X}\mathrm{tr}(B\log(XA)), $$ where $\log(\cdot)$ is the matrix logarithm (not element-wise) and $A,B\in\mathbb{R}^{n\times n}$ are constant matrices.

Thanks in advance for your suggestions!

There are 4 answers below.

Answer 1 (score 2):

Here's an incomplete answer:

$\log A = - \sum_{i = 1}^{\infty} \, \frac{1}{i}(I - A)^i$

So

$d \log A = \sum_{i = 0}^{\infty} \, (I - A)^i \, dA = A^{-1}\,dA$

(the last equality holds when the series converges, i.e. for $\|I-A\|<1$; and the formula as a whole is valid only when $dA$ commutes with $A$, since differentiating the series term by term otherwise leaves terms of the form $(I-A)^j\,dA\,(I-A)^{i-j}$ that cannot be collected). If you know the chain and product rules, differentiating the $\log A$ term should be the only tricky part.
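As a quick numerical check (a NumPy sketch, not part of the original answer; `logm_series` is a helper I'm introducing that simply truncates the series above, valid for $\|I-M\|<1$), the term-by-term formula $\sum_{i\ge 0}(I-A)^i\,dA = A^{-1}dA$ matches a finite-difference derivative of $\log$ when $dA$ commutes with $A$, but not for a generic direction:

```python
import numpy as np

def logm_series(M, terms=200):
    # principal log via log(M) = -sum_{k>=1} (I - M)^k / k, valid for ||I - M|| < 1
    E = np.eye(M.shape[0]) - M
    P, L = np.eye(M.shape[0]), np.zeros_like(M)
    for k in range(1, terms + 1):
        P = P @ E
        L -= P / k
    return L

rng = np.random.default_rng(0)
n = 4
A = np.eye(n) - 0.1 * rng.standard_normal((n, n))   # keeps ||I - A|| < 1

def fd_dlog(dA, t=1e-6):
    # central finite-difference directional derivative of log at A along dA
    return (logm_series(A + t * dA) - logm_series(A - t * dA)) / (2 * t)

def termwise(dA):
    # sum_{i>=0} (I - A)^i dA, i.e. A^{-1} dA when the series converges
    return np.linalg.solve(A, dA)

dA_comm = 0.3 * A + 0.2 * A @ A          # a polynomial in A, so it commutes with A
dA_rand = rng.standard_normal((n, n))    # a generic direction does not commute

err_comm = np.linalg.norm(fd_dlog(dA_comm) - termwise(dA_comm))
err_rand = np.linalg.norm(fd_dlog(dA_rand) - termwise(dA_rand))
# err_comm is tiny; err_rand is not
```

This makes the gap concrete: the series formula is a directional derivative only along commuting directions.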

Answer 2 (score 2):

I assume you are able to compute the derivative without the trace; the rest, actually, is not hard. Compute it componentwise: $$\begin{align} \frac{\partial}{\partial X^i_j}\text{tr}f(X)=& \frac{\partial}{\partial X^i_j}f^k_l(X)\delta^l_k\\ =&\frac d{dx}f^k_m(x)|_{x=X}\frac{\partial}{\partial X^i_j}X^m_l\delta^l_k\\ =&\frac d{dx}f^k_m(x)|_{x=X}\delta^m_i\delta^j_l\delta^l_k\\ =&\frac d{dx}f^j_i(x)|_{x=X} \end{align}$$ The conclusion is

$$\frac{\partial}{\partial X}\text{tr}f(X)=\left(\frac d{dx}f(x)|_{x=X}\right)^T$$
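This identity is easy to test numerically for a polynomial $f$. Below is a NumPy sketch (my own illustration, not from the answer) with $f(x)=x^3$, so the claim becomes $\partial\,\text{tr}(X^3)/\partial X = 3(X^2)^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))

def trf(X):
    # tr f(X) with f(x) = x^3
    return np.trace(X @ X @ X)

# entrywise central finite differences for the gradient
t = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = t
        G[i, j] = (trf(X + E) - trf(X - E)) / (2 * t)

G_formula = 3 * (X @ X).T    # (d/dx x^3 = 3x^2) evaluated at X, then transposed
err = np.linalg.norm(G - G_formula)
```

Note this works even though $X$ does not commute with the perturbations: the trace's cyclic property is what rescues the componentwise argument.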

Answer 3 (score 0):

Clearly the general calculation is essentially intractable. In particular, GMB's answer is false in general (cf. my comment above). Yet one can give a compact answer when $X=A^{-1}$, because $D\log_I=\mathrm{id}$.

Let $f(X)=tr(B\log(XA))$. Since $XA=I$ at $X=A^{-1}$ and $d(XA)=HA$, we get $Df_{A^{-1}}:H\mapsto tr(BHA)$, and thus $\nabla (f)_{A^{-1}}=(AB)^T$.
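A finite-difference check of this gradient (a NumPy sketch of mine; `logm_series` is a helper that truncates the principal-log series, which is valid here because $XA$ stays near $I$):

```python
import numpy as np

def logm_series(M, terms=200):
    # principal log via log(M) = -sum_{k>=1} (I - M)^k / k, valid for ||I - M|| < 1
    E = np.eye(M.shape[0]) - M
    P, L = np.eye(M.shape[0]), np.zeros_like(M)
    for k in range(1, terms + 1):
        P = P @ E
        L -= P / k
    return L

rng = np.random.default_rng(2)
n = 3
A = np.eye(n) + 0.2 * rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X0 = np.linalg.inv(A)                 # evaluation point, where XA = I

def f(X):
    return np.trace(B @ logm_series(X @ A))

t = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = t
        G[i, j] = (f(X0 + E) - f(X0 - E)) / (2 * t)

err = np.linalg.norm(G - (A @ B).T)   # claimed gradient at X = A^{-1}
```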

EDIT: Let $h(X)=tr(\log(I+X))$, where $\log$ denotes the principal logarithm and $||X||<1$. Then $$h(X)=tr\left(\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}X^k\right)=\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}tr(X^k),$$ and, because $tr(UV)=tr(VU)$, $$Dh_X:H\mapsto \sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}tr(kX^{k-1}H)=tr\left(\sum_{k=1}^{\infty}(-1)^{k-1}X^{k-1}H\right)=tr((I+X)^{-1}H).$$ In other words, $\nabla(h)_X=\left({(I+X)^{-1}}\right)^T$. More generally, if $X$ has no eigenvalues in $\mathbb{R}^-$, let $g(X)=tr(\log(X))$; then $Dg_X:H\mapsto tr(X^{-1}H)$ and $\nabla(g)_X=\left({X^{-1}}\right)^T$.
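The formula $\nabla(h)_X=\left((I+X)^{-1}\right)^T$ can be checked the same way; a NumPy sketch (my illustration, using the truncated series as the matrix logarithm):

```python
import numpy as np

def logm_series(M, terms=200):
    # principal log via log(M) = -sum_{k>=1} (I - M)^k / k, valid for ||I - M|| < 1
    E = np.eye(M.shape[0]) - M
    P, L = np.eye(M.shape[0]), np.zeros_like(M)
    for k in range(1, terms + 1):
        P = P @ E
        L -= P / k
    return L

rng = np.random.default_rng(3)
n = 3
X = 0.2 * rng.standard_normal((n, n))   # keeps ||X|| < 1

def h(X):
    return np.trace(logm_series(np.eye(n) + X))

t = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = t
        G[i, j] = (h(X + E) - h(X - E)) / (2 * t)

err = np.linalg.norm(G - np.linalg.inv(np.eye(n) + X).T)
```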

Here $f(X)=tr(B\log(XA))=tr(B\log(Y))$ with $Y=XA$; we must therefore differentiate $u(Y)=tr(BY^k)$. Unfortunately $Du_Y:K\mapsto tr(B(kY^{k-1})K)$ is false in general; yet it is true if, for instance, $BY=YB$. Finally, when $BXA=XAB$, taking $K=HA$, $$Df_X:H\mapsto tr(B(XA)^{-1}HA)=tr((XA)^{-1}BHA)=tr(X^{-1}BH),$$ and $\nabla(f)_X=(X^{-1}B)^T$ does not depend on $A$.
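Under the commutation hypothesis, $\nabla(f)_X=(X^{-1}B)^T$ can also be verified numerically. In the sketch below (my construction, not from the answer), $B$ is chosen as a polynomial in $XA$, so $B(XA)=(XA)B$ holds at the evaluation point, and `logm_series` is a truncated principal-log series valid near $I$:

```python
import numpy as np

def logm_series(M, terms=300):
    # principal log via log(M) = -sum_{k>=1} (I - M)^k / k, valid for ||I - M|| < 1
    E = np.eye(M.shape[0]) - M
    P, L = np.eye(M.shape[0]), np.zeros_like(M)
    for k in range(1, terms + 1):
        P = P @ E
        L -= P / k
    return L

rng = np.random.default_rng(4)
n = 3
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
X = np.eye(n) + 0.05 * rng.standard_normal((n, n))
Y = X @ A
B = np.eye(n) + 0.5 * Y + 0.25 * Y @ Y   # a polynomial in XA, so B(XA) = (XA)B

def f(X_):
    return np.trace(B @ logm_series(X_ @ A))

t = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = t
        G[i, j] = (f(X + E) - f(X - E)) / (2 * t)

err = np.linalg.norm(G - (np.linalg.inv(X) @ B).T)
```

Note that $B$ is held fixed while $X$ is perturbed; commutation is only needed at the evaluation point.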

Answer 4 (score 0):

$ \def\l{\lambda}\def\p{\partial} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\notimplies{\;{\large\not}\!\!\!\!\!\implies} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} $Denote the function $\Phi(\l)$ of a scalar $\l\,$ and its derivative $$\phi(\l) = \frac{d\Phi}{d\l}$$ and write Shuchang's matrix result as a differential expression $$\eqalign{ d\trace{{\large\Phi}(X)} &= \phi\!\LR{X^T} :dX \\ }$$ where the colon denotes the Frobenius product $$\trace{A^TB} \;=\; A:B \;=\; A^T:B^T $$

Let's also define the matrix variables $$\eqalign{ Y &= XA &\implies\quad Y^{-1} = A^{-1}X^{-1} \\ L &= \log(Y) &\;\notimplies\quad L = \log(X) + \log(A) \\ }$$ Consider a function to which Shuchang's result can be applied, e.g. $$\eqalign{ \phi &= \trace{L} \\ &= \trace{\log(Y)} \\ d\phi &= \LR{Y^T}^{-1}:dY \\ &= \LR{Y^{-1}}^T:\LR{dX\,A} \\ &= \LR{Y^{-1}}^TA^T:dX \\ &= \LR{AY^{-1}}^T:dX \\ &= \LR{X^{-1}}^T:dX \\ \grad{\phi}{X} &= \LR{X^T}^{-1} \\ }$$ But differentiation of the functions posed in this question gets stuck after the very first step! $$\eqalign{ \phi &= \trace{B^TL} = B:L \\ d\phi &= B:dL \\ \\ \psi &= \trace{LL^T} = L:L \\ d\psi &= 2L:dL \\ }$$ The problem is that no simple expression for $dL\,($in terms of $dX)$ exists. Note that Shuchang's result applies to the trace of $dL$ and not to $dL$ itself, so it cannot be used here.
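Both claims in this passage are easy to probe numerically: $\log(XA)\ne\log(X)+\log(A)$ for generic non-commuting factors (though the traces of the two sides do agree, since $\trace{\log M}=\log\det M$ and $\det$ is multiplicative), while the gradient $\grad{\phi}{X}=\LR{X^T}^{-1}$ for $\phi=\trace{\log(XA)}$ checks out. A NumPy sketch of mine, with a truncated-series logarithm valid near $I$:

```python
import numpy as np

def logm_series(M, terms=300):
    # principal log via log(M) = -sum_{k>=1} (I - M)^k / k, valid for ||I - M|| < 1
    E = np.eye(M.shape[0]) - M
    P, L = np.eye(M.shape[0]), np.zeros_like(M)
    for k in range(1, terms + 1):
        P = P @ E
        L -= P / k
    return L

rng = np.random.default_rng(5)
n = 3
X = np.eye(n) + 0.05 * rng.standard_normal((n, n))
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))

# L = log(XA) is NOT log(X) + log(A) when X and A do not commute ...
gap = np.linalg.norm(logm_series(X @ A) - logm_series(X) - logm_series(A))

# ... although the traces agree, since tr log = log det and det is multiplicative
tr_gap = abs(np.trace(logm_series(X @ A))
             - np.trace(logm_series(X)) - np.trace(logm_series(A)))

# gradient of phi = tr(log(XA)) should be (X^T)^{-1}, independent of A
def phi(X_):
    return np.trace(logm_series(X_ @ A))

t = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = t
        G[i, j] = (phi(X + E) - phi(X - E)) / (2 * t)

err = np.linalg.norm(G - np.linalg.inv(X).T)
```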

The standard approach is to employ a power series, but one much more complicated than that proposed by GMB $\,($who naively assumed that $A$ commutes with $dA).\,$

While the power series approach yields a solution, it's complicated, has a tiny radius of convergence, and when it does converge it is so slow that it's practically useless.

Since high quality algorithms exist for evaluating matrix logarithms, a better approach is to calculate $dL$ via the logarithm of a block-triangular matrix $$\eqalign{ &\m{L&dL\\0&L} = \log\L(\m{Y&dY\\0&Y}\R) = \log\L(\m{X&dX\\0&X}\cdot\m{A&0\\0&A}\R) \\ \\ d\phi = &\m{0&B\\0&0}:\log\L(\m{X&dX\\0&X}\cdot\m{A&0\\0&A}\R) \\ \\ d\psi = &\m{0&2L\\0&0}:\log\L(\m{X&dX\\0&X}\cdot\m{A&0\\0&A}\R) \\ }$$ These expressions are straightforward and accurate. All of the problematic non-commutative behavior is handled by the block structure of the function argument.
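Here is a NumPy sketch of the block-triangular recipe (my illustration; the truncated principal-log series stands in for a production logarithm such as `scipy.linalg.logm`): the top-right block of the log of the block matrix recovers $dL$, and $d\phi = B:dL$ matches a finite difference of $\trace{B^T\log(XA)}$.

```python
import numpy as np

def logm_series(M, terms=400):
    # principal log via log(M) = -sum_{k>=1} (I - M)^k / k, valid for ||I - M|| < 1
    E = np.eye(M.shape[0]) - M
    P, L = np.eye(M.shape[0]), np.zeros_like(M)
    for k in range(1, terms + 1):
        P = P @ E
        L -= P / k
    return L

rng = np.random.default_rng(6)
n = 3
X = np.eye(n) + 0.05 * rng.standard_normal((n, n))
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
dX = 0.05 * rng.standard_normal((n, n))      # direction of interest
Y, dY = X @ A, dX @ A

# the log of the block-triangular matrix carries dL in its top-right block
M = np.block([[Y, dY], [np.zeros((n, n)), Y]])
LM = logm_series(M)
L, dL = LM[:n, :n], LM[:n, n:]

# check dL against a central finite difference of log at Y along dY
t = 1e-6
dL_fd = (logm_series(Y + t * dY) - logm_series(Y - t * dY)) / (2 * t)
err_dL = np.linalg.norm(dL - dL_fd)

# d phi = B : dL  for  phi(X) = tr(B^T log(XA))
dphi = np.trace(B.T @ dL)
def phi(X_):
    return np.trace(B.T @ logm_series(X_ @ A))
dphi_fd = (phi(X + t * dX) - phi(X - t * dX)) / (2 * t)
err_dphi = abs(dphi - dphi_fd)
```

Evaluating one $2n\times 2n$ logarithm per direction is the price of handling the non-commutativity exactly.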

Despite the fact that $dX$ appears within a function argument, these formulas represent a practical way to calculate directional derivatives. Simply set $dX$ parallel to the direction of interest and scale it to have unit norm.

In summary, there are no closed-form solutions for $\,\large\grad{\phi}{X}\,$ or $\,\large\grad{\psi}{X}$.