Derivative of a trace w.r.t matrix within log of matrix sums


I'm trying to solve an optimization (sub)problem and am running into trouble with a tricky derivative. I'd like to find the matrix $C \in \mathbb{R}^{n\times d}_+$ that minimizes the following function, where $X \in \mathbb{R}^{n\times d}_+$, $W\in\mathbb{R}^{n\times n}$ and $L \in \mathbb{R}^{d\times d}$ (the dimensions for which the product inside the trace is defined).

$f= \mathrm{tr}(W\log(X+C)L\log(X+C)^T)$

Here, $\log$ and the addition are element-wise. It would be convenient if a closed form solution exists for $\min\limits_{C > \mathbf{0}} f(C)$, where the other variables are fixed.
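For concreteness, a small numerical sketch of $f$ (random data; all names hypothetical; $W$ is $n\times n$ and $L$ is $d\times d$, as the index sums below require):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3

# X, C > 0 entrywise; W is n x n and L is d x d so that
# W @ log(X+C) @ L @ log(X+C).T is a square n x n matrix.
X = rng.uniform(0.5, 2.0, (n, d))
C = rng.uniform(0.5, 2.0, (n, d))
W = rng.standard_normal((n, n))
L = rng.standard_normal((d, d))

def f(C):
    G = np.log(X + C)          # log and the addition are element-wise
    return np.trace(W @ G @ L @ G.T)

val = f(C)
```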

My attempt:

$f = \sum\limits_{i}[W\log(X+C)L\log(X+C)^T]_{ii}$

$=\sum\limits_{i}\sum\limits_{j}\sum\limits_{k}\sum\limits_{l}W_{ij}\log(X_{jk}+C_{jk})L_{kl}\log(X_{il}+C_{il})$

Differentiating with respect to a fixed entry $C_{pq}$, both logarithm factors contribute:

$\dfrac{\partial f}{\partial C_{pq}}=\sum\limits_{i}\sum\limits_{l}\dfrac{ W_{ip}L_{ql}\log(X_{il}+C_{il})}{X_{pq} + C_{pq}} + \sum\limits_{j}\sum\limits_{k}\dfrac{ W_{pj}\log(X_{jk}+C_{jk})L_{kq}}{X_{pq} + C_{pq}}$

Which, in matrix notation (with element-wise division), I take to be:

$\dfrac{\partial f}{\partial C}=\dfrac{W^T\log(X+C)L^T + W\log(X+C)L}{X+C}$

Setting this equal to zero, I see no obvious solution except for $C=-X$, which violates my assumption.
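One way to sanity-check the differentiation is against central finite differences; a sketch with random data (names hypothetical), using the form $W^T G L^T + W G L$ divided element-wise by $X+C$, where $G=\log(X+C)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 3
X = rng.uniform(0.5, 2.0, (n, d))
C = rng.uniform(0.5, 2.0, (n, d))
W = rng.standard_normal((n, n))
L = rng.standard_normal((d, d))

def f(C):
    G = np.log(X + C)
    return np.trace(W @ G @ L @ G.T)

G = np.log(X + C)
grad = (W.T @ G @ L.T + W @ G @ L) / (X + C)   # element-wise division

# central finite differences, one entry at a time
eps = 1e-6
fd = np.zeros_like(C)
for p in range(n):
    for q in range(d):
        E = np.zeros_like(C)
        E[p, q] = eps
        fd[p, q] = (f(C + E) - f(C - E)) / (2 * eps)

err = np.max(np.abs(grad - fd))
```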

I'm hoping someone would be able to check that I've differentiated correctly, at least, and perhaps suggest an alternative minimization scheme (perhaps some subgradient method) in this event.


2 Answers

BEST ANSWER

@lynn, you divide by $X+C$ (element-wise division)! Yet your final equation is correct.

$f$ is a function of $C$. If $H\in M_{n,d}$, let $K=[k_{i,j}]=\left[\dfrac{h_{i,j}}{x_{i,j}+c_{i,j}}\right]$. If $C$ is a stationary point, then, for every $H$, $$Df_C(H)=\operatorname{tr}\!\left(WKL\log(X+C)^T+W\log(X+C)LK^T\right)=\operatorname{tr}\!\left(\left(L\log(X+C)^TW+L^T\log(X+C)^TW^T\right)K\right)=0.$$

Since $X+C>\mathbf{0}$ entrywise, $K$ ranges over all of $M_{n,d}$ as $H$ does; then $L\log(X+C)^TW+L^T\log(X+C)^TW^T=0$, which is equivalent to lynn's condition.
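The directional derivative $Df_C(H)$ and its cyclic rewriting can be checked against a finite difference along $H$; a sketch under the same dimension assumptions as above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 3
X = rng.uniform(0.5, 2.0, (n, d))
C = rng.uniform(0.5, 2.0, (n, d))
W = rng.standard_normal((n, n))
L = rng.standard_normal((d, d))
H = rng.standard_normal((n, d))     # arbitrary direction

def f(C):
    G = np.log(X + C)
    return np.trace(W @ G @ L @ G.T)

G = np.log(X + C)
K = H / (X + C)                     # K_{ij} = h_{ij} / (x_{ij} + c_{ij})

# Df_C(H) in both forms above; they agree by cyclicity of the trace
Df  = np.trace(W @ K @ L @ G.T + W @ G @ L @ K.T)
Df2 = np.trace((L @ G.T @ W + L.T @ G.T @ W.T) @ K)

# central finite difference of f along the direction H
t = 1e-6
numeric = (f(C + t * H) - f(C - t * H)) / (2 * t)
```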

Thus $\operatorname{vec}(\log(X+C))\in\ker Z$, where $Z=W^T\otimes L + W\otimes L^T$ (stacking row by row) is a symmetric matrix. Let $(\lambda_i)_i=\operatorname{spectrum}(W)$ and $(\mu_j)_j=\operatorname{spectrum}(L)$. If $W,L$ are normal matrices, then the spectrum of $Z\in M_{nd,nd}$ is $(2\operatorname{Re}(\lambda_i\mu_j))_{i,j}$, and $Z$ fails to be a bijection iff there is a pair $(i,j)$ s.t. $\operatorname{Re}(\lambda_i\mu_j)=0$. Otherwise we cannot conclude.
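A numerical check of this spectrum claim, restricted for simplicity to symmetric $W$ and $L$ (real symmetric matrices are normal with real spectra, so $2\operatorname{Re}(\lambda_i\mu_j)=2\lambda_i\mu_j$); the row-by-row stacking matches NumPy's default C-order flattening:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 4, 3

def sym(A):
    return (A + A.T) / 2

W = sym(rng.standard_normal((n, n)))    # symmetric => normal, real spectrum
L = sym(rng.standard_normal((d, d)))

# Z acts on the row-by-row stacked vec of log(X+C)
Z = np.kron(W.T, L) + np.kron(W, L.T)

lam = np.linalg.eigvalsh(W)
mu = np.linalg.eigvalsh(L)
pred = np.sort(2 * np.outer(lam, mu).ravel())    # (2*lambda_i*mu_j)_{i,j}
eig_Z = np.sort(np.linalg.eigvalsh(Z))
```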

EDIT: As Johan wrote, your problem is not clear. Assume for instance that $w_{r,r}l_{s,s}<0$. Choose $C=[c_{i,j}]$ s.t. the $(c_{i,j})_{i,j}$ are small positive numbers, except $c_{r,s}$, which is a large positive number. Then $f(C)\sim w_{r,r}l_{s,s}(\log(c_{r,s}))^2$ as $c_{r,s}$ tends to $+\infty$. Thus the infimum of $f$ is $-\infty$.
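This unboundedness is easy to reproduce numerically; a sketch with $W,L$ chosen (hypothetically) so that $w_{0,0}l_{0,0}<0$:

```python
import numpy as np

n, d = 3, 2
X = np.ones((n, d))
W = np.eye(n)
W[0, 0] = -1.0                      # w_{0,0} * l_{0,0} = -1 < 0
L = np.eye(d)

def f(C):
    G = np.log(X + C)
    return np.trace(W @ G @ L @ G.T)

# all entries small and positive, except one huge entry at (r, s) = (0, 0)
vals = []
for big in (1e2, 1e4, 1e8):
    C = 0.01 * np.ones((n, d))
    C[0, 0] = big
    vals.append(f(C))
# f(C) ~ w_{0,0} * l_{0,0} * log(c_{0,0})^2, which tends to -infinity
```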

ANSWER

First, solve the problem for $G = \log(X+C)$.

In terms of $G$, the derivative is $$ \frac {\partial f} {\partial C} = \frac {W^TGL^T + W G L} {X+C} $$ where the division is element-wise. Setting it to zero and taking the $\operatorname{vec}$ of both sides, $$ \eqalign { (L\otimes W^T + L^T\otimes W)\,\, {\rm vec}(G) &= 0 \cr K\,g &= 0 \cr (I+K)\,g &= g \cr }$$ So $g$ is an eigenvector of $(I+K)$ corresponding to the eigenvalue $1$, if such an eigenvector exists. After finding $g$, it is easy to recover $G$, and finally $C = \exp(G)-X$ (with $\exp$ element-wise).
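This recipe can be sketched numerically. Below, vec is column-stacking (NumPy's Fortran order), and $W$ is deliberately built symmetric and singular so that the Kronecker system actually has a null vector; this is an assumption for illustration, and the recovered $C=\exp(G)-X$ need not satisfy the constraint $C>\mathbf{0}$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 4, 3
X = rng.uniform(0.5, 1.5, (n, d))

# W is symmetric with rank n-1 so that K below is genuinely singular;
# L is made symmetric for simplicity.
A = rng.standard_normal((n, n - 1))
W = A @ A.T
L = rng.standard_normal((d, d))
L = (L + L.T) / 2

# Column-stacking vec: vec(A X B) = (B^T kron A) vec(X), so
# W^T G L^T + W G L = 0  <=>  (L kron W^T + L^T kron W) vec(G) = 0
K = np.kron(L, W.T) + np.kron(L.T, W)

# take g as the right singular vector of the smallest singular value
U, s, Vt = np.linalg.svd(K)
g = Vt[-1]
G = g.reshape(n, d, order='F')      # undo the column-stacking vec

C = np.exp(G) - X                   # then X + C = exp(G) > 0 entrywise

# the gradient of f vanishes at the recovered C
grad = (W.T @ G @ L.T + W @ G @ L) / (X + C)
```

Since $X+C=\exp(G)>0$ entrywise, the logarithm is well defined at the recovered point and the gradient vanishes there; whether $C$ itself is entrywise positive has to be checked separately.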