Gradient of trace of a product with a matrix logarithm and Kronecker product


I'm looking to compute the gradient of a reasonably complicated matrix function, which I can mostly reduce to the following problem. I'm not entirely sure if a closed-form analytic solution is possible. I want to find $\nabla_X f$, where $$f(X) = \text{tr}\left[X\cdot \log\left(\sum_i A_i X A_i^T\right)\otimes\mathbb{I}_{n/m}\right]$$ Here:

  • $X$ is an $n\times n$ complex, full-rank, positive semi-definite matrix,
  • $\{A_i\}$ is a set of real, $m\times n$ matrices (specifically, this sum is to compute the partial trace of $X$),
  • $\mathbb{I}$ is the $n/m \times n/m$ identity matrix,
  • and $\otimes$ is the regular Kronecker product.

I don't have much experience with matrix calculus. Most of the literature I've found consists of cheat-sheet-style rules for computing basic derivatives, and I don't have a good feel for how to tackle harder problems. For example, I have seen the rule $\frac{\partial \text{tr}(F(X))}{\partial X} = f(X)^\dagger $, where $f$ is the scalar derivative of $F$, but it's not clear to me exactly what this scalar derivative means, and I can't seem to find more information or build a ground-up approach. From this, my best guess is that $$\frac{\partial f}{\partial X}^\dagger = \frac{\partial}{\partial X}\left[\text{tr}(X)\right]\cdot\log\left(\sum_i A_i X A_i^T\right)\otimes\mathbb{I}_{n/m} + X\cdot \frac{\partial}{\partial X}\left[\text{tr}\left(\log\left(\sum_i A_i X A_i^T\right)\otimes\mathbb{I}_{n/m}\right)\right]$$ which simplifies to $$\log\left(\sum_i A_i X A_i^T\right)\otimes\mathbb{I}_{n/m} + X\cdot \frac{\partial}{\partial X}\left[\log\left|\sum_i A_i X A_i^T\right|\right]\cdot\text{Tr}(\mathbb{I}_{n/m})$$ $$= \log\left(\sum_i A_i X A_i^T\right)\otimes\mathbb{I}_{n/m} + (n/m)\cdot X \cdot \left(\sum_i A_i^T P A_i\right)$$ with $P = \left(\sum_i A_i X A_i^T\right)^{-1}$.

I suspect you can't just pull the derivatives out of the trace like that, but I don't know how to differentiate what's inside the trace and then apply the chain rule. Is anybody able to help with this? Am I on the right track, or is there a more systematic way to compute this? Is a closed-form expression even possible, or should I resort to numerics? I know the trace tames some of the non-commuting aspects of the problem, but I'm not totally sure which factors can be treated as commuting in this sense.
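For reference, here is how I evaluate $f$ numerically (my own sketch, assuming the $A_i$ are the partial-trace operators $A_i = e_i^T\otimes\mathbb{I}_m$, which matches "this sum computes the partial trace of $X$"; any candidate closed-form gradient can then be validated against finite differences of this function):

```python
import numpy as np
from scipy.linalg import logm

n, m = 4, 2
p = n // m  # n/m

# Assumed Kraus operators for the partial trace over the first factor:
# A_i = e_i^T (x) I_m, so that sum_i A_i X A_i^T sums the diagonal m x m blocks.
A = [np.kron(np.eye(p)[i:i + 1, :], np.eye(m)) for i in range(p)]

rng = np.random.default_rng(0)
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = C @ C.conj().T + np.eye(n)          # Hermitian, positive definite

S = sum(Ai @ X @ Ai.conj().T for Ai in A)   # partial trace of X
f = np.trace(X @ np.kron(logm(S), np.eye(p))).real
```

The partial trace preserves the trace, so `np.trace(S)` should equal `np.trace(X)`, which is a quick sanity check on the construction of the $A_i$.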

Many thanks in advance.


In order to tackle this problem, you need to know how to vectorize a matrix equation, i.e. $$\eqalign{ {\rm vec}(ASB) = (B^T\otimes A)\,{\rm vec}(S) = (B^T\otimes A)\,s }$$ where $\otimes$ denotes the Kronecker product and the ${\rm vec}(\cdot)$ operation stacks the columns of $S$ into the long column vector $s$.

The original matrix can be recovered from the long vector by the reverse operation, i.e. $$S = {\rm Mat}(s)$$
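As a concrete check of these conventions, here is a small NumPy sketch (column-major stacking for ${\rm vec}$ is an assumption, matching the standard convention used in the identity above):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
S = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

vec = lambda Z: Z.flatten(order="F")                 # stack columns
Mat = lambda z, r, c: z.reshape((r, c), order="F")   # inverse of vec

lhs = vec(A @ S @ B)
rhs = np.kron(B.T, A) @ vec(S)
print(np.allclose(lhs, rhs))                # True: vec(ASB) = (B^T (x) A) vec(S)
print(np.allclose(Mat(vec(S), 4, 5), S))    # True: Mat recovers S from vec(S)
```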

The next thing you need is a series expansion of the logarithm $$\eqalign{ B &= S-I \\ Y &= \log(S) = \sum_{k=1}^\infty \frac{(-1)^{k+1}}{k}B^k \\ }$$ and its differential $$\eqalign{ dB &= dS \\ dY &= \sum_{k=1}^\infty\sum_{j=1}^k \frac{(-1)^{k+1}}{k}B^{j-1}\,dS\,B^{k-j} \\ dy &= \left(\sum_{k=1}^\infty\sum_{j=1}^k \frac{(-1)^{k+1}}{k}\big(B^{k-j}\big)^T\otimes B^{j-1}\right)ds \\ }$$ (the series converges when the spectral radius of $B=S-I$ is less than one).

In the current problem, $$S = \sum_{\ell=1}^L A_\ell X A_\ell^T \quad\implies\quad ds = \left(\sum_{\ell=1}^L A_\ell\otimes A_\ell\right)dx$$ Combining the last two results, we can write $$dy = M\,dx, \qquad M = \left(\sum_{k=1}^\infty\sum_{j=1}^k \frac{(-1)^{k+1}}{k}\big(B^{k-j}\big)^T\otimes B^{j-1}\right)\left(\sum_{\ell=1}^L A_\ell\otimes A_\ell\right)$$

The objective function can be written as $$f \;=\; X^T:(Y\otimes I) \;=\; X:(Y^T\otimes I)$$ where the colon denotes the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB) = {\rm Tr}(AB^T) = B:A$$

The SVD of $X$ can be used to simplify the objective function $$\eqalign{ X &= \sum_{k=1}^{{\rm rank}(X)} \sigma_k u_k v_k^T\,, \quad U_k = {\rm Mat}(u_k), \quad V_k = {\rm Mat}(v_k), \quad W = \sum_k \sigma_k V_k^TU_k \\ f &= X:(Y^T\otimes I) \\ &= \sum_k \sigma_k\, u_k:(Y^T\otimes I)v_k \\ &= \sum_k \sigma_k\, u_k:{\rm vec}(V_kY) \\ &= \sum_k \sigma_k\, U_k:(V_kY) \\ &= \left(\sum_k \sigma_k V_k^TU_k\right):Y \\ &= W:Y \\ }$$

Now we're ready to calculate the requested gradient, writing $w = {\rm vec}(W)$. $$\eqalign{ df &= (Y^T\otimes I):dX + W:dY \\ &= (Y^T\otimes I):dX + w:dy \\ &= (Y^T\otimes I):dX + w:M\,dx \\ &= (Y^T\otimes I):dX + M^Tw:dx \\ &= \Big((Y^T\otimes I) + {\rm Mat}(M^Tw)\Big):dX \\ \frac{\partial f}{\partial X} &= (Y^T\otimes I) + {\rm Mat}(M^Tw) \\ }$$

There's a lot going on here; in particular, $M$ requires the evaluation of an infinite series, which in practice must be truncated.
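The final formula can be checked against finite differences. The sketch below is my own construction, not part of the derivation above: it assumes the $A_i$ are partial-trace operators $A_i = e_i^T\otimes\mathbb{I}_m$, takes a real symmetric $X$ close to $(m/n)\,\mathbb{I}$ so that $S\approx I$ and the truncated log series converges quickly, and uses column-major ${\rm vec}$/${\rm Mat}$ throughout:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)
n, m = 4, 2
p = n // m  # n/m

# Assumed partial-trace operators: A_i = e_i^T (x) I_m
A = [np.kron(np.eye(p)[i:i + 1, :], np.eye(m)) for i in range(p)]

# X close to (m/n) I  =>  S close to I_m, so the series in B = S - I converges
E = rng.standard_normal((n, n))
X = (m / n) * np.eye(n) + 0.02 * (E + E.T)

vec = lambda Z: Z.flatten(order="F")
Mat = lambda z, r, c: z.reshape((r, c), order="F")

def f(X):
    S = sum(Ai @ X @ Ai.T for Ai in A)
    return np.trace(X @ np.kron(logm(S), np.eye(p))).real

# --- gradient from the formula above ---
S = sum(Ai @ X @ Ai.T for Ai in A)
B = S - np.eye(m)
Y = np.real(logm(S))

# M = [truncated log series in B] @ sum_l (A_l (x) A_l)
K = sum(np.kron(Ai, Ai) for Ai in A)
L = np.zeros((m * m, m * m))
for k in range(1, 60):
    for j in range(1, k + 1):
        L += ((-1) ** (k + 1) / k) * np.kron(
            np.linalg.matrix_power(B, k - j).T,
            np.linalg.matrix_power(B, j - 1))
M = L @ K

# W = sum_k sigma_k V_k^T U_k from the SVD of X
U, sig, Vt = np.linalg.svd(X)
W = sum(sig[k] * Mat(Vt[k], p, m).T @ Mat(U[:, k], p, m) for k in range(n))
w = vec(W)

G = np.kron(Y.T, np.eye(p)) + Mat(M.T @ w, n, n)

# --- central finite-difference check, entry by entry ---
G_fd = np.zeros((n, n))
h = 1e-6
for i in range(n):
    for j in range(n):
        dX = np.zeros((n, n))
        dX[i, j] = h
        G_fd[i, j] = (f(X + dX) - f(X - dX)) / (2 * h)

print(np.max(np.abs(G - G_fd)))  # should be negligibly small
```

The intermediate identity $f = W:Y = {\rm Tr}(W^TY)$ can be checked the same way, which is a useful way to catch ${\rm vec}$-ordering mistakes before testing the full gradient.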