Partial derivatives of matrix logarithm


I am trying to figure out what the derivative of the matrix logarithm w.r.t. the matrix parameter is. So, for $X \in \mathcal S$, the set of $n \times n$ symmetric positive definite (hence full-rank) matrices, I'd like to find: $$X \mapsto \frac {\partial \log(X)_{\alpha\beta}}{\partial X_{ij}},$$ and even if possible: $$\frac {\partial \log(f(X))_{\alpha\beta}}{\partial X_{ij}}$$ for a given $f : \mathcal S \to \mathcal S$.

I tried a few things using the fact that if $X = V\Lambda V^\top$, then $\log X = V\log\Lambda V^\top$, but I haven't been able to make anything useful out of it since I know how to differentiate neither $V$ nor $\Lambda$ w.r.t. $X$. Any leads?

For a bit of context, I am reading this paper in which the authors use the matrix logarithm to locally map covariance matrices onto the tangent space to the SPD matrices manifold at a given point. I am trying to find how the original components of the covariance matrices affect the components of the projected matrix. Therefore I'd like to differentiate the mapping as shown above.

Using the power series expression of the matrix logarithm (valid when the spectral radius of $X - I$ is less than $1$): $$\log X = \sum_{m \geq 1}(-1)^{m+1}\frac 1m(X-I)^m$$ as a basis to differentiate formally, term by term, yields: $$\frac {\partial \log(X)_{\alpha\beta}}{\partial X_{ij}} = \sum_{m \geq 1}(-1)^{m+1}\frac 1m\frac {\partial}{\partial X_{ij}}{(X-I)^m}_{\alpha\beta} = \sum_{m \geq 1}(-1)^{m+1}\frac 1m\sum_{k=0}^{m-1}{(X-I)^k}_{\alpha i}{(X-I)^{m-1-k}}_{j\beta},$$ but again I don't know where to go from here.
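As a numerical sanity check (my own sketch, not part of the question), the term-by-term formula above can be evaluated as a 4th-order tensor and compared slice-wise against a finite difference of `scipy.linalg.logm`; the test matrix is kept close to $I$ so the series converges:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)
n = 3
S = rng.standard_normal((n, n))
X = np.eye(n) + 0.1 * (S + S.T)       # SPD, with spectral radius of X - I below 1

M = 60                                # truncation order of the log series
Z = X - np.eye(n)
Zp = [np.linalg.matrix_power(Z, k) for k in range(M)]

# 4th-order tensor D[a, b, i, j] = d log(X)_{ab} / d X_{ij} from the series.
D = np.zeros((n, n, n, n))
for m in range(1, M):
    c = (-1) ** (m - 1) / m
    for k in range(m):
        # term: (Z^k)_{a i} (Z^{m-1-k})_{j b}
        D += c * np.einsum("ai,jb->abij", Zp[k], Zp[m - 1 - k])

# Spot-check one slice with a central finite difference in X_{ij}.
i, j, h = 0, 1, 1e-6
E = np.zeros((n, n)); E[i, j] = 1.0
fd = (logm(X + h * E) - logm(X - h * E)).real / (2 * h)
assert np.allclose(D[:, :, i, j], fd, atol=1e-6)
```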


Instead of 4th order tensors, it is usually easier to use vectorization in these situations. $$\eqalign{ Y &= AXB \\ dY &= A\;dX\;B \\ {\rm vec}(dY) &= (B^T\otimes A)\;{\rm vec}(dX) \\ dy &= (B^T\otimes A)\;dx \\ }$$ where $\otimes$ is the Kronecker product.
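The vectorization identity above is easy to verify numerically; note that $\rm vec$ stacks columns, so NumPy's Fortran-order flattening is the right convention (a minimal sketch, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A, X, B = rng.standard_normal((3, n, n))

Y = A @ X @ B
# vec() stacks columns, hence column-major (Fortran) order.
vecY = Y.flatten(order="F")
lhs = np.kron(B.T, A) @ X.flatten(order="F")

assert np.allclose(lhs, vecY)         # vec(AXB) = (B^T kron A) vec(X)
```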

For the current problem, define the symmetric matrices $$\eqalign{ Z &= (X-I),\quad dZ = dX \\ Y &= \log(X) \;=\; \sum_{m=1}^\infty \frac{(-1)^{m-1}}{m}\;Z^m \\ }$$ Then differentiate the power series and vectorize it. $$\eqalign{ dY &= \sum_{m=1}^\infty\frac{(-1)^{m-1}}{m} \;\sum_{k=0}^{m-1}Z^{k}\;dX\;Z^{m-1-k} \\ dy &= \sum_{m=1}^\infty\frac{(-1)^{m-1}}{m} \;\sum_{k=0}^{m-1}\Big(Z^{m-1-k}\otimes Z^{k}\Big)\;dx \\ \frac{\partial y}{\partial x} &= \sum_{m=1}^\infty \frac{(-1)^{m-1}}{m} \;\sum_{k=0}^{m-1}\Big(Z^{m-1-k}\otimes Z^{k}\Big) \\ }$$ Elements of the vector expression are equal to those of the tensor expression $$\eqalign{ \frac{\partial y_{s}}{\partial x_{r}} &= \frac{\partial Y_{ij}}{\partial X_{pq}} }$$ The mapping for the vector indices follows the column-major (vec) ordering $$s = i + (j-1)n \\ r = p + (q-1)n \\$$
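The truncated Kronecker-sum Jacobian can be checked column by column against a finite difference of `scipy.linalg.logm` (my own sketch; again the test matrix must stay close to $I$ for the series to converge):

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(1)
n = 3
S = rng.standard_normal((n, n))
X = np.eye(n) + 0.1 * (S + S.T)       # SPD, with spectral radius of X - I below 1

# Truncated power-series Jacobian dy/dx as an (n^2, n^2) matrix.
M = 60
Z = X - np.eye(n)
Zp = [np.linalg.matrix_power(Z, k) for k in range(M)]
J = np.zeros((n * n, n * n))
for m in range(1, M):
    c = (-1) ** (m - 1) / m
    for k in range(m):
        J += c * np.kron(Zp[m - 1 - k], Zp[k])

# Check one column against a central finite difference in X_{pq}.
p, q, h = 1, 2, 1e-6
E = np.zeros((n, n)); E[p, q] = 1.0
fd = (logm(X + h * E) - logm(X - h * E)).real.flatten(order="F") / (2 * h)
r = p + q * n                         # 0-based version of r = p + (q-1)n
assert np.allclose(J[:, r], fd, atol=1e-6)
```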

Update

The gradient of $X$ with respect to one of its components is $$\eqalign{ \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \grad{X}{X_{ij}} = E_{ij} \\ }$$ where $E_{ij}$ is the matrix whose $(i,j)$ component equals one and all other components equal zero.

The $\sf Daleckii$-$\sf Krein$ theorem, using an eigenvalue decomposition of $X$, states that $$\eqalign{ \def\op{\operatorname} \def\L{{\Lambda}} \def\l{\lambda} X &= V\L V^T, \qquad \L = \op{Diag}(\l_k) \\ F &= f(X) \;=\; V\,f(\L)\,V^T \\ dF &= V\,\Big[R\odot\left(V^TdX\,V\right)\Big]\,V^T \\ \\ R_{k\ell} &= \begin{cases} {\large\frac{f(\l_k)\,-\,f(\l_\ell)}{\l_k\,-\,\l_\ell}}\qquad{\rm if}\;\l_k\ne\l_\ell \\ \\ \quad{\small f'(\l_k)}\qquad\qquad{\rm otherwise} \\ \end{cases} \\ }$$ where $(\odot)$ denotes the Hadamard product, and in the current problem the function of interest is $$f(\l) = \log(\l), \qquad\quad f'(\l) = \l^{-1} \\ $$
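The Daleckii-Krein formula avoids the series entirely, so it works for any SPD matrix, not just those near $I$. A sketch of it in NumPy (the helper `dlog` is my own naming), checked against a finite difference:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(2)
n = 4
S = rng.standard_normal((n, n))
X = S @ S.T + n * np.eye(n)           # SPD; no nearness-to-I restriction needed

lam, V = np.linalg.eigh(X)

# Daleckii-Krein divided-difference matrix R for f = log.
L1, L2 = np.meshgrid(lam, lam, indexing="ij")
with np.errstate(divide="ignore", invalid="ignore"):
    R = (np.log(L1) - np.log(L2)) / (L1 - L2)
R[~np.isfinite(R)] = 0.0              # clear 0/0 entries (equal eigenvalues)
np.fill_diagonal(R, 1.0 / lam)        # f'(lam) = 1/lam on the diagonal

def dlog(dX):
    """Directional derivative of logm at X in direction dX."""
    return V @ (R * (V.T @ dX @ V)) @ V.T

# Check against a finite difference along a symmetric direction.
T = rng.standard_normal((n, n)); dX = T + T.T
h = 1e-6
fd = (logm(X + h * dX) - logm(X - h * dX)).real / (2 * h)
assert np.allclose(dlog(dX), fd, atol=1e-6)
```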

Combining these two results produces a closed-form solution for the current problem $$\eqalign{ \grad{\log(X)}{X_{ij}} &= V\,\Big[R\odot\left(V^TE_{ij}V\right)\Big]\,V^T \\ }$$
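Putting the two pieces together, the closed-form partial derivative for one component $X_{ij}$ is a single sandwich product with $E_{ij}$; a self-contained numerical check (my own sketch):

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(3)
n = 3
S = rng.standard_normal((n, n))
X = S @ S.T + n * np.eye(n)           # SPD test matrix

lam, V = np.linalg.eigh(X)
L1, L2 = np.meshgrid(lam, lam, indexing="ij")
with np.errstate(divide="ignore", invalid="ignore"):
    R = (np.log(L1) - np.log(L2)) / (L1 - L2)
np.fill_diagonal(R, 1.0 / lam)        # f'(lam) on the diagonal

i, j = 0, 2
E = np.zeros((n, n)); E[i, j] = 1.0   # E_ij: single-entry matrix
G = V @ (R * (V.T @ E @ V)) @ V.T     # d log(X) / d X_ij, closed form

h = 1e-6
fd = (logm(X + h * E) - logm(X - h * E)).real / (2 * h)
assert np.allclose(G, fd, atol=1e-6)
```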