Matrix chain rule question: what is $\frac{d}{dX} f(S)$ where $S = (A+X)^{-1}$


I'm trying to find the following derivative: $$\frac{d}{dX} f(S)$$ where $f$ is a function that takes a matrix and returns a scalar, and $S=(A+X)^{-1}$. Assume that we know what $\frac{d}{dS} f(S)$ is (for example, if $f(S)=\exp(u^\intercal S u)$, we'd have $\frac{d}{dS} f(S)=u u^\intercal f(S)$).

I want to employ something like a matrix chain rule that looks like $$\frac{d}{dX} f(S) = \left(\frac{dS}{dX}\right) \left(\frac{d}{dS} f(S)\right)$$ but the problem is that the $\left(\frac{dS}{dX}\right)$ doesn't seem to make sense, and I don't know how to do a matrix-by-matrix derivative.

If it helps, assume that all matrices are symmetric and PSD.

Best answer

We know how to calculate the gradient with respect to $S$: $$G=\frac{\partial f}{\partial S}$$

We also know that $$\eqalign{ X &= S^{-1} - A \cr dX &= -S^{-1}\,dS\,S^{-1} \quad\implies\quad dS = -S\,dX\,S \cr }$$

Let's use this to write the differential of the function, and then perform a change of variables to find a result in terms of $X$: $$\eqalign{ df &= G:dS \cr &= -G:S\,dX\,S \cr &= -S^TGS^T:dX \cr &= -S^T\,\frac{\partial f}{\partial S}\,S^T:dX \cr \cr \frac{\partial f}{\partial X} &= -S^T\,\frac{\partial f}{\partial S}\,S^T \cr }$$

Here the colon denotes the inner/Frobenius product, i.e. $$A:B={\rm tr}(A^TB)$$ and the cyclic property of the trace gives rise to some rules for rearranging the product, which hold for any conformable matrices $A$, $B$, $C$: $$\eqalign{ A:BC &= AC^T:B \cr A:BC &= B^TA:C \cr A:BC &= BC:A \cr }$$
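
As a sanity check, the formula can be compared against finite differences. The following is only a minimal sketch: it assumes NumPy, uses the example $f(S)=\exp(u^\intercal S u)$ from the question, and the helper names (`random_spd`, `f_of_X`), the matrix size, and the step size `eps` are my own illustrative choices. Each entry of $X$ is perturbed independently, which matches how the derivation above treats $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # illustrative size, not from the question


def random_spd(n):
    """Random symmetric positive-definite matrix (hypothetical helper)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


A = random_spd(n)
X = random_spd(n)
u = rng.standard_normal(n)


def f_of_X(X):
    """f(S) = exp(u^T S u) with S = (A + X)^{-1}, viewed as a function of X."""
    S = np.linalg.inv(A + X)
    return np.exp(u @ S @ u)


# Analytic gradient from the derivation: df/dX = -S^T G S^T, with G = u u^T f(S).
S = np.linalg.inv(A + X)
G = np.outer(u, u) * np.exp(u @ S @ u)
grad_analytic = -S.T @ G @ S.T

# Central finite differences, perturbing one entry X[i, j] at a time.
eps = 1e-6
grad_numeric = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        grad_numeric[i, j] = (f_of_X(X + E) - f_of_X(X - E)) / (2 * eps)

max_abs_error = np.max(np.abs(grad_analytic - grad_numeric))
print("max |analytic - numeric| =", max_abs_error)
```

If the two gradients agree to within the finite-difference error, that is decent evidence the sign and the placement of the transposes are right.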

As you've discovered, the chain rule can be difficult to apply to matrix problems when the intermediate quantities, i.e. matrix-by-matrix or vector-by-matrix derivatives, are higher-order tensors.
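
To make the "higher-order tensor" point concrete, here is a sketch in index notation (the index labels $i,j,k,l$ are my own). Writing $G=\frac{\partial f}{\partial S}$ and reading the derivative off $dS=-S\,dX\,S$ entry by entry gives a fourth-order object, and contracting it with $G$ recovers the same matrix result: $$\eqalign{ dS_{ij} &= -\sum_{k,l} S_{ik}\,dX_{kl}\,S_{lj} \quad\implies\quad \frac{\partial S_{ij}}{\partial X_{kl}} = -S_{ik}S_{lj} \cr \frac{\partial f}{\partial X_{kl}} &= \sum_{i,j} G_{ij}\,\frac{\partial S_{ij}}{\partial X_{kl}} = -\sum_{i,j} S_{ik}\,G_{ij}\,S_{lj} = -\left(S^TGS^T\right)_{kl} \cr }$$ So a chain rule does exist, but $\frac{dS}{dX}$ carries four indices and the "product" in it is a double contraction over $i,j$, not an ordinary matrix product.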

The virtue of the differential approach is that the differential of a matrix behaves like an ordinary matrix.
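
For instance, applying the result $\frac{\partial f}{\partial X}=-S^T\,\frac{\partial f}{\partial S}\,S^T$ to the example in the question, $f(S)=\exp(u^\intercal S u)$ with $\frac{\partial f}{\partial S}=uu^T f(S)$, gives $$\eqalign{ \frac{\partial f}{\partial X} &= -S^T\,uu^T\,S^T\,f(S) \cr &= -f(S)\,(S^Tu)(Su)^T \cr }$$ If $S$ is symmetric, as the question assumes, then $S^T=S$ and this reduces to $-f(S)\,(Su)(Su)^T$. Note that this treats the entries of $X$ as independent variables, as the differential argument does.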