Derivative of $\text{trace}((a|S^{-1}| + bS^{-1})B)$ with respect to $S$


I'd like to calculate the derivative of $\text{trace}((a|S^{-1}| + bS^{-1})B)$ with respect to $S$. Here, $a$ and $b$ are scalars, $S$ is a symmetric, non-singular matrix, $B$ is positive semi-definite, and $|X|$ denotes the matrix absolute value of $X$, i.e. the matrix which shares the same eigenvectors as $X$ but whose eigenvalues are the absolute values of those of $X$.

The only problematic term is $|S^{-1}| = Q|D^{-1}|Q^T$, where $S = QDQ^T$ is the eigendecomposition of $S$.
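For concreteness, $|S^{-1}|$ can be formed directly from that eigendecomposition. A minimal sketch (the random symmetric $S$ and the seed are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
S = X + X.T                      # symmetric, generically non-singular

d, Q = np.linalg.eigh(S)         # S = Q D Q^T with D = diag(d)
abs_S_inv = Q @ np.diag(1.0 / np.abs(d)) @ Q.T   # |S^{-1}| = Q |D^{-1}| Q^T

# cross-check against eigendecomposing S^{-1} directly
di, Qi = np.linalg.eigh(np.linalg.inv(S))
print(np.allclose(abs_S_inv, Qi @ np.diag(np.abs(di)) @ Qi.T))
```

Both constructions agree because the eigenvalues of $S^{-1}$ are the reciprocals of those of $S$, with the same eigenvectors.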

If someone could also point to some useful literature to become more familiar with matrix calculus then that would be greatly appreciated.


$ \def\p{\partial} \def\g#1#2{\frac{\p #1}{\p #2}} \def\R{\operatorname{Reshape}} \def\S{\operatorname{sign}} \def\v{\operatorname{vec}} \def\M{\operatorname{Mat}} $The absolute value function that you're using can be defined using Higham's matrix sign function $$\eqalign{ G &= \S(S) = S^{-1}\big(S^2\big)^{1/2} \\ A &= |S| = SG = GS = \big(S^2\big)^{1/2} \\ }$$ Note that for these functions, the function of the inverse equals the inverse of the function $$\eqalign{ \S(S^{-1}) &= S\big(S^{-2}\big)^{1/2} &= G^{-1} \\ |S^{-1}| &= S^{-1}G^{-1} &= A^{-1} \\ }$$
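These identities are easy to check numerically. A sketch using `scipy.linalg.sqrtm` for the principal square root (the random $S$ is hypothetical; `np.real` strips any negligible imaginary round-off):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
S = X + X.T                                  # symmetric, generically non-singular

G = np.linalg.inv(S) @ np.real(sqrtm(S @ S))     # sign(S) = S^{-1}(S^2)^{1/2}
A = S @ G                                        # |S| = S G = (S^2)^{1/2}

# function of the inverse equals inverse of the function
G_inv = S @ np.real(sqrtm(np.linalg.inv(S @ S))) # sign(S^{-1}) = S(S^{-2})^{1/2}
A_inv = np.linalg.inv(S) @ G_inv                 # |S^{-1}| = S^{-1} G^{-1}

print(np.allclose(G_inv, np.linalg.inv(G)))      # sign(S^{-1}) vs G^{-1}
print(np.allclose(A_inv, np.linalg.inv(A)))      # |S^{-1}| vs A^{-1}
```

In the eigenbasis of $S$ all of these matrices are diagonal, which is why the checks reduce to scalar identities on the eigenvalues.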

The differential of $A$ has a simple relationship to that of $S$ $$\eqalign{ A^2 &= S^2 \\ A\,dA + dA\,A &= S\,dS + dS\,S \\ }$$ This expression can be vectorized with the aid of the Kronecker product $(\otimes)$ and sum $(\oplus)$ $$\eqalign{ (I\otimes A + A^T\otimes I)\,da &= (I\otimes S+S^T\otimes I)\,ds \\ (I\otimes A + A\otimes I)\,da &= (I\otimes S+S\otimes I)\,ds \\ (A\oplus A)\,da &= (S\oplus S)\,ds \\ da &= (A\oplus A)^{-1}(S\oplus S)\,ds \\ &= M\,ds \\ }$$ where the combined coefficient matrix $M$ inherits the symmetry of $A$ and $S$.
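The relation $da = M\,ds$ can be verified with a finite-difference check. A sketch with a hypothetical random $S$; `A_of` and `kronsum` are helper names introduced here, and `order="F"` gives the column-stacking $\v(\cdot)$:

```python
import numpy as np

def A_of(S):
    """|S| = (S^2)^{1/2} via the eigendecomposition S = Q D Q^T."""
    d, Q = np.linalg.eigh(S)
    return Q @ np.diag(np.abs(d)) @ Q.T

def kronsum(X):
    """Kronecker sum X (+) X = I (x) X + X (x) I."""
    I = np.eye(X.shape[0])
    return np.kron(I, X) + np.kron(X, I)

rng = np.random.default_rng(2)
n = 3
X = rng.standard_normal((n, n))
S = X + X.T
A = A_of(S)

# M = (A (+) A)^{-1} (S (+) S)
M = np.linalg.solve(kronsum(A), kronsum(S))

# finite-difference check: vec(dA) ~ M vec(dS) for a small symmetric step
E = rng.standard_normal((n, n))
dS = 1e-6 * (E + E.T)
dA = A_of(S + dS) - A_of(S)
print(np.allclose(dA.reshape(-1, order="F"),
                  M @ dS.reshape(-1, order="F"), atol=1e-8))
```

Solving with `kronsum(A)` rather than explicitly inverting it is the usual numerically safer choice.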

Your objective function can be rewritten using the Frobenius product $(:)$ and then differentiated; note that $\operatorname{trace}(XB) = B:X$ since $B$ is symmetric. $$\eqalign{ \phi &= B:(\alpha A^{-1} +\beta S^{-1}) \\ d\phi &= B:(\alpha\,dA^{-1} + \beta\,dS^{-1}) \\ &= -B:(\alpha A^{-1}dA\,A^{-1} + \beta S^{-1}dS\,S^{-1}) \\ &= -\big(\alpha A^{-1}BA^{-1}:dA + \beta S^{-1}BS^{-1}:dS\big) \\ &= -\Big(\v(\alpha A^{-1}BA^{-1}):da + \v(\beta S^{-1}BS^{-1}):ds\Big) \\ &= -\Big(\v(\alpha A^{-1}BA^{-1}):M\,ds + \v(\beta S^{-1}BS^{-1}):ds\Big) \\ &= -\Big(M\v(\alpha A^{-1}BA^{-1}) + \v(\beta S^{-1}BS^{-1})\Big):ds \\ \g{\phi}{s} &= -\Big(M\v(\alpha A^{-1}BA^{-1}) + \v(\beta S^{-1}BS^{-1})\Big) \\ }$$ where the penultimate step uses the symmetry of $M$. It is easy to convert this gradient between vector and matrix forms (here $n$ is the order of $S$) $$\eqalign{ \g{\phi}{S} &= \R\left(\g{\phi}{s},\;n,\,n\right) &= \M\left(\g{\phi}{s}\right) \\ \g{\phi}{s} &= \R\left(\g{\phi}{S},\;n^2,\,{\tt1}\right) &= \v\left(\g{\phi}{S}\right) \\ }$$
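Putting the pieces together, the gradient formula can be checked against a central difference of the objective. A sketch; `A_of`, `kronsum`, `vec`, and the scalar values are illustrative names and choices, not part of the original problem:

```python
import numpy as np

def A_of(S):
    """|S| = (S^2)^{1/2} via eigendecomposition."""
    d, Q = np.linalg.eigh(S)
    return Q @ np.diag(np.abs(d)) @ Q.T

def kronsum(X):
    I = np.eye(X.shape[0])
    return np.kron(I, X) + np.kron(X, I)

def vec(X):
    return X.reshape(-1, order="F")

def phi(S, B, a, b):
    """trace((a|S^{-1}| + b S^{-1}) B)."""
    return np.trace((a * np.linalg.inv(A_of(S)) + b * np.linalg.inv(S)) @ B)

rng = np.random.default_rng(3)
n = 3
X = rng.standard_normal((n, n)); S = X + X.T
Y = rng.standard_normal((n, n)); B = Y @ Y.T       # positive semi-definite
a, b = 0.7, -1.3                                   # arbitrary scalars

A = A_of(S)
Ai, Si = np.linalg.inv(A), np.linalg.inv(S)
M = np.linalg.solve(kronsum(A), kronsum(S))

# gradient: -(M vec(a A^{-1} B A^{-1}) + vec(b S^{-1} B S^{-1}))
g = -(M @ vec(a * Ai @ B @ Ai) + vec(b * Si @ B @ Si))
grad = g.reshape(n, n, order="F")                  # Mat(dphi/ds)

# central-difference check along a small symmetric direction dS
E = rng.standard_normal((n, n)); dS = 1e-5 * (E + E.T)
dphi = (phi(S + dS, B, a, b) - phi(S - dS, B, a, b)) / 2
print(np.isclose(dphi, np.sum(grad * dS), rtol=1e-5))
```

The directional derivative $\operatorname{grad}:dS$ should match the central difference to roughly the order of $\|dS\|^2$.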