Differentiate ${\operatorname{Tr}((SS^T)^{-1}A)}$ w.r.t. $S$


From the Matrix Cookbook (eq. 125) I only know that

$\frac{\partial\operatorname{Tr}((S^TS)^{-1}A)}{\partial S}=-S(S^TS)^{-1}(A+A^T)(S^TS)^{-1}$.

Can anybody tell me what happens to the result if I switch $(S^TS)^{-1}$ to $(SS^T)^{-1}$?

So what is

$\frac{\partial\operatorname{Tr}((SS^T)^{-1}A)}{\partial S}=...$?

P.S. In general, I am also very interested in how the authors arrived at the result in eq. 125.

2

There are 2 best solutions below

3
On BEST ANSWER

For convenience, define a new matrix variable $$M=S^TS$$ and note that $M$ is symmetric.

Instead of the trace notation, let's use the inner/Frobenius product $$A:B = {\rm tr}(A^TB)$$ which is easier to manipulate algebraically. All of the properties of the Frobenius product follow from the cyclic and transposition properties of the trace, e.g. $$\eqalign{ &A:B = B:A \cr &A:B = A^T:B^T \cr &AB:C = A:CB^T \cr }$$
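The three Frobenius-product rules listed above can be spot-checked numerically. The following is a minimal sketch using NumPy; the helper name `frob` and the matrix shapes are illustrative choices, not part of the original answer.

```python
import numpy as np

def frob(A, B):
    """Frobenius inner product A:B = tr(A^T B) (hypothetical helper name)."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((3, 5))
D = rng.standard_normal((3, 4))  # same shape as A

# A:D = D:A
sym = np.isclose(frob(A, D), frob(D, A))
# A:D = A^T:D^T
transp = np.isclose(frob(A, D), frob(A.T, D.T))
# AB:C = A:CB^T  (a right factor moves across the colon as a transpose)
shift = np.isclose(frob(A @ B, C), frob(A, C @ B.T))

print(sym, transp, shift)
```

The third rule is the one used repeatedly in the derivations to move factors of $S$ or $S^T$ away from $dS$.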

Now re-write the function (since $M$ is symmetric, ${\rm tr}(M^{-1}A)=A:M^{-1}$) and find its differential and gradient $$\eqalign{ f &= A:M^{-1} \cr \cr df &= -A:M^{-1}\,dM\,M^{-1} \cr &= -M^{-1}AM^{-1}:dM \cr &= -(S^TS)^{-1}A(S^TS)^{-1}:d(S^TS) \cr &= -(S^TS)^{-1}A(S^TS)^{-1}:(dS^T\,S+S^T\,dS) \cr &= -(S^TS)^{-1}(A+A^T)(S^TS)^{-1}:S^T\,dS \cr &= -S(S^TS)^{-1}(A+A^T)(S^TS)^{-1}:dS \cr \cr \frac{\partial f}{\partial S} &= -S(S^TS)^{-1}(A+A^T)(S^TS)^{-1} \cr \cr }$$ So that's where the result you quoted comes from. Note that when $A$ is symmetric, $A+A^T=2A$ and the gradient reduces to $-2S(S^TS)^{-1}A(S^TS)^{-1}$; for a general $A$ the factor $(A+A^T)$ must be kept, so watch for a factor of 2 when comparing against formulas stated only for symmetric $A$.
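The gradient derived above can be compared against central finite differences. This is a rough sketch, not a proof; the function names `f`, `grad_f`, and `numerical_grad`, the step size, and the random test matrices are all assumptions made for illustration. $S$ is taken tall ($5\times 3$) so that $S^TS$ is generically invertible, and $A$ is deliberately non-symmetric.

```python
import numpy as np

def f(S, A):
    """f(S) = tr((S^T S)^{-1} A)."""
    return np.trace(np.linalg.inv(S.T @ S) @ A)

def grad_f(S, A):
    """Closed-form gradient: -S (S^T S)^{-1} (A + A^T) (S^T S)^{-1}."""
    Minv = np.linalg.inv(S.T @ S)
    return -S @ Minv @ (A + A.T) @ Minv

def numerical_grad(func, S, eps=1e-6):
    """Entry-by-entry central-difference approximation of d func / dS."""
    G = np.zeros_like(S)
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            E = np.zeros_like(S)
            E[i, j] = eps
            G[i, j] = (func(S + E) - func(S - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
S = rng.standard_normal((5, 3))  # tall, so S^T S is 3x3
A = rng.standard_normal((3, 3))  # not symmetric on purpose

max_err = np.max(np.abs(grad_f(S, A) - numerical_grad(lambda X: f(X, A), S)))
print(max_err)  # expected to be small if the formula is right
```

Using a non-symmetric $A$ matters here: with a symmetric $A$ the check could not distinguish $(A+A^T)$ from $2A$.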

Now let's change the definition of $M$ to $$M=SS^T$$ Picking up the differential at the point where it's still in terms of $M$, we can substitute the new definition to get the answer to your question $$\eqalign{ df &= -M^{-1}AM^{-1}:dM \cr &= -(SS^T)^{-1}A(SS^T)^{-1}:d(SS^T) \cr &= -(SS^T)^{-1}A(SS^T)^{-1}:(dS\,S^T+S\,dS^T) \cr &= -(SS^T)^{-1}(A+A^T)(SS^T)^{-1}:dS\,S^T \cr &= -(SS^T)^{-1}(A+A^T)(SS^T)^{-1}S:dS \cr \cr \frac{\partial f}{\partial S} &= -(SS^T)^{-1}(A+A^T)(SS^T)^{-1}S \cr \cr }$$
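The result for $M=SS^T$ can be checked in the same differential form used in the derivation, $df \approx \frac{\partial f}{\partial S}:dS$, for a random direction $dS$. The sketch below (function names, seed, and step size are illustrative assumptions) also confirms that the formula agrees with taking the $S^TS$ gradient at $S^T$ and transposing it. Here $S$ is wide ($3\times 5$) so that $SS^T$ is generically invertible.

```python
import numpy as np

def f_outer(S, A):
    """f(S) = tr((S S^T)^{-1} A)."""
    return np.trace(np.linalg.inv(S @ S.T) @ A)

def grad_outer(S, A):
    """Closed-form gradient: -(S S^T)^{-1} (A + A^T) (S S^T)^{-1} S."""
    Minv = np.linalg.inv(S @ S.T)
    return -Minv @ (A + A.T) @ Minv @ S

def grad_inner(S, A):
    """Gradient of tr((S^T S)^{-1} A): -S (S^T S)^{-1} (A + A^T) (S^T S)^{-1}."""
    Minv = np.linalg.inv(S.T @ S)
    return -S @ Minv @ (A + A.T) @ Minv

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 5))   # wide, so S S^T is 3x3
A = rng.standard_normal((3, 3))   # not symmetric on purpose
dS = rng.standard_normal((3, 5))  # random perturbation direction

# Directional derivative by central differences vs. the Frobenius product G:dS.
t = 1e-6
df_numeric = (f_outer(S + t * dS, A) - f_outer(S - t * dS, A)) / (2 * t)
df_formula = np.sum(grad_outer(S, A) * dS)

# Substituting S -> S^T in the S^T S formula, then transposing the result.
transpose_route = grad_inner(S.T, A).T

print(df_numeric, df_formula)
print(np.allclose(grad_outer(S, A), transpose_route))
```

The directional check is cheaper than perturbing every entry of $S$ and mirrors the last line of the derivation before the gradient is read off.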

1
On

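What you are doing is replacing $\mathrm S$ by $\mathrm S^{\mathrm T}$. Therefore just replace $\mathrm S \rightarrow \mathrm S^{\mathrm T}$ everywhere in the formula. This gives the derivative with respect to $\mathrm S^{\mathrm T}$, so transpose it to get the derivative with respect to $\mathrm S$: $-(\mathrm S\mathrm S^{\mathrm T})^{-1}(\mathrm A+\mathrm A^{\mathrm T})(\mathrm S\mathrm S^{\mathrm T})^{-1}\mathrm S$.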