How to differentiate $g(X)=\operatorname{tr}\left(X^{-1}\right)$?


Let $X$ be an invertible $n \times n$ matrix. Calculate the derivative of the following function with respect to $X$.

$$ g(X)=\operatorname{tr}\left(X^{-1}\right) $$

I'm stumped by this. When I work through it, I use these two identities:

  1. $$\frac{\partial}{\partial \boldsymbol{X}} \boldsymbol{f}(\boldsymbol{X})^{-1}=-\boldsymbol{f}(\boldsymbol{X})^{-1} \frac{\partial \boldsymbol{f}(\boldsymbol{X})}{\partial \boldsymbol{X}} \boldsymbol{f}(\boldsymbol{X})^{-1}$$

  2. $$ \frac{\partial}{\partial \boldsymbol{X}} \operatorname{tr}(\boldsymbol{f}(\boldsymbol{X}))=\operatorname{tr}\left(\frac{\partial \boldsymbol{f}(\boldsymbol{X})}{\partial \boldsymbol{X}}\right) $$

With these I should arrive at the solution. Using 1, I get $$\frac{d}{dX}\left(X^{-1}\right) = -X^{-1}\otimes X^{-1}.$$ So the answer should be the trace of that, right? Which equals $$\operatorname{tr}\left(-X^{-1}\right)\operatorname{tr}\left(X^{-1}\right).$$

But the solution seems to be $$-X^{-2T},$$ which I can't see.


There are 3 best solutions below

2
On

$\newcommand{\tr}{\operatorname{tr}}$If $i(X)=X^{-1}$, then $D_Xi(H)=-X^{-1}HX^{-1}$. The trace is linear, so it is its own derivative, and the chain rule gives $$D_Xg(H)=D_X(\tr\circ i)(H)=(D_{i(X)}\tr)(D_Xi(H))=\tr(-X^{-1}HX^{-1})$$

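and since $\tr(AB)=\tr(BA)$, we have $$D_Xg(H)=-\tr(X^{-1}HX^{-1})=-\tr(X^{-2}H)=-\tr(HX^{-2}).$$ Writing this as $\tr(G^TH)$ with $G=-(X^{-2})^T$ identifies the gradient as $-X^{-2T}$, the solution quoted in the question.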
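As a sanity check, the directional derivative can be compared against a central finite difference. This is only a sketch: the function names, the random test matrix, and the step size `t` are my own choices, not part of the derivation.

```python
import numpy as np

def g(X):
    """g(X) = tr(X^{-1})."""
    return np.trace(np.linalg.inv(X))

def directional_derivative(X, H):
    """Closed form from the chain rule: D_X g(H) = -tr(X^{-2} H)."""
    X_inv = np.linalg.inv(X)
    return -np.trace(X_inv @ X_inv @ H)

rng = np.random.default_rng(0)
n = 4
# Assumed test data: a small random perturbation of n*I, which keeps X
# well away from singular for typical draws.
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

t = 1e-6  # finite-difference step (an arbitrary small value)
# Central difference approximation of d/dt g(X + tH) at t = 0.
numeric = (g(X + t * H) - g(X - t * H)) / (2 * t)
exact = directional_derivative(X, H)
print(numeric, exact)
```

The two printed numbers should agree to several digits if the formula is right.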

0
On

We will use the Frobenius product, defined by \begin{align} A:B := \operatorname{tr}\left(A^T B \right), \end{align} together with the cyclic property of the trace, which lets us move factors from one side of the colon to the other, e.g., \begin{align} A: BCD = B^T A: CD = B^TAD^T: C \end{align}
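For anyone who wants to convince themselves of these rearrangement rules, here is a quick numerical sketch. The helper `frob` and the random $3\times 3$ matrices are just illustrative assumptions.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A : B = tr(A^T B), the sum of entrywise products."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(1)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))

lhs = frob(A, B @ C @ D)        # A : BCD
mid = frob(B.T @ A, C @ D)      # B^T A : CD
rhs = frob(B.T @ A @ D.T, C)    # B^T A D^T : C
print(lhs, mid, rhs)
```

All three values should coincide up to rounding.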

Further, we will use the differential of the inverse of an invertible matrix $X$, obtained by differentiating $XX^{-1}=I$: \begin{align} XX^{-1} = I \Longrightarrow dX\, X^{-1} + X\, dX^{-1} = 0 \Longleftrightarrow dX^{-1} = -X^{-1}\, dX\, X^{-1}. \end{align}
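The first-order formula for $dX^{-1}$ can be checked numerically by perturbing $X$ slightly and comparing the actual change in the inverse with the prediction. This is a rough sketch; the test matrix and the perturbation scale `1e-6` are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Assumed test data: a mild random perturbation of n*I.
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))  # small perturbation of X

X_inv = np.linalg.inv(X)
actual_change = np.linalg.inv(X + dX) - X_inv   # exact change in the inverse
first_order = -X_inv @ dX @ X_inv               # predicted: -X^{-1} dX X^{-1}

# The mismatch should be second order in dX, so far smaller than the change itself.
residual = np.linalg.norm(actual_change - first_order)
print(np.linalg.norm(actual_change), residual)
```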

Now, let $f := \operatorname{tr}\left( X^{-1} \right)$; we find the differential and then read off the gradient. \begin{align} df &= d\operatorname{tr}\left( X^{-1} \right) = d\operatorname{tr}\left( I X^{-1} \right) \\ &= I : dX^{-1} \\ &= I : \left(-X^{-1}\, dX\, X^{-1}\right) \\ &= - X^{-T} I X^{-T} : dX \\ &= - X^{-2T} : dX \end{align}

Then the gradient is \begin{align} \frac{\partial f}{\partial X} = - X^{-2T}. \end{align}
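A possible way to test this result is an entrywise finite-difference gradient, using a non-symmetric $X$ so that the transpose actually matters. The names `f`, `grad_f`, and `numerical_grad` below are hypothetical helpers for this sketch, and the step size is an arbitrary choice.

```python
import numpy as np

def f(X):
    """f(X) = tr(X^{-1})."""
    return np.trace(np.linalg.inv(X))

def grad_f(X):
    """Closed-form gradient derived above: -(X^{-2})^T."""
    X_inv = np.linalg.inv(X)
    return -(X_inv @ X_inv).T

def numerical_grad(func, X, eps=1e-6):
    """Central differences, one matrix entry at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (func(X + E) - func(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(3)
n = 4
# Assumed test data: non-symmetric, and close enough to n*I to be invertible
# for typical draws.
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))

max_err = np.max(np.abs(numerical_grad(f, X) - grad_f(X)))
print(max_err)
```

A small `max_err` supports the formula; dropping the `.T` in `grad_f` should make the mismatch clearly visible for a generic non-symmetric `X`.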

2
On

The problem is with this equation:

$$\frac{\partial}{\partial \boldsymbol{X}} \operatorname{tr}(\boldsymbol{f}(\boldsymbol{X}))=\operatorname{tr}\left(\frac{\partial \boldsymbol{f}(\boldsymbol{X})}{\partial \boldsymbol{X}}\right)$$

Note that on the LHS you are taking the derivative of a function $\mathbb R^{n\times n} \to \mathbb R$, whereas on the RHS you are trying to take the trace of the derivative of a function $f\colon\mathbb R^{n\times n}\to\mathbb R^{n\times n}$. As you already figured out, this derivative can be expressed as a fourth-order tensor $-(X^{-1} \otimes X^{-1})$, and the "trace" of a fourth-order tensor is ambiguous: it depends on which pair of indices is contracted. The result cannot be $-\operatorname{tr}(X^{-1})\operatorname{tr}(X^{-1})$, since that is a scalar, whereas the derivative of a scalar function of a matrix must be a second-order tensor.
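To make this concrete, here is a sketch in index notation (my own expansion of the point above; the index placement is an assumed convention, and other layouts of the Kronecker product exist). From $dX^{-1} = -X^{-1}\,dX\,X^{-1}$, the fourth-order derivative has components

$$\frac{\partial \left(X^{-1}\right)_{ij}}{\partial X_{kl}} = -\left(X^{-1}\right)_{ik}\left(X^{-1}\right)_{lj}.$$

The trace in identity 2 must contract only the output indices $i=j$, leaving $k,l$ free:

$$\frac{\partial}{\partial X_{kl}}\operatorname{tr}\left(X^{-1}\right) = \sum_i \frac{\partial \left(X^{-1}\right)_{ii}}{\partial X_{kl}} = -\sum_i \left(X^{-1}\right)_{li}\left(X^{-1}\right)_{ik} = -\left(X^{-2}\right)_{lk} = -\left(X^{-2T}\right)_{kl}.$$

Contracting both index pairs instead would give the scalar $-\operatorname{tr}(X^{-1})\operatorname{tr}(X^{-1})$, which is where the attempt in the question went wrong.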