How do we calculate the derivative of the inverse of a matrix?


When I was reading Chapter 10 of B&V's Convex Optimization book, I encountered a first-order approximation on page 554: $$\tag{1} (X+\Delta X_{nt})^{-1}\approx X^{-1}-X^{-1}\Delta X_{nt} X^{-1}$$ As we know, the first-order approximation is $f(x+\Delta x)\approx f(x)+Df(x)\Delta x$ in derivative form, or $f(x+\Delta x)\approx f(x)+\nabla f(x)^T\Delta x$ in gradient form. I tried to derive the second term on the RHS, but I failed. I found the following formula in the Matrix Cookbook:
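As a sanity check (not part of the book's text), the quality of approximation (1) can be verified numerically: since it is a first-order expansion, the error should shrink roughly quadratically as $\|\Delta X\|\to 0$. A minimal NumPy sketch, with an arbitrary well-conditioned test matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Diagonally dominated random matrix, so X is safely invertible
X = rng.standard_normal((n, n)) + n * np.eye(n)
dX = rng.standard_normal((n, n))  # an arbitrary perturbation direction

Xinv = np.linalg.inv(X)
for t in [1e-1, 1e-2, 1e-3]:
    exact = np.linalg.inv(X + t * dX)
    approx = Xinv - Xinv @ (t * dX) @ Xinv  # RHS of (1)
    err = np.linalg.norm(exact - approx)
    print(f"t={t:g}  error={err:.2e}")  # error shrinks roughly like t^2
```

Each tenfold reduction of the step size cuts the error by roughly a factor of 100, which is exactly the behavior expected of a first-order approximation with a quadratic remainder.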

$$\tag{2} \frac{\partial Y^{-1}}{\partial x}=-Y^{-1}\frac{\partial Y}{\partial x}Y^{-1}$$ But $X$ in (1) is a matrix, not a scalar. I got more confused when I recalled that differentiating a matrix with respect to another matrix yields a larger matrix built from a Kronecker product, e.g., $\nabla_X(AXB)=B\otimes A^T$. So I think that (2) is not helpful for deriving (1).
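For reference, the Kronecker-product fact alluded to above is the vectorization identity $\operatorname{vec}(AXB)=(B^T\otimes A)\operatorname{vec}(X)$ (the form $B\otimes A^T$ is its transpose, matching a gradient rather than Jacobian convention). A small NumPy check of the identity, with arbitrary shapes chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# vec() stacks columns, so we flatten in Fortran (column-major) order
vec = lambda M: M.flatten(order="F")

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))  # True: vec(AXB) = (B^T kron A) vec(X)
```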

Any guidance would be appreciated.

Let $f(X):=X^{-1}$. Starting with $$X\cdot f(X)=I$$ and differentiating both sides in the direction $\Delta X$ (the product rule, with the constant $I$ on the right differentiating to $0$) gives $$\Delta X\cdot f(X)+X\cdot Df(X)(\Delta X)=0.$$ Solving for the derivative term: $$Df(X)(\Delta X)=-X^{-1}\cdot\Delta X\cdot f(X)=-X^{-1}\cdot\Delta X\cdot X^{-1}.$$
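The directional-derivative formula derived above can be confirmed numerically against a central finite difference; a short sketch (test matrix and step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
X = rng.standard_normal((n, n)) + n * np.eye(n)  # invertible test matrix
dX = rng.standard_normal((n, n))  # direction of differentiation

Xinv = np.linalg.inv(X)
analytic = -Xinv @ dX @ Xinv  # Df(X)(dX) from the derivation above

# Central difference of f(X) = X^{-1} in the direction dX
h = 1e-6
fd = (np.linalg.inv(X + h * dX) - np.linalg.inv(X - h * dX)) / (2 * h)

print(np.linalg.norm(fd - analytic))  # tiny discretization error
```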