How to evaluate the derivatives of the matrix inverse?

Cliff Taubes wrote in his differential geometry book that:

We now calculate the directional derivatives of the map $$M\rightarrow M^{-1}$$ Let $\alpha\in M(n,\mathbb{R})$ denote any given matrix. Then the directional derivatives of the coordinates of the map $M\rightarrow M^{-1}$ in the direction $\alpha$ are the entries of the matrix $$-M^{-1}\alpha M^{-1}$$ Consider, for example, the coordinate given by the $(i,j)$th entry, $(M^{-1})_{ij}$. The directional derivative in the direction $\alpha$ of this function on $GL(n,\mathbb{R})$ is $$-(M^{-1}\alpha M^{-1})_{ij}$$ In particular, the partial derivative of the function $M\rightarrow (M^{-1})_{ij}$ with respect to the coordinate $M_{rs}$ is $-(M^{-1})_{ir}(M^{-1})_{sj}$.

I am wondering why this is true. He did not give a derivation of this formula, and none of the formulas I know for the matrix inverse produce anything similar to his result. So I venture to ask.

There are 2 best solutions below

9
On BEST ANSWER

Not sure if this is the type of answer you want, since I'm giving another argument rather than explaining his. However, this is how I usually think of it.

Let $M$ be an invertible matrix and $\delta M$ an infinitesimal perturbation (e.g. $\epsilon$ times the derivative). Now, let $N=M^{-1}$ and let $\delta N$ be the corresponding perturbation of the inverse, so that $N+\delta N=(M+\delta M)^{-1}$. Keeping only first-order perturbations (i.e. ignoring terms with two $\delta$s) and using $MN=I$ to cancel the zeroth-order terms, this gives $$ \begin{split} I=&(M+\delta M)(N+\delta N)=MN+M\,\delta N+\delta M\,N\\ &\implies M\,\delta N=-\delta M\,N=-\delta M\,M^{-1}\\ &\implies \delta N=-M^{-1}\,\delta M\,M^{-1}.\\ \end{split} $$ Written in terms of derivatives, i.e. $M'=dM/ds$ and $N'=dN/ds$ where $M=M(s)$ and $N=N(s)$ and $M(s)N(s)=I$, the same would be written $$ 0=I'=(MN)'=M'N+MN'\implies N'=-M^{-1}\,M'\,M^{-1}. $$
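If you want to convince yourself numerically, here is a minimal NumPy sketch (my own illustration, not something from Taubes' book). It compares $-M^{-1}\alpha M^{-1}$ with a central finite difference of $s\mapsto(M+s\alpha)^{-1}$ at $s=0$. The test matrix, the step size `eps`, and all variable names are arbitrary choices made for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Test matrix (an arbitrary choice): entries in [-1, 1] plus 2n on the
# diagonal, which makes it strictly diagonally dominant and hence invertible.
M = rng.uniform(-1.0, 1.0, (n, n)) + 2 * n * np.eye(n)
alpha = rng.uniform(-1.0, 1.0, (n, n))  # direction of the perturbation

M_inv = np.linalg.inv(M)
predicted = -M_inv @ alpha @ M_inv      # claimed directional derivative

# Central difference of s -> (M + s*alpha)^{-1} at s = 0.
eps = 1e-6
numeric = (np.linalg.inv(M + eps * alpha)
           - np.linalg.inv(M - eps * alpha)) / (2 * eps)

max_err = np.max(np.abs(numeric - predicted))
print("largest entrywise discrepancy:", max_err)
```

The discrepancy should be at the level of finite-difference and rounding error, far smaller than the entries of `predicted` themselves.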


To address some of the comments, although a bit belatedly:

For example, if you let $M(s)=M+s\Delta M$, this makes the derivative $M'(s)=\Delta M$ for all $s$. This makes $N(s)=M(s)^{-1}=(M+s\Delta M)^{-1}$, and you can use $M(s)\cdot N(s)=I$, and differentiate to get the above expressions.

For any partial derivative, e.g. with respect to $M_{rs}$, just set $\Delta M$ to be the matrix $E^{[rs]}$ with $1$ in cell $(r,s)$ and zero elsewhere, and you get $$ \frac{\partial}{\partial M_{rs}} M^{-1} = -M^{-1}\frac{\partial M}{\partial M_{rs}} M^{-1} = -M^{-1} E^{[rs]} M^{-1}. $$ Since $E^{[rs]}$ picks out column $r$ of the left factor and row $s$ of the right factor, cell $(i,j)$ of this derivative is $$ \frac{\partial (M^{-1})_{ij}}{\partial M_{rs}} = -(M^{-1})_{ir}(M^{-1})_{sj}. $$
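The entrywise formula can be checked the same way. The sketch below (again only an illustration with made-up names) builds the full array of partial derivatives with `np.einsum`, stored as `jac[i, j, r, s]`, and compares each slice with both $-M^{-1}E^{[rs]}M^{-1}$ and a central finite difference in the single entry $M_{rs}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Strictly diagonally dominant test matrix (arbitrary choice), so it is invertible.
M = rng.uniform(-1.0, 1.0, (n, n)) + 2 * n * np.eye(n)
M_inv = np.linalg.inv(M)

# jac[i, j, r, s] = d (M^{-1})_{ij} / d M_{rs} = -(M^{-1})_{ir} (M^{-1})_{sj}
jac = -np.einsum("ir,sj->ijrs", M_inv, M_inv)

eps = 1e-6
max_err = 0.0
for r in range(n):
    for s in range(n):
        E = np.zeros((n, n))
        E[r, s] = 1.0  # the matrix E^{[rs]}

        # Matrix form of the same partial derivative.
        assert np.allclose(-M_inv @ E @ M_inv, jac[:, :, r, s])

        # Central difference in the single entry M_{rs}.
        numeric = (np.linalg.inv(M + eps * E)
                   - np.linalg.inv(M - eps * E)) / (2 * eps)
        max_err = max(max_err, np.max(np.abs(numeric - jac[:, :, r, s])))

print("largest entrywise discrepancy:", max_err)
```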

0
On

There is a different (not so useful) form with the same result, but I'm not sure why. You can write Cramer's rule in the following form (summing over the repeated index $j$): $$ A_{ij} \frac{\partial |A|}{\partial A_{kj}} = |A| \delta_{ik} $$ where $\delta_{ik}$ are the entries of the identity, and the partial derivative is the $\pm$cofactor, so that $$ (A^{-1})_{ml} = \frac{\partial\ln(|A|)}{\partial A_{lm}} = \frac{1}{|A|}\frac{\partial |A|}{\partial A_{lm}}.$$ Using the product rule, $$ \frac{\partial (A^{-1})_{ml}}{\partial A_{ij}} = \frac{1}{|A|}\frac{\partial^2|A|}{\partial A_{ij}\,\partial A_{lm}} - (A^{-1})_{ji}(A^{-1})_{ml},$$ where the first term is a repeated cofactor and the second is a product of two inverse matrix elements. Compare this with the given answer, where the derivative is minus the outer product of a column and a row of $A^{-1}$.

Are they equal? Let's try the $2\times2$ matrix $A=\pmatrix{a & b \cr c & d}$ with inverse $A^{-1}=\frac{1}{ad-bc}\pmatrix{d & -b \cr -c & a}$: $$\frac{\partial A^{-1}}{\partial a} = \frac{ad-bc}{(ad-bc)^2}\pmatrix{0 & 0 \cr 0 & 1} - \frac{d}{(ad-bc)^2}\pmatrix{d & -b \cr -c & a} = \frac{-1}{(ad-bc)^2}\pmatrix{d \cr -c}\pmatrix{d & -b},$$ and so on, so yes, at least for this case. It has to be the same in general, but I don't see why.
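For what it's worth, here is a rough NumPy check on a $3\times3$ matrix. It is numerical evidence only, not a proof, and the helper names `unit` and `second_cofactor` are made up for this sketch. It relies on the determinant being affine in each individual entry, so a mixed difference with step $1$ gives the second derivative exactly, up to rounding. The cofactor form is then compared with $-(A^{-1})_{mi}(A^{-1})_{jl}$, which is the other answer's formula with the indices renamed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

# Strictly diagonally dominant test matrix (arbitrary choice), so it is invertible.
A = rng.uniform(-1.0, 1.0, (n, n)) + 2 * n * np.eye(n)
A_inv = np.linalg.inv(A)
det_A = np.linalg.det(A)


def unit(i, j):
    """Matrix with a 1 in cell (i, j) and zeros elsewhere."""
    E = np.zeros((n, n))
    E[i, j] = 1.0
    return E


def second_cofactor(i, j, l, m):
    """d^2|A| / (dA_ij dA_lm) via a mixed difference with step 1.

    |A| is affine in each single entry, so this difference is exact up to
    rounding. For (i, j) == (l, m) it returns 0, as it should.
    """
    E1, E2 = unit(i, j), unit(l, m)
    return (np.linalg.det(A + E1 + E2) - np.linalg.det(A + E1)
            - np.linalg.det(A + E2) + det_A)


max_gap = 0.0
for i in range(n):
    for j in range(n):
        for l in range(n):
            for m in range(n):
                # Cofactor form of d (A^{-1})_{ml} / d A_{ij}.
                cofactor_form = (second_cofactor(i, j, l, m) / det_A
                                 - A_inv[j, i] * A_inv[m, l])
                # Outer-product form from the other answer.
                outer_form = -A_inv[m, i] * A_inv[j, l]
                max_gap = max(max_gap, abs(cofactor_form - outer_form))

print("largest gap between the two forms:", max_gap)
```

If the two forms agree, the printed gap should be at rounding level.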