I'm struggling to work with Einstein notation to derive $\frac{dA^{-1}}{dA}$.
I.e. I understand given $A^{-1}A=I$ I can differentiate both sides to get $\frac{d}{dA}A^{-1}A=0$ and apply product rule then rearrange to get $\frac{dA^{-1}}{dA} = -A^{-2}$.
The problem is deriving this in Einstein notation. My indices end up all over the place. I.e.
$\frac{d}{dA_{ij}}(A^{-1}_{kl}A_{lm}) = (\frac{d}{dA_{ij}}A^{-1}_{kl})A_{lm} + A^{-1}_{kl}(\frac{d}{dA_{ij}}A_{lm})= (\frac{d}{dA_{ij}}A^{-1}_{kl})A_{lm} + A^{-1}_{kl}(\delta_{il}\delta_{jm})$
$\iff \frac{d}{dA_{ij}}A^{-1}_{kl} = -A^{-1}_{ki}\delta_{jm}A^{-1}_{ml} = -A^{-1}_{ki}A^{-1}_{jl}$
... a fourth-order tensor?! Obviously I'm doing something wrong, any hints much appreciated.
Let $A$ be an invertible $n$ by $n$ matrix. Then the derivative of the inverse function at $A$ is by definition a linear map $L_A\colon M_n(\mathbb{R})\to M_n(\mathbb{R})$ such that for all $n$ by $n$ matrices $D$:
$$\frac{(A+\delta D)^{-1}-A^{-1}}{\delta}\to L_A (D),$$ as the real-valued variable $\delta\to0$.
For any $D$, and $\delta$ small enough that $|\delta|\,\rho(A^{-1}D)<1$ (where $\rho$ denotes the spectral radius, i.e. the largest modulus of an eigenvalue of $A^{-1}D$), the following sum converges: $$\Sigma= 1-A^{-1}D\delta+(A^{-1}D\delta)^2-(A^{-1}D\delta)^3+\cdots $$
Then: $$\Sigma A^{-1}(A+\delta D)=\Sigma(1+A^{-1}D\delta)=I_n$$
Thus $\Sigma A^{-1}=(A+\delta D)^{-1}$ and $$\frac{(A+\delta D)^{-1}-A^{-1}}{\delta}=\frac{\Sigma A^{-1}-A^{-1}}{\delta}=\frac{\Sigma-1}\delta A^{-1}$$ $$ =-A^{-1}DA^{-1}+(A^{-1}D)^2A^{-1}\delta+\cdots\to -A^{-1}DA^{-1}, $$ as $\delta \to 0$.
Thus $L_A(D)=-A^{-1}DA^{-1}$. Note that $L_A$ has to be a rank four tensor, as it denotes a linear map from $\mathbb{R}^n\otimes \mathbb{R}^n$ to itself.
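As a quick numerical sanity check of this limit (not part of the argument above; the NumPy setup and the particular random matrices are my own choices), one can compare the difference quotient against $-A^{-1}DA^{-1}$ for shrinking $\delta$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n)) + n * np.eye(n)  # diagonally shifted, so invertible
D = rng.normal(size=(n, n))
A_inv = np.linalg.inv(A)

L_AD = -A_inv @ D @ A_inv  # claimed value of L_A(D)

for delta in [1e-2, 1e-4, 1e-6]:
    quotient = (np.linalg.inv(A + delta * D) - A_inv) / delta
    err = np.linalg.norm(quotient - L_AD)
    print(f"delta={delta:.0e}  ||quotient - L_A(D)|| = {err:.2e}")
```

The error shrinks linearly in $\delta$, matching the $(A^{-1}D)^2A^{-1}\delta$ leading correction term in the series above.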
As for the order of the indices, let $(L_A)_{ij,kl}$ denote the coefficient in the $ij$'th entry of $L_A(D)$, where $D$ has a $1$ in the $kl$'th entry and $0$'s elsewhere. Then we have:$$ (L_A)_{ij,kl}=-(A^{-1})_{ik}(A^{-1})_{lj} $$
That is, the $l$'th column of $A^{-1}D$ is the $k$'th column of $A^{-1}$, and the remaining entries of $A^{-1}D$ are all $0$. Thus to get the entries in the $i$'th row of $-A^{-1}DA^{-1}$, you multiply $-(A^{-1})_{ik}$ by the entries in the $l$'th row of $A^{-1}$. That is: $$(-A^{-1}DA^{-1})_{ij}=-(A^{-1})_{ik}(A^{-1})_{lj},$$ as required.
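This index formula can also be verified directly (a sketch using NumPy's `einsum`; the variable names are mine): build the rank-four tensor $(L_A)_{ij,kl}=-(A^{-1})_{ik}(A^{-1})_{lj}$ and check it against $-A^{-1}E_{kl}A^{-1}$ for every basis matrix $E_{kl}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n)) + n * np.eye(n)
A_inv = np.linalg.inv(A)

# Rank-four tensor (L_A)_{ij,kl} = -(A^{-1})_{ik} (A^{-1})_{lj}
L = -np.einsum("ik,lj->ijkl", A_inv, A_inv)

# Compare with L_A(E_kl) = -A^{-1} E_kl A^{-1} for each basis matrix E_kl
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n))
        E[k, l] = 1.0
        assert np.allclose(L[:, :, k, l], -A_inv @ E @ A_inv)
print("index formula verified")
```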
Here is a derivation of the expression for $L_A(D)$ which is more similar to your argument. Note this argument assumes $L_A$ exists, and just works out what it is.
Suppose some linear map $L_A$ exists such that $L_A(D)$ is the required limit (top equation). The derivative of the identity function is just the identity linear map $I_A\colon D \mapsto D$. There is a bilinear multiplication map on matrices, sending $(A,B)\mapsto AB$.
If we pair the inverse map with the identity map (that is, $A\mapsto(A^{-1},A)$) and compose with multiplication, we get the constant function sending $A\mapsto I_n$, whose derivative is $0_n$.
Thus by the product rule:$$ 0_n=A^{-1}I_A(D)+L_A(D)A =A^{-1}D+L_A(D)A.\qquad (1) $$ Rearranging: $$L_A(D)=-A^{-1}DA^{-1},$$ as before.
Now let's write this in your notation and see where you went wrong: $$ 0=A^{-1}+\frac{dA^{-1}}{dA}A $$ The point is that here the $A^{-1}$ denotes left multiplication by $A^{-1}$ (if you compare with my equation $(1)$). It would be better to write $$ 0=(A^{-1})_L+\frac{dA^{-1}}{dA}A $$
Thus if you compose with right multiplication by $A^{-1}$ you get: $$ 0=(A^{-1})_L(A^{-1})_R+\frac{dA^{-1}}{dA} $$ so $$ \frac{dA^{-1}}{dA}=-(A^{-1})_L(A^{-1})_R $$ The moral of the story is that when multiplication by $A^{-1}$ can mean more than one thing, it is a good idea to use notation which keeps track of exactly what it means.
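To make $(A^{-1})_L$ and $(A^{-1})_R$ concrete, here is a sketch (my own construction, not from the argument above) using row-major vectorization, which is NumPy's `flatten` order: then $\mathrm{vec}(MX)=(M\otimes I)\,\mathrm{vec}(X)$ and $\mathrm{vec}(XM)=(I\otimes M^T)\,\mathrm{vec}(X)$, so $-(A^{-1})_L(A^{-1})_R$ is the $n^2\times n^2$ matrix $-(A^{-1}\otimes (A^{-1})^T)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n)) + n * np.eye(n)
D = rng.normal(size=(n, n))
A_inv = np.linalg.inv(A)

# Left and right multiplication by A^{-1} as n^2 x n^2 matrices
# (row-major vectorization convention).
L_left = np.kron(A_inv, np.eye(n))      # (A^{-1})_L
R_right = np.kron(np.eye(n), A_inv.T)   # (A^{-1})_R
dAinv_dA = -(L_left @ R_right)          # = -(A^{-1} kron A^{-1}.T)

# Acting on vec(D) reproduces vec(-A^{-1} D A^{-1})
lhs = dAinv_dA @ D.flatten()
rhs = (-A_inv @ D @ A_inv).flatten()
assert np.allclose(lhs, rhs)
print("left/right-multiplication picture verified")
```

The subscripts $L$ and $R$ thus keep track of which side the multiplication acts on, exactly as the answer advises.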