Derivative of inverse of matrix wrt itself in Einstein notation


I'm struggling to work with Einstein notation to derive $\frac{dA^{-1}}{dA}$.

I.e. I understand that given $A^{-1}A=I$ I can differentiate both sides to get $\frac{d}{dA}(A^{-1}A)=0$, apply the product rule, then rearrange to get $\frac{dA^{-1}}{dA} = -A^{-2}$.

The problem is deriving this in Einstein notation. My indices end up all over the place. I.e.

$\frac{d}{dA_{ij}}(A^{-1}_{kl}A_{lm}) = (\frac{d}{dA_{ij}}A^{-1}_{kl})A_{lm} + A^{-1}_{kl}(\frac{d}{dA_{ij}}A_{lm})= (\frac{d}{dA_{ij}}A^{-1}_{kl})A_{lm} + A^{-1}_{kl}(\delta_{il}\delta_{jm})$

$\iff \frac{d}{dA_{ij}}A^{-1}_{kl} = -A^{-1}_{ki}\delta_{jm}A^{-1}_{lm} = -A^{-1}_{ki}A^{-1}_{lj}$

... a fourth-order tensor?! Obviously I'm doing something wrong, any hints much appreciated.


BEST ANSWER

Let $A$ be an invertible $n$ by $n$ matrix. Then the derivative of the inverse function at $A$ is by definition a linear map $L_A\colon M_n(\mathbb{R})\to M_n(\mathbb{R})$ such that for all $n$ by $n$ matrices $D$:

$$\frac{(A+\delta D)^{-1}-A^{-1}}{\delta}\to L_A (D),$$ as the real valued variable $\delta\to0$.

For any $D$, and $\delta$ small enough that the eigenvalues of $\delta A^{-1}D$ all have modulus less than $1$, the following series converges: $$\Sigma= 1-A^{-1}D\delta+(A^{-1}D\delta)^2-(A^{-1}D\delta)^3+\cdots $$

Then: $$\Sigma A^{-1}(A+\delta D)=\Sigma(1+A^{-1}D\delta)=I_n$$

Thus $\Sigma A^{-1}=(A+\delta D)^{-1}$ and $$\frac{(A+\delta D)^{-1}-A^{-1}}{\delta}=\frac{\Sigma A^{-1}-A^{-1}}{\delta}=\frac{\Sigma-1}\delta A^{-1}$$ $$ =-A^{-1}DA^{-1}+(A^{-1}D)^2A^{-1}\delta+\cdots\to -A^{-1}DA^{-1}, $$ as $\delta \to 0$.

Thus $L_A(D)=-A^{-1}DA^{-1}$. Note that $L_A$ has to be a rank four tensor, as it denotes a linear map from $\mathbb{R}^n\otimes \mathbb{R}^n$ to itself.
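As a quick numerical sanity check (my addition, not part of the original answer; assumes NumPy), one can compare the difference quotient against the closed form $-A^{-1}DA^{-1}$ for a random $A$ and $D$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Shift by n*I so A is comfortably invertible
A = rng.normal(size=(n, n)) + n * np.eye(n)
D = rng.normal(size=(n, n))
Ainv = np.linalg.inv(A)

delta = 1e-6
# Difference quotient ((A + delta*D)^{-1} - A^{-1}) / delta
quotient = (np.linalg.inv(A + delta * D) - Ainv) / delta
# Closed form L_A(D) = -A^{-1} D A^{-1}
L_of_D = -Ainv @ D @ Ainv

print(np.max(np.abs(quotient - L_of_D)))  # small, vanishing as delta -> 0
```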

As for the order of the indices, let $(L_A)_{ij,kl}$ denote the coefficient in the $ij$'th entry of $L_A(D)$, where $D$ has a $1$ in the $kl$'th entry and $0$'s elsewhere. Then we have:$$ (L_A)_{ij,kl}=-(A^{-1})_{ik}(A^{-1})_{lj} $$

That is, the $l$'th column of $A^{-1}D$ is the $k$'th column of $A^{-1}$, and the remaining entries of $A^{-1}D$ are all $0$. Thus to get the entries in the $i$'th row of $-A^{-1}DA^{-1}$, you must multiply $-(A^{-1})_{ik}$ by the entries in the $l$'th row of $A^{-1}$. That is: $$(-A^{-1}DA^{-1})_{ij}=-(A^{-1})_{ik}(A^{-1})_{lj},$$ as required.
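To see the index bookkeeping concretely, here is a short NumPy check (my own illustration, not from the answer) that the rank-4 tensor built from $-(A^{-1})_{ik}(A^{-1})_{lj}$ agrees with applying $D\mapsto -A^{-1}DA^{-1}$ to each matrix unit $E^{(kl)}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n)) + n * np.eye(n)
Ainv = np.linalg.inv(A)

# The rank-4 tensor (L_A)_{ij,kl} = -(A^{-1})_{ik} (A^{-1})_{lj}
L = -np.einsum('ik,lj->ijkl', Ainv, Ainv)

# Apply L_A to each matrix unit E^{(kl)} directly and compare entries
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n))
        E[k, l] = 1.0
        direct = -Ainv @ E @ Ainv  # L_A(E^{(kl)})
        assert np.allclose(direct, L[:, :, k, l])
print("index formula consistent")
```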


Here is a derivation of the expression for $L_A(D)$ which is more similar to your argument. Note this argument assumes $L_A$ exists, and just works out what it is.

Suppose some linear map $L_A$ exists such that $L_A(D)$ is the required limit (top equation). The derivative of the identity function is just the identity linear map $I_A\colon D \mapsto D$. Matrix multiplication, sending $(A,B)\mapsto AB$, is a bilinear map.

If we pair the inverse map with the identity map and compose with multiplication, we get the constant function sending $A\mapsto A^{-1}A=I_n$, whose derivative is $0_n$.

Thus by the product rule:$$ 0_n=A^{-1}I_A(D)+L_A(D)A =A^{-1}D+L_A(D)A.\qquad (1) $$ Rearranging: $$L_A(D)=-A^{-1}DA^{-1},$$ as before.

Now let's write this in your notation and see where you went wrong: $$ 0=A^{-1}+\frac{dA^{-1}}{dA}A $$ The point is that here the $A^{-1}$ denotes left multiplication by $A^{-1}$ (if you compare with my equation $(1)$). It would be better to write $$ 0=(A^{-1})_L+\frac{dA^{-1}}{dA}A $$

Thus if you compose with right multiplication by $A^{-1}$ you get: $$ 0=(A^{-1})_L(A^{-1})_R+\frac{dA^{-1}}{dA} $$ so $$ \frac{dA^{-1}}{dA}=-(A^{-1})_L(A^{-1})_R $$ The moral of the story is that when multiplication by $A^{-1}$ can mean more than one thing, it is a good idea to use notation which keeps track of exactly what it means.
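One concrete way to realize $(A^{-1})_L$ and $(A^{-1})_R$ is as $n^2\times n^2$ matrices acting on vectorized matrices. A minimal sketch (my addition, assuming NumPy and the column-stacking convention $\operatorname{vec}(MXN)=(N^\top\otimes M)\operatorname{vec}(X)$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n)) + n * np.eye(n)
Ainv = np.linalg.inv(A)

def vec(M):
    # Column-stacking vec, matching vec(MXN) = (N^T kron M) vec(X)
    return M.flatten(order='F')

# (A^{-1})_L = I kron A^{-1} and (A^{-1})_R = (A^{-1})^T kron I;
# their composition is (A^{-1})^T kron A^{-1}, so the derivative is:
J = -np.kron(Ainv.T, Ainv)

dA = rng.normal(size=(n, n))
lhs = J @ vec(dA)
rhs = vec(-Ainv @ dA @ Ainv)
print(np.max(np.abs(lhs - rhs)))  # agreement to machine precision
```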

ANSWER

The answer is in fact supposed to be a fourth-order tensor. When you differentiate a rank-$r$ tensor with respect to a rank-$s$ tensor, you get a rank-$(r+s-2k)$ tensor if you contract $k$ pairs away, but by default $k=0$.

If this surprises you, consider the analogous result for vectors, $\frac{\partial V_i}{\partial V_j}=\delta_{ij}$. In the problem at hand, we get a "square matrix" mapping the vector space in which $A,\,A^{-1}$ live as vectors to itself; but relative to the space $A$ acts on, the number of indices is $2^2=4$.

Your calculation strategy shows$$0=\frac{\partial\delta_{km}}{\partial A_{ij}}=\frac{\partial A^{-1}_{kl}}{\partial A_{ij}}A_{lm}+A^{-1}_{kl}\color{blue}{\frac{\partial A_{lm}}{\partial A_{ij}}}.$$

The blue derivative depends on the degrees of freedom: for example, it is not simply a product of two Kronecker deltas if $A$ varies within the symmetric matrices. So let's not evaluate it yet. What matters is$$\frac{\partial A_{kn}^{-1}}{\partial A_{ij}}=\frac{\partial A_{kl}^{-1}}{\partial A_{ij}}A_{lm}A^{-1}_{mn}=-A^{-1}_{kl}\frac{\partial A_{lm}}{\partial A_{ij}}A^{-1}_{mn}.$$

Finally, if $A$ is arbitrary, you correctly deduce$$\frac{\partial A_{lm}}{\partial A_{ij}}=\delta_{il}\delta_{jm}\implies\frac{\partial A_{kn}^{-1}}{\partial A_{ij}}=-A^{-1}_{ki}A^{-1}_{jn}.$$

Or in matrix notation, infinitesimal variations satisfy $dA^{-1}=-A^{-1}\,dA\cdot A^{-1}$, which in the $1\times1$ case reduces to the usual $dx^{-1}=-x^{-2}\,dx$. Notice that if $\frac{\partial A^{-1}}{\partial A}$ were some rank-$2$ matrix $M$, that would imply $dA^{-1}=M\,dA$, which is not the most general possible linear relationship between the infinitesimal matrices $dA,\,dA^{-1}$: square matrices live in a space whose dimension is the square of that of the original vectors.
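The final formula can be checked entry by entry with finite differences. A minimal sketch (my addition, assuming NumPy), perturbing one entry $A_{ij}$ at a time:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.normal(size=(n, n)) + n * np.eye(n)
Ainv = np.linalg.inv(A)

eps = 1e-6
# Finite-difference approximation of dA^{-1}_{kn}/dA_{ij}
fd = np.empty((n, n, n, n))
for i in range(n):
    for j in range(n):
        Ap = A.copy()
        Ap[i, j] += eps
        fd[:, :, i, j] = (np.linalg.inv(Ap) - Ainv) / eps

# Closed form: dA^{-1}_{kn}/dA_{ij} = -A^{-1}_{ki} A^{-1}_{jn}
exact = -np.einsum('ki,jn->knij', Ainv, Ainv)
print(np.max(np.abs(fd - exact)))  # small, vanishing as eps -> 0
```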