I'm struggling to work with Einstein notation to derive $\frac{dA^{-1}}{dA}$.
I.e. I understand given $A^{-1}A=I$ I can differentiate both sides to get $\frac{d}{dA}A^{-1}A=0$ and apply product rule then rearrange to get $\frac{dA^{-1}}{dA} = -A^{-2}$.
The problem is deriving this in Einstein notation. My indices end up all over the place. I.e.
$\frac{d}{dA_{ij}}(A^{-1}_{kl}A_{lm}) = (\frac{d}{dA_{ij}}A^{-1}_{kl})A_{lm} + A^{-1}_{kl}(\frac{d}{dA_{ij}}A_{lm})= (\frac{d}{dA_{ij}}A^{-1}_{kl})A_{lm} + A^{-1}_{kl}(\delta_{il}\delta_{jm})$
$\iff \frac{d}{dA_{ij}}A^{-1}_{kl} = -A^{-1}_{ki}\delta_{jm}A^{-1}_{ml} = -A^{-1}_{ki}A^{-1}_{jl}$
... a fourth-order tensor?! Obviously I'm doing something wrong, any hints much appreciated.
Let $A$ be an invertible $n$ by $n$ matrix. Then the derivative of the inverse function at $A$ is by definition a linear map $L_A\colon M_n(\mathbb{R})\to M_n(\mathbb{R})$ such that for all $n$ by $n$ matrices $D$:
$$\frac{(A+\delta D)^{-1}-A^{-1}}{\delta}\to L_A (D),$$ as the real-valued variable $\delta\to0$.
For any $D$, and $\delta$ small enough that $|\delta|\,\rho(A^{-1}D)<1$ (where $\rho$ denotes the spectral radius, i.e. the largest modulus of an eigenvalue of $A^{-1}D$), the following sum converges: $$\Sigma= 1-A^{-1}D\delta+(A^{-1}D\delta)^2-(A^{-1}D\delta)^3+\cdots $$
Then: $$\Sigma A^{-1}(A+\delta D)=\Sigma(1+A^{-1}D\delta)=I_n$$
Thus $\Sigma A^{-1}=(A+\delta D)^{-1}$ and $$\frac{(A+\delta D)^{-1}-A^{-1}}{\delta}=\frac{\Sigma A^{-1}-A^{-1}}{\delta}=\frac{\Sigma-1}\delta A^{-1}$$ $$ =-A^{-1}DA^{-1}+(A^{-1}D)^2A^{-1}\delta+\cdots\to -A^{-1}DA^{-1}, $$ as $\delta \to 0$.
Thus $L_A(D)=-A^{-1}DA^{-1}$. Note that $L_A$ has to be a rank four tensor, as it denotes a linear map from $\mathbb{R}^n\otimes \mathbb{R}^n$ to itself.
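As a quick numerical sanity check of this limit (not part of the argument above; the NumPy setup and the particular random matrices are my own choices), one can compare the difference quotient against $-A^{-1}DA^{-1}$ for shrinking $\delta$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n)) + n * np.eye(n)  # diagonally shifted, so invertible
D = rng.normal(size=(n, n))
A_inv = np.linalg.inv(A)

L_AD = -A_inv @ D @ A_inv  # claimed value of L_A(D)

for delta in [1e-2, 1e-4, 1e-6]:
    quotient = (np.linalg.inv(A + delta * D) - A_inv) / delta
    err = np.linalg.norm(quotient - L_AD)
    print(f"delta={delta:.0e}  ||quotient - L_A(D)|| = {err:.2e}")
```

The error shrinks linearly in $\delta$, matching the $(A^{-1}D)^2A^{-1}\delta$ leading correction term in the series above.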
As for the order of the indices, let $(L_A)_{ij,kl}$ denote the coefficient in the $ij$'th entry of $L_A(D)$, where $D$ has a $1$ in the $kl$'th entry and $0$'s elsewhere. Then we have:$$ (L_A)_{ij,kl}=-(A^{-1})_{ik}(A^{-1})_{lj} $$
That is, the $l$'th column of $A^{-1}D$ is the $k$'th column of $A^{-1}$, and the remaining entries of $A^{-1}D$ are all $0$. Thus to get the entries in the $i$'th row of $-A^{-1}DA^{-1}$, you multiply $-(A^{-1})_{ik}$ by the entries in the $l$'th row of $A^{-1}$. That is: $$(-A^{-1}DA^{-1})_{ij}=-(A^{-1})_{ik}(A^{-1})_{lj},$$ as required.
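This index formula can also be verified directly (a sketch using NumPy's `einsum`; the variable names are mine): build the rank-four tensor $(L_A)_{ij,kl}=-(A^{-1})_{ik}(A^{-1})_{lj}$ and check it against $-A^{-1}E_{kl}A^{-1}$ for every basis matrix $E_{kl}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n)) + n * np.eye(n)
A_inv = np.linalg.inv(A)

# Rank-four tensor (L_A)_{ij,kl} = -(A^{-1})_{ik} (A^{-1})_{lj}
L = -np.einsum("ik,lj->ijkl", A_inv, A_inv)

# Compare with L_A(E_kl) = -A^{-1} E_kl A^{-1} for each basis matrix E_kl
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n))
        E[k, l] = 1.0
        assert np.allclose(L[:, :, k, l], -A_inv @ E @ A_inv)
print("index formula verified")
```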
Here is a derivation of the expression for $L_A(D)$ which is more similar to your argument. Note this argument assumes $L_A$ exists, and just works out what it is.
Suppose some linear map $L_A$ exists such that $L_A(D)$ is the required limit (top equation). The derivative of the identity function is just the identity linear map $I_A\colon D \mapsto D$. There is a bilinear multiplication map on matrices, sending $(A,B)\mapsto AB$.
If we pair the inverse map with the identity map (that is, $A\mapsto(A^{-1},A)$) and compose with multiplication, we get the constant function sending $A\mapsto I_n$, whose derivative is $0_n$.
Thus by the product rule:$$ 0_n=A^{-1}I_A(D)+L_A(D)A =A^{-1}D+L_A(D)A.\qquad (1) $$ Rearranging: $$L_A(D)=-A^{-1}DA^{-1},$$ as before.
Now let's write this in your notation and see where you went wrong: $$ 0=A^{-1}+\frac{dA^{-1}}{dA}A $$ The point is that here the $A^{-1}$ denotes left multiplication by $A^{-1}$ (if you compare with my equation $(1)$). It would be better to write $$ 0=(A^{-1})_L+\frac{dA^{-1}}{dA}A $$
Thus if you compose with right multiplication by $A^{-1}$ you get: $$ 0=(A^{-1})_L(A^{-1})_R+\frac{dA^{-1}}{dA} $$ so $$ \frac{dA^{-1}}{dA}=-(A^{-1})_L(A^{-1})_R $$ The moral of the story is that when multiplication by $A^{-1}$ can mean more than one thing, it is a good idea to use notation which keeps track of exactly what it means.
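To make $(A^{-1})_L$ and $(A^{-1})_R$ concrete, here is a sketch (my own construction, not from the argument above) using row-major vectorization, which is NumPy's `flatten` order: then $\mathrm{vec}(MX)=(M\otimes I)\,\mathrm{vec}(X)$ and $\mathrm{vec}(XM)=(I\otimes M^T)\,\mathrm{vec}(X)$, so $-(A^{-1})_L(A^{-1})_R$ is the $n^2\times n^2$ matrix $-(A^{-1}\otimes (A^{-1})^T)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n)) + n * np.eye(n)
D = rng.normal(size=(n, n))
A_inv = np.linalg.inv(A)

# Left and right multiplication by A^{-1} as n^2 x n^2 matrices
# (row-major vectorization convention).
L_left = np.kron(A_inv, np.eye(n))      # (A^{-1})_L
R_right = np.kron(np.eye(n), A_inv.T)   # (A^{-1})_R
dAinv_dA = -(L_left @ R_right)          # = -(A^{-1} kron A^{-1}.T)

# Acting on vec(D) reproduces vec(-A^{-1} D A^{-1})
lhs = dAinv_dA @ D.flatten()
rhs = (-A_inv @ D @ A_inv).flatten()
assert np.allclose(lhs, rhs)
print("left/right-multiplication picture verified")
```

The subscripts $L$ and $R$ thus keep track of which side the multiplication acts on, exactly as the answer advises.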