Let $\text{Inv}: GL_n(\mathbb{R})\to GL_n(\mathbb{R})$ be defined by $\text{Inv}(X)= X^{-1}.$ Problem $33$ in Chapter $5$ of Pugh's Real Mathematical Analysis states: "Observe that $Y = \text{Inv}(X)$ solves the implicit function problem $F(X, Y)-I =0$ where $F(X, Y)= XY.$ Assume it is known that $\text{Inv}$ is smooth and use the chain rule to derive from this equation a formula for the derivative of $\text{Inv}.$"
My attempt: Differentiating both sides of the given equation with respect to $X$ at an arbitrary $A \in M_{n}(\mathbb{R})$ and using the chain rule on the left-hand side, we get $\left(\dfrac{\partial F}{\partial X}\right)_{A}+ \left(\dfrac{\partial F}{\partial Y}\right)_AY'(A) =0.$ My problem is in evaluating the partials of $F$ with respect to $X$ and $Y.$
For $n=2,$ I wrote $X= \begin{pmatrix} x_{11} &x_{12}\\x_{21}&x_{22}\end{pmatrix}$ and $Y= \begin{pmatrix} y_{11} &y_{12}\\y_{21}&y_{22}\end{pmatrix}$ so that $F(X, Y) = XY =\begin{pmatrix} x_{11}y_{11}+x_{12}y_{21} &x_{11}y_{12}+x_{12}y_{22}\\x_{21}y_{11}+x_{22}y_{21}&x_{21}y_{12}+x_{22}y_{22}\end{pmatrix}.$ Then $\dfrac{\partial F}{\partial X} = \begin{pmatrix} Y^{T} &0_{2\times2}\\0_{2\times2}&Y^{T}\end{pmatrix}$ and $\dfrac{\partial F}{\partial Y} = X \otimes I_{2\times 2},$ where $\otimes$ denotes the Kronecker product. For $\dfrac{\partial F}{\partial X}$ to act on $A$, we must write $A$ as a $4 \times 1$ column vector $\begin{pmatrix} a_{11}&a_{12}&a_{21}&a_{22}\end{pmatrix}^{T}$ so that rewriting $\left(\dfrac{\partial F}{\partial X}\right)_{A} = \dfrac{\partial F}{\partial X}(A)$ as a $2\times 2$ matrix we get $AY$ and rewriting $\left(\dfrac{\partial F}{\partial Y}\right)_{A} = \dfrac{\partial F}{\partial Y}(A)$ we get $XA$. Plugging back into the above equation and using the fact that $Y= X^{-1}$ we get $AY'(A) = -X^{-1}AX^{-1}.$
My question: what have I done wrong that left me with an extra $A$ on the left-hand side? (From first principles I have proven that the correct answer is $Y'(A) = -X^{-1}AX^{-1},$ but I am unable to get that formula from this approach.) I cannot pinpoint where my mistake is. Is it the inconsistent treatment of $A$? I'd appreciate any pointers or hints. Thanks in advance.
It helps to write everything in index notation (using the summation convention, so repeated indices are summed without writing the sums explicitly).
$$F_{ij}=X_{ik}Y_{kj}$$ $$\frac{\partial F_{ij}}{\partial X_{ab}}=\delta_{ia}\delta_{kb}Y_{kj}=\delta_{ia}Y_{bj}$$ Here, $Y$ is treated as a constant, and the derivative of a matrix entry with respect to an entry of the same matrix is $1$ when the indices coincide and $0$ otherwise, that is, $\partial X_{ik}/\partial X_{ab}=\delta_{ia}\delta_{kb}$.
Analogously: $$\frac{\partial F_{ij}}{\partial Y_{ab}}=X_{ia}\delta_{jb}$$
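If you want to convince yourself of these two formulas numerically, here is a minimal sketch that compares them with central finite differences. It is pure Python for $n=2$; the helper names (`matmul`, `delta`, `perturbed`, `max_partial_error`) and the test matrices are my own choices for illustration, not anything from the book.

```python
# Sketch: check dF_ij/dX_ab = delta_ia * Y_bj and dF_ij/dY_ab = X_ia * delta_jb
# for F(X, Y) = XY by central finite differences. Test matrices are arbitrary.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def delta(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0

def perturbed(M, a, b, h):
    """Copy of M with h added to entry (a, b)."""
    out = [row[:] for row in M]
    out[a][b] += h
    return out

def max_partial_error(X, Y, h=1e-6):
    """Largest gap between the index formulas and finite differences."""
    n = len(X)
    worst = 0.0
    for a in range(n):
        for b in range(n):
            # Perturb one entry of X (resp. Y) while holding the other fixed.
            FXp = matmul(perturbed(X, a, b, h), Y)
            FXm = matmul(perturbed(X, a, b, -h), Y)
            FYp = matmul(X, perturbed(Y, a, b, h))
            FYm = matmul(X, perturbed(Y, a, b, -h))
            for i in range(n):
                for j in range(n):
                    num_dX = (FXp[i][j] - FXm[i][j]) / (2 * h)
                    num_dY = (FYp[i][j] - FYm[i][j]) / (2 * h)
                    formula_dX = delta(i, a) * Y[b][j]
                    formula_dY = X[i][a] * delta(j, b)
                    worst = max(worst,
                                abs(num_dX - formula_dX),
                                abs(num_dY - formula_dY))
    return worst

X = [[2.0, 1.0], [0.5, 3.0]]
Y = [[1.0, -2.0], [4.0, 0.5]]
print(max_partial_error(X, Y))  # should be tiny (rounding error only)
```

Since $F$ is linear in each argument separately, the finite differences agree with the formulas up to rounding error.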
Note that we differentiate with respect to $X_{ab}$, whose indices $a,b$ are distinct from the indices $i,j$ of $F_{ij}$. The result is a four-index object that yields a matrix only once it is contracted with something carrying the indices $ab$, such as $A_{ab}$. Keeping the indices separate in this way prevents you from mixing $X$ with $A$ too soon.
Everything at once:
$$\left(\frac{\partial F_{ij}}{\partial X_{ab}}\right)+ \left(\frac{\partial F_{ij}}{\partial Y_{pq}}\right) \left(\frac{\partial Y_{pq}}{\partial X_{ab}}\right)=0$$ This is just the chain rule. The last factor becomes $Y'(A)$ only once it is applied to $A$. So now we apply the entire object to $A$: $$\left(\frac{\partial F_{ij}}{\partial X_{ab}}\right)A_{ab}+ \left(\frac{\partial F_{ij}}{\partial Y_{pq}}\right) \underbrace{\left(\frac{\partial Y_{pq}}{\partial X_{ab}}\right)A_{ab}}_{Y'_{pq}(A)}=0$$
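To see concretely what each four-index object does when it is applied to a matrix, here is a small sketch (pure Python, $n=2$). The function names and the matrix `B`, which merely stands in for $Y'(A)$, are hypothetical. The point is that $\partial F/\partial X$ is contracted with $A$ and gives $AY$, while $\partial F/\partial Y$ is contracted with $Y'(A)$ and gives $X\,Y'(A)$; it is never applied to $A$ itself.

```python
# Sketch: contract the four-index partials with a matrix and compare
# the result with an ordinary matrix product.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def delta(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0

def apply_dF_dX(Y, A):
    """Contract delta_ia * Y_bj with A_ab over a and b."""
    n = len(Y)
    return [[sum(delta(i, a) * Y[b][j] * A[a][b]
                 for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]

def apply_dF_dY(X, B):
    """Contract X_ip * delta_jq with B_pq over p and q."""
    n = len(X)
    return [[sum(X[i][p] * delta(j, q) * B[p][q]
                 for p in range(n) for q in range(n))
             for j in range(n)] for i in range(n)]

def max_abs_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j])
               for i in range(len(P)) for j in range(len(P)))

X = [[2.0, 1.0], [0.5, 3.0]]
Y = [[1.0, -2.0], [4.0, 0.5]]
A = [[1.0, -1.0], [2.0, 0.5]]
B = [[0.0, 3.0], [-1.0, 2.0]]  # arbitrary stand-in for Y'(A)

print(max_abs_diff(apply_dF_dX(Y, A), matmul(A, Y)))  # dF/dX applied to A is AY
print(max_abs_diff(apply_dF_dY(X, B), matmul(X, B)))  # dF/dY applied to B is XB
```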
Put in what we derived before:
$$\delta_{ia}Y_{bj}A_{ab}+ X_{ip}\delta_{jq} Y'_{pq}(A)=0$$ Contract the Kronecker deltas and move the first term to the other side:
$$X_{ip} Y'_{pj}(A)=-Y_{bj}A_{ib}$$ $$Y'_{pj}(A)=-X^{-1}_{pi}X^{-1}_{bj}A_{ib}=-X^{-1}_{pi}A_{ib}X^{-1}_{bj}=-(X^{-1}AX^{-1})_{pj}$$ To go from the first line to the second, we multiplied by the inverse of $X$ from the left (contracting both sides with $X^{-1}$ over the index $i$ and relabeling the remaining free index as $p$) and used $Y=X^{-1}$. We then reordered the factors so that adjacent summed indices match, which lets us return to index-free notation.
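As a sanity check on the final formula, here is a minimal numerical sketch comparing $-X^{-1}AX^{-1}$ with a central difference quotient of $\text{Inv}$ in the direction $A$. It is pure Python and restricted to $2\times 2$ matrices so the inverse can be written in closed form; the function names, the step size `h`, and the test matrices are my own assumptions.

```python
# Sketch: compare Y'(A) = -X^{-1} A X^{-1} with
# (Inv(X + hA) - Inv(X - hA)) / (2h) for a 2x2 example.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(M):
    """Closed-form inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def derivative_formula(X, A):
    """The claimed derivative of Inv at X applied to A."""
    Xi = inv2(X)
    P = matmul(matmul(Xi, A), Xi)
    return [[-v for v in row] for row in P]

def derivative_numeric(X, A, h=1e-6):
    """Central difference quotient of Inv at X in direction A."""
    Xp = [[X[i][j] + h * A[i][j] for j in range(2)] for i in range(2)]
    Xm = [[X[i][j] - h * A[i][j] for j in range(2)] for i in range(2)]
    Ip, Im = inv2(Xp), inv2(Xm)
    return [[(Ip[i][j] - Im[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

X = [[2.0, 1.0], [0.5, 3.0]]   # invertible: det = 5.5
A = [[1.0, -1.0], [2.0, 0.5]]  # arbitrary direction

exact = derivative_formula(X, A)
approx = derivative_numeric(X, A)
gap = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
print(gap)  # should be small compared with the entries of the derivative
```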