Finding the matrix derivative of $X^{-1}$ with respect to $X$


Assume $X \in \mathbb{R}^{n \times n}$. I could not find a specific formula for the derivative of $X^{-1}$ with respect to $X$, but I found a related formula involving the matrix inverse:

$$\frac{\partial}{\partial X} (a^TX^{-1}b) = -X^{-T}ab^TX^{-T}, \quad a, b \in \mathbb{R}^n \tag{1}$$

Can anyone give some insight into how to derive a formula for the derivative of $X^{-1}$, or formula (1), please?

Thank you in advance.

======

This post discusses the same topic, and I was asked whether the current post is redundant. I believe the two posts state and discuss the problem differently. Specifically, I was trying to learn a simple approach for finding the derivative of a matrix expression that contains the inverse of a matrix. I believe the detailed answer and discussion in this post are helpful to other learners like me (with an elementary understanding of calculus and matrices).


There are 2 best solutions below

On BEST ANSWER

In this case, I imagine you want the matrix derivative of the above expression. As such, let $X(t)$ be a differentiable matrix-valued function that is invertible for $t$ in some neighbourhood of $0$. Then the product rule gives

$$ X^{-1}(t)X(t) = I \implies \frac{\partial X^{-1}(t)}{\partial t}X(t) + X^{-1}(t)\frac{\partial X(t)}{\partial t} = 0 $$

rearranging and multiplying on the right by the inverse yields $$ \frac{\partial X^{-1}(t)}{\partial t} = -X^{-1}(t)\frac{\partial X(t)}{\partial t} X^{-1}(t). $$
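As a quick sanity check of this identity, here is a minimal numerical sketch. It assumes NumPy, a straight-line path $X(t) = X_0 + tY$, and a randomly generated, well-conditioned $X_0$; the names `X0`, `Y`, `h` and `err_path` are illustrative choices and not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# X0 is a small perturbation of the identity, so it is safely invertible
X0 = np.eye(n) + 0.1 * rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))  # direction of the path, so dX/dt = Y

def X(t):
    # Straight-line path through X0 (an assumption made for this check)
    return X0 + t * Y

h = 1e-6
# Central finite difference of t -> X(t)^{-1} at t = 0
fd = (np.linalg.inv(X(h)) - np.linalg.inv(X(-h))) / (2 * h)

# Closed form from the identity: -X^{-1} (dX/dt) X^{-1}
X0_inv = np.linalg.inv(X0)
closed = -X0_inv @ Y @ X0_inv

# Largest entrywise discrepancy between the two
err_path = np.max(np.abs(fd - closed))
print(err_path)
```

The discrepancy should be tiny compared with the entries of `closed`, limited only by finite-difference and rounding error.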

This is probably the derivative you were looking for originally. Anyways, continuing to show (1) is straightforward now,

$$ \frac{\partial a^T X^{-1}(t)b}{\partial t} = a^T\frac{\partial X^{-1}(t)}{\partial t}b = -a^T X^{-1}(t)\frac{\partial X(t)}{\partial t} X^{-1}(t) b $$

Assuming $X(t) = X + tY$, and evaluating at $t=0$ yields

$$ \frac{\partial a^T X^{-1}(t)b}{\partial t}\bigg|_{t=0} = a^T\frac{\partial X^{-1}(t)}{\partial t}\bigg|_{t=0}b =-a^T X^{-1} Y X^{-1} b $$ which, once rewritten as a linear functional acting on a general direction $Y$, gives your formula (1).
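One way to carry out that last step is sketched here. It assumes matrices are paired through the trace inner product $\langle G, Y\rangle = \operatorname{tr}(G^TY)$, a convention not stated above, and uses the fact that a scalar equals its own trace together with the cyclic property of the trace:

$$ -a^T X^{-1} Y X^{-1} b = -\operatorname{tr}\left(X^{-1} b a^T X^{-1} Y\right) = -\operatorname{tr}\left(\left(X^{-T} a b^T X^{-T}\right)^T Y\right) $$

Under that convention, the matrix paired with $Y$ is $-X^{-T}ab^TX^{-T}$, which is the right-hand side of (1).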


I guess I should probably just complete the solution. For a differentiable function $F:\mathbb{R}^{m\times m} \to \mathbb{R}$, and with $e_{ij} = e_ie_j^T$ where the $e_i$ are the standard basis vectors, we usually define

$$ \left(\frac{\partial F(A)}{\partial X}\right)_{ij} \equiv \frac{\partial F(A+te_{ij})}{\partial t}\bigg|_{t=0} $$

Note that this is equivalent to taking component-wise derivatives with respect to the entries of $X$, evaluated at a 'point' (i.e. a given matrix) $A$.

Using this definition with $Y = e_{ij}$, the above derivative becomes $$ \left(\frac{\partial a^T X^{-1}b}{\partial X}\right)_{ij} = -a^T X^{-1} e_{ij} X^{-1} b $$

or, writing out the multiplication explicitly using Kronecker deltas ($\delta_{ij} = 1$ when $i=j$ and $0$ otherwise) and the Einstein summation convention (i.e. repeated indices are implicitly summed), we get

$$ \begin{align} \left(\frac{\partial a^T X^{-1}b}{\partial X}\right)_{ij} &= -\left(a^T X^{-1}\right)_{k} \delta_{ik}\delta_{j\ell} (X^{-1} b)_{\ell} \\ &= -\left(a^T X^{-1}\right)_{i}(X^{-1} b)_{j} \\ &= -\left(\left(a^T X^{-1}\right)^T(X^{-1} b)^T\right)_{ij}\\ &= -\left(X^{-T}ab^TX^{-T}\right)_{ij} \end{align} $$

as we wished.
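For anyone who wants to check formula (1) numerically, here is a minimal sketch that applies the entrywise definition above directly, perturbing one entry of $X$ at a time. It assumes NumPy and randomly generated `X`, `a`, `b`; the helper `F` and the step size `h` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# A small perturbation of the identity, so X is safely invertible
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))
a = rng.standard_normal(n)
b = rng.standard_normal(n)

def F(M):
    # F(M) = a^T M^{-1} b, computed with a linear solve rather than an explicit inverse
    return a @ np.linalg.solve(M, b)

h = 1e-6
grad_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # e_ij has a single 1 in position (i, j)
        E = np.zeros((n, n))
        E[i, j] = 1.0
        # Central difference approximation of d/dt F(X + t e_ij) at t = 0
        grad_fd[i, j] = (F(X + h * E) - F(X - h * E)) / (2 * h)

# Formula (1): -X^{-T} a b^T X^{-T}
X_inv_T = np.linalg.inv(X).T
grad_formula = -X_inv_T @ np.outer(a, b) @ X_inv_T

# Largest entrywise discrepancy between the two
err_grad = np.max(np.abs(grad_fd - grad_formula))
print(err_grad)
```

The two matrices should agree up to finite-difference and rounding error.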

On

A simpler way to present this uses the formal definition of the derivative of a function $f: E \to F$ where $E$ and $F$ are two normed spaces (such as a space of matrices). The function $f$ has a derivative (or differential) at point $X$ if there exists a linear map $f^\prime(X): E \to F$ such that when $\|H\| \to 0$ $$f(X+H) = f(X) + f^\prime(X)\cdot H + o(H)$$ where $o()$ is the little-o notation and $f^\prime(X)\cdot H$ means the image of $H$ by the linear map $f^\prime(X)$.

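Let's apply this to $f(X) = X^{-1}$. First, when $H$ is small enough (say $\|H\| < 1$, so that $I+H$ is invertible), we have $$(I+H)(I-H) = I - H^2\quad\Rightarrow\quad (I+H)^{-1} = I - H + (I+H)^{-1}H^2 = I - H + o(H)$$ The last step holds because $\|(I+H)^{-1}\|$ stays bounded as $H \to 0$, while $\|H^2\| \le \|H\|^2$ for a submultiplicative norm. This proves that the derivative at $I$ is $f^\prime(I)\cdot H = -H$.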

Now let $X$ be invertible. For $H$ small enough that $X+H$ is still invertible, we have $$(X+H)^{-1} = (X (I+X^{-1}H))^{-1} = (I + X^{-1}H)^{-1}X^{-1} = (I - X^{-1}H + o(X^{-1} H))X^{-1} $$ Hence $f(X+H)= f(X) - X^{-1} H X^{-1} + o(H)$, and it follows that $$f^\prime(X)\cdot H = - X^{-1}H X^{-1}$$ This proof is very general: it works not only for matrices but also for inversion in normed algebras.
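The $o(H)$ claim can be illustrated numerically. The sketch below assumes NumPy, a random well-conditioned `X` and a fixed unit-norm direction `D`, all illustrative choices. It measures the remainder left after subtracting the first-order expansion: if the remainder is $o(H)$ with a leading quadratic term, shrinking $H$ by a factor of 10 should shrink the remainder by a factor of roughly 100.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# A small perturbation of the identity, so X is safely invertible
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))
D = rng.standard_normal((n, n))
D /= np.linalg.norm(D)  # unit Frobenius norm direction
X_inv = np.linalg.inv(X)

def remainder(eps):
    # Norm of (X + H)^{-1} - [X^{-1} - X^{-1} H X^{-1}] for H = eps * D
    H = eps * D
    first_order = X_inv - X_inv @ H @ X_inv
    return np.linalg.norm(np.linalg.inv(X + H) - first_order)

r1 = remainder(1e-2)
r2 = remainder(1e-3)
# For a quadratic remainder this ratio is close to 100
ratio = r1 / r2
print(r1, r2, ratio)
```

A merely linear remainder would give a ratio near 10 instead, so the ratio is a cheap way to catch a wrong derivative formula.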

Now if one takes $\phi(X) = a^T X^{-1} b$, we obtain $\phi^\prime(X)\cdot H = - a^T X^{-1} H X^{-1}b$. The notation $\big(\frac{\partial \phi(X)}{\partial X}\big)_{ij}$ that you are using is equal to $\phi^\prime(X)\cdot E_{ij}$, where $E_{ij}$ is the matrix whose entries are all $0$ except the entry in position $(i,j)$, which is $1$. It is easy to see that $\phi^\prime(X)\cdot E_{ij} = - v_i w_j$, where $v_i$ are the components of $X^{-T}a$ and $w_j$ are the components of $X^{-1}b$.
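To spell out that last step, write $E_{ij} = e_i e_j^T$ with $e_i$ the standard basis vectors (notation borrowed from the other answer, not used in this one). Then

$$ \phi^\prime(X)\cdot E_{ij} = -a^T X^{-1} e_i\, e_j^T X^{-1} b = -\left(e_i^T X^{-T} a\right)\left(e_j^T X^{-1} b\right) = -v_i w_j $$

where the middle equality uses the fact that the scalar $a^T X^{-1} e_i$ equals its own transpose. Since $v_i w_j = \left(v w^T\right)_{ij}$ and $v w^T = X^{-T} a\, b^T X^{-T}$, this agrees entry by entry with formula (1).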