How to show $\frac {\partial a^{T}X^{-1}b}{\partial X} = -\left( X^{-1}\right) ^{T}ab^{T}\left( X^{-1}\right) ^{T}$?


I am struggling with this proof, where $X$ is an $m \times n$ matrix, $a$ is an $m$-vector and $b$ is an $n$-vector (for $X^{-1}$ to exist, $X$ must in fact be square, i.e. $m = n$).

$$\frac {\partial a^{T}X^{-1}b}{\partial X} = -\left( X^{-1}\right) ^{T}ab^{T}\left( X^{-1}\right) ^{T}$$

I know $$\frac {\partial }{\partial X}f\left( X\right) ^{-1}=-f\left( X\right) ^{-1}\dfrac {\partial f\left( X\right) }{\partial X}f\left( X\right) ^{-1}$$

and I am guessing I should use this fact. I also know that $\dfrac {\partial a^{T}Xb}{\partial X} = ab^{T}$.
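One way to convince yourself of the linear identity $\partial(a^TXb)/\partial X = ab^T$ is a central finite-difference check. The sketch below is plain Python (no NumPy, to stay dependency-free); the helper names `bilinear` and `numerical_gradient` and the sample values of `a`, `b` and `X` are hypothetical choices for illustration, not part of the question. No inverse is involved here, so $X$ may be rectangular.

```python
def bilinear(a, X, b):
    """f(X) = a^T X b = sum_ij a_i X_ij b_j."""
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of df/dX_ij for every entry of X."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [r[:] for r in X]
            X_minus = [r[:] for r in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            row.append((f(X_plus) - f(X_minus)) / (2 * h))
        G.append(row)
    return G


# Hypothetical example data: X is 2 x 3, a has 2 entries, b has 3.
a = [1.0, -2.0]
b = [0.5, 3.0, -1.0]
X = [[2.0, 0.0, 1.0], [1.0, 4.0, -3.0]]

numeric = numerical_gradient(lambda Y: bilinear(a, Y, b), X)
outer = [[ai * bj for bj in b] for ai in a]  # the claimed gradient a b^T
err = max(abs(numeric[i][j] - outer[i][j]) for i in range(len(a)) for j in range(len(b)))
print(f"max |numerical - a b^T| = {err:.2e}")
```

Because $f$ is linear in $X$, the central difference is exact up to floating-point rounding, so the printed error should be tiny.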

When I use the chain rule I don't seem to get the form with the transposes.

I believe the result should lie in $\mathbb{R} ^{1\times \left( m\times n\right) }$.

Best answer

Before deriving the gradient, here are some facts and notation used for brevity:

  • Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
  • Symmetry and cyclic properties of the trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= A C^T : B \\ &= B^T A : C \ . \end{align}
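The Frobenius-product identities $A : BC = BC : A = AC^T : B = B^TA : C$ can be spot-checked numerically. This is a minimal plain-Python sketch; `matmul`, `transpose` and `frob` are hypothetical helper names, and the matrices are arbitrary values chosen only so that the shapes are compatible ($A$ is $3\times 2$, $B$ is $3\times 4$, $C$ is $4\times 2$).

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frob(A, B):
    """Frobenius product A : B = tr(A^T B) = sum_ij A_ij B_ij."""
    return sum(x * y for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))


# Hypothetical shapes: A is 3x2, B is 3x4, C is 4x2, so BC is 3x2 like A.
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
B = [[2.0, -1.0, 0.0, 1.0], [1.0, 1.0, 2.0, 0.0], [0.0, 3.0, -2.0, 1.0]]
C = [[1.0, 0.0], [2.0, 1.0], [-1.0, 1.0], [0.0, 2.0]]

values = {
    "A : BC": frob(A, matmul(B, C)),
    "BC : A": frob(matmul(B, C), A),
    "AC^T : B": frob(matmul(A, transpose(C)), B),
    "B^T A : C": frob(matmul(transpose(B), A), C),
}
for name, value in values.items():
    print(f"{name:>10} = {value}")
```

All four printed numbers should agree; moving a factor across the colon costs exactly one transpose, which is where the $X^{-T}$ terms in the final gradient come from.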

First, we obtain the differential of $X^{-1}$, which will be used for the gradient you are seeking. Differentiating both sides of $X^{-1}X = I$ gives \begin{align} dX^{-1}\, X + X^{-1}\, dX &= 0 \\ \Leftrightarrow\quad dX^{-1} &= -X^{-1}\, dX\, X^{-1} \ , \end{align} where the last step multiplies on the right by $X^{-1}$.
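The differential $dX^{-1} = -X^{-1}\,dX\,X^{-1}$ can be sanity-checked by perturbing $X$ along a fixed direction $E$ (playing the role of $dX$) and comparing a central difference of the inverse with the formula. The sketch below is plain Python; the Gauss-Jordan `inverse` helper, `axpy`, and the sample matrices are hypothetical illustrations, and $X$ was picked to be invertible (its determinant is 16).

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan elimination with partial pivoting (assumes A is invertible)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def axpy(X, E, t):
    """Return X + t * E entrywise."""
    return [[x + t * e for x, e in zip(row_x, row_e)] for row_x, row_e in zip(X, E)]


X = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # det = 16, invertible
E = [[0.3, -1.0, 0.5], [1.0, 0.2, 0.0], [-0.4, 0.6, 1.1]]  # perturbation direction "dX"
t = 1e-6

# Central difference of X^{-1} along the direction E.
inv_plus, inv_minus = inverse(axpy(X, E, t)), inverse(axpy(X, E, -t))
numeric = [[(p - m) / (2 * t) for p, m in zip(rp, rm)] for rp, rm in zip(inv_plus, inv_minus)]

# Analytic differential: -X^{-1} E X^{-1}.
X_inv = inverse(X)
analytic = [[-v for v in row] for row in matmul(matmul(X_inv, E), X_inv)]

err = max(abs(n_ - a_) for rn, ra in zip(numeric, analytic) for n_, a_ in zip(rn, ra))
print(f"max |finite difference - (-X^-1 E X^-1)| = {err:.2e}")
```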

Let $f := a^T X^{-1} b = a: X^{-1} b$.

Now we can obtain the differential first, and then the gradient $\frac{\partial f}{\partial X}$, using the trace/Frobenius properties listed above: \begin{align} df &= a: dX^{-1} b \\ &= a: \left(-X^{-1} dX X^{-1} b\right)\\ &= -X^{-T} a : dX X^{-1} b \\ &= -X^{-T} a b^T X^{-T} : dX \ . \end{align}

Thus, the gradient is \begin{align} \frac{\partial f}{\partial X} = -X^{-T} a b^T X^{-T}. \end{align}
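To check the final result end to end, the sketch below compares a finite-difference gradient of $f(X) = a^TX^{-1}b$ with $-X^{-T}ab^TX^{-T}$. It is plain Python under the same assumptions as a hand check: the helpers (`matmul`, `transpose`, `inverse`, `numerical_gradient`, `f`) are hypothetical names, and `X`, `a`, `b` are arbitrary sample values with $X$ square and invertible.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inverse(A):
    """Gauss-Jordan elimination with partial pivoting (assumes A is invertible)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def numerical_gradient(func, X, h=1e-6):
    """Central-difference estimate of dfunc/dX_ij for every entry of X."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [r[:] for r in X]
            X_minus = [r[:] for r in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            row.append((func(X_plus) - func(X_minus)) / (2 * h))
        G.append(row)
    return G


def f(X, a, b):
    """f(X) = a^T X^{-1} b."""
    X_inv = inverse(X)
    return sum(a[i] * X_inv[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


X = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # det = 16, invertible
a = [1.0, -2.0, 0.5]
b = [3.0, 1.0, -1.0]

# Analytic gradient: -X^{-T} a b^T X^{-T}.
X_inv_T = transpose(inverse(X))
ab_T = [[ai * bj for bj in b] for ai in a]
analytic = [[-v for v in row] for row in matmul(matmul(X_inv_T, ab_T), X_inv_T)]

numeric = numerical_gradient(lambda Y: f(Y, a, b), X)
err = max(abs(n_ - a_) for rn, ra in zip(numeric, analytic) for n_, a_ in zip(rn, ra))
print(f"max |numerical - (-X^-T a b^T X^-T)| = {err:.2e}")
```

As a further sanity check, in the $1 \times 1$ case the formula reduces to $\frac{d}{dx}\frac{ab}{x} = -\frac{ab}{x^2}$, the familiar scalar derivative.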