Matrix Calculus Partial Derivative

Can anyone explain the partial derivative below:

$\frac{\partial a^tX^{-1}b}{\partial X} = -X^{-t}ab^tX^{-t}$

I was trying to derive this equation using the below formula, but failed.

[image: formula not preserved]

There are 3 best solutions below

Let $Y = X^{-1}$, since it's easier to type.

Taking the differential of $I=Y\cdot X$ you'll find that $$dY = -Y\cdot dX\cdot Y$$
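The intermediate step can be spelled out as follows, using only the product rule for differentials and the fact that the identity matrix is constant, so $dI = 0$:

$$\eqalign{ 0 = dI &= dY\cdot X + Y\cdot dX \cr dY\cdot X &= -Y\cdot dX \cr dY &= -Y\cdot dX\cdot X^{-1} = -Y\cdot dX\cdot Y \cr }$$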

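Now write $a'\cdot Y\cdot b$ as the Frobenius product $ab':Y$, where $A:B=\operatorname{tr}(A'B)$, and take the differential

$$\eqalign{ d(ab':Y) &= ab':dY \cr &= -ab':(Y\cdot dX\cdot Y) \cr &= -(Y'\cdot ab'\cdot Y'):dX \cr }$$

The last line uses the cyclic property of the trace to move both factors of $Y$ onto the left argument. Since $df = G:dX$ identifies $G$ as the derivative, we get

$$ \frac{\partial(ab':Y)}{\partial X} = -(Y'\cdot ab'\cdot Y') $$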
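A quick numerical sanity check of this result, as a minimal sketch: it compares the closed-form gradient with central finite differences. The helper names (`f`, `analytic_grad`, `numeric_grad`), the matrix size, the random seed, and the step size are all illustrative choices, not part of the derivation.

```python
import numpy as np

def f(X, a, b):
    """Scalar function f(X) = a^T X^{-1} b."""
    return a @ np.linalg.solve(X, b)

def analytic_grad(X, a, b):
    """Closed-form gradient -X^{-T} a b^T X^{-T}."""
    Xinv_T = np.linalg.inv(X).T
    return -Xinv_T @ np.outer(a, b) @ Xinv_T

def numeric_grad(X, a, b, eps=1e-6):
    """Central finite differences, perturbing one entry of X at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n = 4
# Shift by n*I so the test matrix stays comfortably invertible.
X = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))
a = rng.standard_normal(n)
b = rng.standard_normal(n)

max_abs_diff = np.max(np.abs(analytic_grad(X, a, b) - numeric_grad(X, a, b)))
print(max_abs_diff)
```

The printed difference should be tiny compared with the entries of the gradient; a transposition mistake (for example $-X^{-1}ab'X^{-1}$) would show up as a difference of the same order as the gradient itself.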

Here's another way you might consider computing the derivative of $f(X)=a^TX^{-1}b$,

\begin{align} f(X+H)=a^T(X+H)^{-1}b&=a^T((I+HX^{-1})X)^{-1}b\\[10pt] &=a^TX^{-1}(I+HX^{-1})^{-1}b\\[1pt] &=a^{T}X^{-1}\sum_{n=0}^\infty(-1)^n(HX^{-1})^nb \end{align}

where the final equality follows from the matrix geometric (Neumann) series, which converges when $\|HX^{-1}\|<1$.

For $\|H\|$ small,

$$a^{T}X^{-1}\sum_{n=0}^\infty(-1)^n(HX^{-1})^nb\approx a^{T}X^{-1}(I-HX^{-1})b=\underbrace{a^{T}X^{-1}b}_{f}+\underbrace{(-a^{T}X^{-1}HX^{-1}b)}_{\nabla_Hf}$$

Now to determine $\nabla f$, we need to write $\nabla_Hf$ as a matrix inner product,

$$\nabla_Hf=-a^{T}X^{-1}HX^{-1}b=-\text{tr}(X^{-1}ba^TX^{-1}H)=\langle -X^{-T}ab^TX^{-T},\; H\rangle$$

Therefore $\nabla f=-X^{-T}ab^TX^{-T}$.
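The first-order expansion can also be checked numerically. The sketch below measures the gap between $f(X+H)$ and $f(X)-a^TX^{-1}HX^{-1}b$ for a fixed direction scaled by shrinking factors; since the first neglected term of the series is quadratic in $H$, one would expect the gap to shrink by roughly a factor of 100 for each tenfold reduction in $\|H\|$. The test matrices, the seed, and the helper names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Shift by n*I so X is well conditioned and ||H X^{-1}|| < 1 for the H used below.
X = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))
a = rng.standard_normal(n)
b = rng.standard_normal(n)
D = rng.standard_normal((n, n))  # fixed perturbation direction

Xinv = np.linalg.inv(X)

def f(M):
    """f(M) = a^T M^{-1} b."""
    return a @ np.linalg.solve(M, b)

def first_order(H):
    """f(X) plus the directional derivative -a^T X^{-1} H X^{-1} b."""
    return f(X) - a @ Xinv @ H @ Xinv @ b

errors = []
for t in (1e-1, 1e-2, 1e-3):
    H = t * D
    errors.append(abs(f(X + H) - first_order(H)))

# Ratio of successive errors; a quadratic remainder gives values near 100.
ratios = [errors[k] / errors[k + 1] for k in range(len(errors) - 1)]
print(errors)
print(ratios)
```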

A totally mechanical approach. By the chain rule:

$$\frac{∂a^⊤ X^{-1} b}{∂ X} = \frac{∂a^⊤ X^{-1} b}{∂ X^{-1}}∘\frac{∂X^{-1} }{∂ X}$$

Consider the first term $\frac{∂a^⊤ X^{-1} b}{∂ X^{-1}}$. Note that the numerator is linear in $X^{-1}$, therefore its derivative is found directly by bringing it to the standard form of a linear function "$x↦A⋅x$":

$$a^⊤ X^{-1} b = ⟨ab^⊤∣X^{-1}⟩ ⟹ \frac{∂a^⊤ X^{-1} b}{∂ X^{-1}} = ab^⊤$$
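Written out in components, and assuming $⟨A∣B⟩$ denotes the Frobenius inner product $\operatorname{tr}(A^⊤B)=\sum_{i,j}A_{ij}B_{ij}$, the identification above is just a regrouping of the double sum:

$$a^⊤ X^{-1} b = \sum_{i,j} a_i\,(X^{-1})_{ij}\,b_j = \sum_{i,j} (ab^⊤)_{ij}\,(X^{-1})_{ij} = ⟨ab^⊤∣X^{-1}⟩$$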

Next, let's work out the second term, $\frac{∂X^{-1}}{∂ X}$. Note that

$$ X⋅X^{-1} = I ⟹ \frac{d}{dX}(X⋅X^{-1}) =0$$

Apply product rule:

$$\begin{aligned} 0 = \frac{d}{dX}(X⋅X^{-1}) &= \frac{∂\, Y⋅Z}{∂(Y, Z)}\Bigg|_{\begin{aligned}Y&=X\\ Z&=X^{-1}\end{aligned}} \cdot \frac{∂(X, X^{-1})}{∂X} \\&= \begin{bmatrix}I⊗X^{-⊤},\, X⊗I\end{bmatrix}⋅\begin{bmatrix}I⊗I\\ \frac{∂X^{-1}}{∂ X}\end{bmatrix} \\&= (I⊗X^{-⊤}) + (X⊗I)\frac{∂X^{-1}}{∂ X} \\⟹ \frac{∂X^{-1}}{∂ X} &= -(X⊗I)^{-1}(I⊗X^{-⊤}) \\&= -(X^{-1}⊗I)(I⊗X^{-⊤}) = -X^{-1}⊗X^{-⊤} \end{aligned}$$

That is, with the convention $(A⊗B)⋅V = AVB^⊤$, $\frac{∂X^{-1}}{∂ X}$ is the linear map $V↦ (-X^{-1}⊗X^{-⊤})⋅V = -X^{-1}VX^{-1}$.
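The tensor-product form can be made concrete with `np.kron`, as a rough sketch. It assumes matrices are flattened row by row (NumPy's default `ravel` order), in which case `np.kron(A, B) @ V.ravel()` equals `(A @ V @ B.T).ravel()`; with column-major flattening the two Kronecker factors would swap. The matrix size, seed, and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
# Shift by n*I so X is comfortably invertible.
X = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))
V = rng.standard_normal((n, n))
Xinv = np.linalg.inv(X)

# Matrix of the linear map -X^{-1} (x) X^{-T} under row-major flattening.
J = -np.kron(Xinv, Xinv.T)

# Applying J to vec(V) should reproduce -X^{-1} V X^{-1}.
lhs = (J @ V.ravel()).reshape(n, n)
rhs = -Xinv @ V @ Xinv
kron_gap = np.max(np.abs(lhs - rhs))

# Finite-difference Jacobian of X -> X^{-1}, one input entry per column.
eps = 1e-6
J_fd = np.zeros((n * n, n * n))
for k in range(n * n):
    E = np.zeros(n * n)
    E[k] = eps
    E = E.reshape(n, n)
    J_fd[:, k] = (np.linalg.inv(X + E) - np.linalg.inv(X - E)).ravel() / (2 * eps)
fd_gap = np.max(np.abs(J - J_fd))

print(kron_gap, fd_gap)
```

Both gaps should be small: the first only reflects floating-point rounding, the second also includes finite-difference truncation error.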


Putting both together we have:

$$\begin{aligned} \frac{∂a^⊤ X^{-1} b}{∂ X^{-1}}∘\frac{∂X^{-1} }{∂ X} &= (V↦ ⟨ab^⊤∣V⟩) ∘ (V↦ -X^{-1}VX^{-1}) \\ &= (V↦ ⟨ab^⊤∣-X^{-1}VX^{-1}⟩) \\ &= (V↦ ⟨-X^{-⊤}ab^⊤X^{-⊤}∣V⟩) \end{aligned}$$