Derivative of the inverse of a matrix with respect to a matrix, inside a product


I am having a problem with:

$$ \frac{\delta AX^{-1}B}{\delta X} = \space ? $$

What I did is the following:

$$ I = XX^{-1} $$
$$ \frac{\delta AXX^{-1}B}{\delta X}=A \frac{\delta X}{\delta X}X^{-1}B + AX \frac{\delta X^{-1}}{\delta X}B $$
$$ AX \frac{\delta X^{-1}}{\delta X}B = -A\frac{\delta X}{\delta X}X^{-1}B $$
$$ (A) (AX)^{-1} AX \frac{\delta X^{-1}}{\delta X}B = -(A)(AX)^{-1}A \frac{\delta X}{\delta X}X^{-1}B $$
$$ A \frac{\delta X^{-1}}{\delta X}B = -AX^{-1} \frac{\delta X}{\delta X}X^{-1}B $$
$$ \frac{\delta AX^{-1}B}{\delta X} = -AX^{-1}X^{-1}B $$
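The identity this manipulation leans on, $\delta X^{-1} = -X^{-1}\,\delta X\,X^{-1}$, can at least be sanity-checked numerically. A minimal sketch in NumPy (the matrix size, random seed, and step size are arbitrary choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)  # shift keeps X well-conditioned
dX = rng.standard_normal((n, n))                 # arbitrary perturbation direction
h = 1e-6

Xinv = np.linalg.inv(X)
# central finite difference of X^{-1} along dX
fd = (np.linalg.inv(X + h * dX) - np.linalg.inv(X - h * dX)) / (2 * h)
# closed-form differential: d(X^{-1}) = -X^{-1} dX X^{-1}
exact = -Xinv @ dX @ Xinv

assert np.allclose(fd, exact, atol=1e-6)
```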

But in the Matrix Cookbook, the result for the derivative of a trace containing an inverse is:

$$ \frac{\delta \operatorname{tr}(AX^{-1}B)}{\delta X} = -(X^{-1}BAX^{-1})^{T} $$

From which I conclude that it should be

$$ \frac{\delta AX^{-1}B}{\delta X} = -X^{-1}BAX^{-1} $$

I cannot see why the $BA$ ends up on the inside with the factors swapped. Please do not use the Frobenius product in the solution.
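The Cookbook formula itself can be checked entry by entry with finite differences. A minimal sketch in NumPy (sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)  # keep X well-conditioned
h = 1e-6

def f(M):
    return np.trace(A @ np.linalg.inv(M) @ B)

# Cookbook gradient: -(X^{-1} B A X^{-1})^T
Xinv = np.linalg.inv(X)
grad = -(Xinv @ B @ A @ Xinv).T

# central finite differences, one entry of X at a time
fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        fd[i, j] = (f(X + h * E) - f(X - h * E)) / (2 * h)

assert np.allclose(fd, grad, atol=1e-6)
```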


Best answer:

As an alternative to greg's fourth-order tensor, one can vectorize and exploit Kronecker products.

Using the differential result from greg's solution, that is, \begin{align} dY = -\underbrace{AX^{-1}}_{:=\color{blue}{\widetilde{A}}} \ \color{red}{dX} \ \underbrace{X^{-1}B}_{:=\color{green}{\widetilde{B}}} := -\color{blue}{\widetilde{A}} \ \color{red}{dX} \ \color{green}{\widetilde{B}}, \end{align} we now vectorize both sides such that \begin{align} \operatorname{vec}\left(dY\right) = -\operatorname{vec}\left(\color{blue}{\widetilde{A}} \ \color{red}{dX} \ \color{green}{\widetilde{B}}\right) = -\left(\color{green}{\widetilde{B}}^T \otimes \color{blue}{\widetilde{A}}\right) \operatorname{vec}\left(\color{red}{dX}\right). \end{align}

Then, with $y = \operatorname{vec}(Y)$ and $x = \operatorname{vec}(X)$, the gradient can be written as \begin{align} \frac{\partial y}{\partial x} = -\left(\color{green}{\widetilde{B}}^T \otimes \color{blue}{\widetilde{A}}\right). \end{align}
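The vec/Kronecker identity above is easy to verify numerically; a sketch in NumPy (note that the column-stacking `vec` requires `order='F'`, since NumPy is row-major by default; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)  # keep X well-conditioned
dX = rng.standard_normal((n, n))

Xinv = np.linalg.inv(X)
At = A @ Xinv   # A-tilde = A X^{-1}
Bt = Xinv @ B   # B-tilde = X^{-1} B

# exact differential: dY = -At dX Bt
dY = -At @ dX @ Bt

# column-stacking vec
vec = lambda M: M.flatten(order='F')

# Jacobian: -(Bt^T kron At)
J = -np.kron(Bt.T, At)

assert np.allclose(J @ vec(dX), vec(dY))
```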

Another answer:

$ \def\o{{\tt1}}\def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\c#1{\color{red}{#1}} \def\vec#1{\operatorname{vec}\LR{#1}} $The gradient of a matrix with respect to itself can be written in component form using the single-entry matrix $E_{ij}$ all of whose elements equal zero except the $(i,j)$ element which equals $\o$. $$\eqalign{ \grad{X}{X_{ij}} &=E_{ij} \\ }$$ This can be used to calculate the element-wise gradient of the function $$\eqalign{ Y &= AX^{-1}B \\ dY &= A\;\c{dX^{-1}}B = A\,\c{\Big(\!-\!X^{-1}\,dX\,X^{-1}\Big)}B \\ \grad{Y}{X_{ij}} &= -AX^{-1}E_{ij}\,X^{-1}B \\ }$$ The full gradient is a fourth-order tensor, which can be written as the sum of the dyadic products $(\star)$ of these matrix components with the corresponding single-entry matrix $$\eqalign{ \grad{Y}{X} &= \sum_{i=\o}^n\sum_{j=\o}^n\LR{\grad{Y}{X_{{ij}}}}\star E_{{ij}} \\ }$$
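The component formula $\partial Y/\partial X_{ij} = -AX^{-1}E_{ij}\,X^{-1}B$ can likewise be checked one single-entry matrix at a time; a NumPy sketch (sizes, seed, and step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)  # keep X well-conditioned
Xinv = np.linalg.inv(X)
h = 1e-6

max_err = 0.0
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0  # single-entry matrix E_ij
        # component gradient: dY/dX_ij = -A X^{-1} E_ij X^{-1} B
        exact = -A @ Xinv @ E @ Xinv @ B
        # central finite difference of Y = A X^{-1} B along E_ij
        fd = (A @ np.linalg.inv(X + h * E) @ B
              - A @ np.linalg.inv(X - h * E) @ B) / (2 * h)
        max_err = max(max_err, np.abs(fd - exact).max())

assert max_err < 1e-6
```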