How to calculate $\dfrac{\partial a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a}{\partial A}$?


How can I calculate $\dfrac{\partial a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a}{\partial A}$, where $A\in\mathbb{R}^{n\times n}$ and $a,b\in\mathbb{R}^n$?




Hint

Define $\phi_1 : A \mapsto A^{-1}$, $\phi_2 : X \mapsto b^T X a$ and $\phi_3: x \mapsto x^T x$ (here $x = b^TA^{-1}a$ is a scalar). Note that your map $\phi$ is $\phi = \phi_3 \circ \phi_2 \circ \phi_1$.

You can then use the chain rule $\phi^\prime(A) = \phi_3^\prime\big(\phi_2(\phi_1(A))\big) \cdot \phi_2^\prime\big(\phi_1(A)\big) \cdot \phi_1^\prime(A)$, based on $\phi_1^\prime(A).H =-A^{-1}HA^{-1}$, $\phi_2^\prime(X).H = b^T H a$ (since $\phi_2$ is linear) and $\phi_3^\prime(x).h = 2x^T h$.

You’ll finally get:

$$\frac{\partial \phi}{\partial A}.H = -2 (b^TA^{-1}a)^Tb^TA^{-1}HA^{-1}a =-2a^T\left(A^{-1}\right)^T bb^T A^{-1}HA^{-1}a$$
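As a sanity check on this formula, here is a minimal NumPy sketch (assuming NumPy is installed; the size, the seed, and the shift `n * np.eye(n)` that keeps $A$ safely invertible are arbitrary choices, not part of the answer). It compares $\phi'(A).H$ with a central finite difference of $\phi$ along a random direction $H$.

```python
import numpy as np

# Sketch: check the directional derivative from the hint,
#   phi'(A).H = -2 a^T A^{-T} b b^T A^{-1} H A^{-1} a,
# against a central finite difference of phi(A) = (b^T A^{-1} a)^2.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to keep A well conditioned
a = rng.standard_normal(n)
b = rng.standard_normal(n)
H = rng.standard_normal((n, n))  # arbitrary direction


def phi(M):
    """phi(M) = a^T M^{-T} b b^T M^{-1} a = (b^T M^{-1} a)^2."""
    s = b @ np.linalg.solve(M, a)  # s = b^T M^{-1} a
    return s * s


Ainv = np.linalg.inv(A)
analytic = -2 * (a @ Ainv.T @ b) * (b @ Ainv @ H @ Ainv @ a)

eps = 1e-6
numeric = (phi(A + eps * H) - phi(A - eps * H)) / (2 * eps)
print(analytic, numeric)
```

The two printed numbers should agree to roughly the accuracy of the finite difference.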


The problem was just modified. With $b$ present (as it is now), the solution is much simpler. Note that $$a^{\rm T}A^{-\rm T}b = b^{\rm T}A^{-1}a$$ since both sides are scalars and each is the transpose of the other. Hence, by the chain rule, $$\frac{\partial}{\partial A}(a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a)=2(b^{\rm T}A^{-1}a)\frac{\partial}{\partial A}(b^{\rm T}A^{-1}a).$$ Also note that when we differentiate with respect to $A$, both $a$ and $b$ are treated as constants, so $$d(b^{\rm T}A^{-1}a)=b^{\rm T}\,d(A^{-1})\,a.$$

It remains to find $d(A^{-1})$. Differentiating the identity $$AA^{-1} = I$$ gives $$dA\,A^{-1}+A\,d(A^{-1})=0 \quad\implies\quad d(A^{-1})=-A^{-1}\,dA\,A^{-1}.$$ (The derivative $\partial A^{-1}/\partial A$ is a fourth-order tensor, not the matrix $-A^{-2}$; that expression is only the scalar rule $d(1/x)/dx=-1/x^2$.)

Therefore $$d(b^{\rm T}A^{-1}a)=-b^{\rm T}A^{-1}\,dA\,A^{-1}a=-\operatorname{Tr}\big(A^{-1}ab^{\rm T}A^{-1}\,dA\big),$$ so $$\frac{\partial}{\partial A}(b^{\rm T}A^{-1}a)=-A^{-\rm T}ba^{\rm T}A^{-\rm T}$$ and finally $$\frac{\partial}{\partial A}(a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a)=-2(b^{\rm T}A^{-1}a)\,A^{-\rm T}ba^{\rm T}A^{-\rm T}.$$
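The behaviour of the inverse under a perturbation can be checked numerically. The sketch below (NumPy assumed; seed and size are arbitrary) perturbs $A$ along a random direction $H$ and compares the finite-difference change of $A^{-1}$ with $-A^{-1}HA^{-1}$ and with $-A^{-2}H$, the matrix expression one would get by copying the scalar rule $-1/x^2$.

```python
import numpy as np

# Sketch: finite-difference derivative of A^{-1} along H versus two candidates.
rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to keep A well conditioned
H = rng.standard_normal((n, n))                  # random direction
Ainv = np.linalg.inv(A)

eps = 1e-6
fd = (np.linalg.inv(A + eps * H) - np.linalg.inv(A - eps * H)) / (2 * eps)
correct = -Ainv @ H @ Ainv   # -A^{-1} H A^{-1}
naive = -Ainv @ Ainv @ H     # -A^{-2} H

# The first gap should be at finite-difference noise level;
# the second is generally not small because A and H do not commute.
print(np.max(np.abs(fd - correct)))
print(np.max(np.abs(fd - naive)))
```

The two candidates coincide only when $A^{-1}$ and $H$ commute, which is why the scalar intuition breaks down for matrices.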


$ \def\l{\lambda}\def\o{{\tt1}}\def\p{\partial} \def\A{A^{-1}} \def\B{A^{-T}} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Use a colon to denote the Frobenius product, which is a concise notation for the trace, i.e. $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ This is also called the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.

The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\ }$$
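A small NumPy sketch (NumPy assumed; `frob` is a hypothetical helper name, and the shapes are arbitrary but compatible) that checks the definition of the Frobenius product and the rearrangement rules above on random matrices:

```python
import numpy as np


def frob(X, Y):
    """Frobenius product X:Y = sum_ij X_ij * Y_ij (hypothetical helper)."""
    return float(np.sum(X * Y))


rng = np.random.default_rng(2)
m, p, n = 3, 2, 4
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))
C = rng.standard_normal((m, n))
X = rng.standard_normal((m, n))

# Definition: X:C = Tr(X^T C), and X:X is the squared Frobenius norm.
assert np.isclose(frob(X, C), np.trace(X.T @ C))
assert np.isclose(frob(X, X), np.linalg.norm(X, "fro") ** 2)

# Symmetry and transpose invariance.
assert np.isclose(frob(X, C), frob(C, X))
assert np.isclose(frob(X, C), frob(X.T, C.T))

# Rearrangement: C:AB = CB^T:A = A^TC:B.
lhs = frob(C, A @ B)
assert np.isclose(lhs, frob(C @ B.T, A))
assert np.isclose(lhs, frob(A.T @ C, B))

# For vectors it reduces to the ordinary dot product.
u, v = rng.standard_normal(n), rng.standard_normal(n)
assert np.isclose(frob(u, v), u @ v)
print("all identities hold")
```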

Introduce the scalar variable $$\eqalign{ \l \;=\; {a^T\B b} \;=\; {b^T\A a} \;=\; {ba^T:\A} }$$ whose differential is $$\eqalign{ d\l &= {ba^T:\c{d\A}} \\ &= ba^T:\c{\LR{-\A\;dA\;\A}} \\ &= -\LR{\B ba^T\B}:dA \\ }$$
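The three expressions for $\lambda$ and the first-order formula for $d\lambda$ can be checked with a short NumPy sketch (NumPy assumed; the perturbation scale `1e-6` and the seed are arbitrary). For a small $dA$, the exact change $\lambda(A+dA)-\lambda(A)$ should match $-\big(A^{-T}ba^TA^{-T}\big):dA$ up to second-order terms.

```python
import numpy as np

# Sketch: lambda = a^T A^{-T} b = b^T A^{-1} a = ba^T : A^{-1}, and
# d(lambda) = -(A^{-T} b a^T A^{-T}) : dA to first order.
rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to keep A well conditioned
a = rng.standard_normal(n)
b = rng.standard_normal(n)
Ainv = np.linalg.inv(A)

lam1 = a @ Ainv.T @ b                 # a^T A^{-T} b
lam2 = b @ Ainv @ a                   # b^T A^{-1} a
lam3 = np.sum(np.outer(b, a) * Ainv)  # ba^T : A^{-1}

# Small random perturbation dA; compare the exact and linearized changes.
dA = 1e-6 * rng.standard_normal((n, n))
dlam_exact = b @ np.linalg.inv(A + dA) @ a - lam2
dlam_linear = -np.sum((Ainv.T @ np.outer(b, a) @ Ainv.T) * dA)
print(lam1, lam2, lam3)
print(dlam_exact, dlam_linear)
```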


Use the above notation to write the function, then calculate its differential and gradient. $$\eqalign{ f &= \l^2 \\ df &= 2\l\;\c{d\l} \\ &= -2\l \c{\LR{\B ba^T\B}:dA} \\ \grad{f}{A} &= -2\l \LR{\B ba^T\B} \\ &= -2 \LR{b^T\A a} \LR{\B ba^T\B} \\\\ }$$
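Finally, a NumPy sketch (NumPy assumed; seed, size, and step `eps` are arbitrary) that compares the gradient $-2\lambda\,A^{-T}ba^TA^{-T}$ entry by entry with central finite differences of $f$, and confirms that pairing it with a direction $H$ reproduces the directional derivative $-2a^TA^{-T}bb^TA^{-1}HA^{-1}a$ from the first answer.

```python
import numpy as np

# Sketch: entrywise finite-difference check of the gradient
#   df/dA = -2 * lambda * A^{-T} b a^T A^{-T},  lambda = b^T A^{-1} a,
# for f(A) = (b^T A^{-1} a)^2.
rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to keep A well conditioned
a = rng.standard_normal(n)
b = rng.standard_normal(n)


def f(M):
    s = b @ np.linalg.solve(M, a)  # s = b^T M^{-1} a
    return s * s


Ainv = np.linalg.inv(A)
lam = b @ Ainv @ a
G = -2 * lam * (Ainv.T @ np.outer(b, a) @ Ainv.T)

# Perturb one entry A_ij at a time and take a central difference.
eps = 1e-6
G_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        G_fd[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

print("max entrywise error:", np.max(np.abs(G - G_fd)))

# Directional check: G : H should equal -2 a^T A^{-T} b b^T A^{-1} H A^{-1} a.
H = rng.standard_normal((n, n))
dir_from_G = np.sum(G * H)
dir_from_hint = -2 * (a @ Ainv.T @ b) * (b @ Ainv @ H @ Ainv @ a)
print(dir_from_G, dir_from_hint)
```

The entrywise check costs $2n^2$ evaluations of $f$, while the directional check needs only the closed-form expressions, so the latter is the cheaper test for large $n$.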