How can I calculate $\dfrac{\partial a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a}{\partial A}$, where $A\in\mathbb{R}^{n\times n}$ and $a,b\in\mathbb{R}^n$?
563 views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 3 answers below.
The problem was just modified. With $b$ present (as it is now), the solution is much simpler. Note that $$a^{\rm T}A^{-\rm T}b = b^{\rm T}A^{-1}a$$ since both are scalars and transposing one gives the other. Hence, by the chain rule, $$\frac{\partial}{\partial A}(a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a)=2(b^{\rm T}A^{-1}a)\frac{\partial}{\partial A}(b^{\rm T}A^{-1}a)$$ Also note that when we differentiate with respect to $A$, both $a$ and $b$ are treated as constants, so $$d(b^{\rm T}A^{-1}a)=b^{\rm T}\,d(A^{-1})\,a$$ It remains to find $d(A^{-1})$. Differentiating the identity $$AA^{-1} = I$$ gives $$dA\,A^{-1}+A\,d(A^{-1})=0 \quad\implies\quad d(A^{-1})=-A^{-1}\,dA\,A^{-1}.$$ Because matrix multiplication does not commute, this cannot be collapsed to $-A^{-2}\,dA$. Substituting, $$d(b^{\rm T}A^{-1}a)=-b^{\rm T}A^{-1}\,dA\,A^{-1}a=-\sum_{i,j}(A^{-\rm T}b)_i\,(A^{-1}a)_j\,dA_{ij},$$ so $$\frac{\partial}{\partial A}(b^{\rm T}A^{-1}a)=-A^{-\rm T}ba^{\rm T}A^{-\rm T}$$ and therefore $$\frac{\partial}{\partial A}(a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a)=-2(b^{\rm T}A^{-1}a)\,A^{-\rm T}ba^{\rm T}A^{-\rm T}.$$
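To see numerically which expression for the derivative of $A^{-1}$ is right, here is a small plain-Python sketch (no libraries). It compares a central finite difference of $A\mapsto A^{-1}$ along a direction $H$ with both $-A^{-1}HA^{-1}$ and $-A^{-2}H$. The matrices `A` and `H`, the step `eps`, and the helper names (`matmul`, `inv`, `shift`, `max_abs_diff`) are illustrative assumptions; `H` is chosen so that it does not commute with `A`.

```python
# Compare a finite difference of A -> A^{-1} along H with -A^{-1} H A^{-1} and with -A^{-2} H.
# Matrices are lists of rows; A, H and eps are arbitrary illustrative values.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv(M):
    """Inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def shift(X, t, H):
    """Entrywise X + t*H."""
    return [[x + t * h for x, h in zip(rx, rh)] for rx, rh in zip(X, H)]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]    # invertible, not symmetric
H = [[0.3, -0.2, 0.1], [0.0, 0.5, -0.4], [0.2, 0.1, -0.3]]  # direction dA; AH != HA
eps = 1e-6

Ai = inv(A)
# central difference: (inv(A + eps*H) - inv(A - eps*H)) / (2*eps)
fd = [[(p - m) / (2 * eps) for p, m in zip(rp, rm)]
      for rp, rm in zip(inv(shift(A, eps, H)), inv(shift(A, -eps, H)))]

correct = [[-x for x in row] for row in matmul(matmul(Ai, H), Ai)]  # -A^{-1} H A^{-1}
naive = [[-x for x in row] for row in matmul(matmul(Ai, Ai), H)]    # -A^{-2} H

err_correct = max_abs_diff(fd, correct)
err_naive = max_abs_diff(fd, naive)
print(f"max error of -A^-1 H A^-1: {err_correct:.1e}")
print(f"max error of -A^-2 H:      {err_naive:.1e}")
```

With a non-commuting $H$, only the first error is at finite-difference noise level; the second stays visibly nonzero.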
$
\def\l{\lambda}\def\o{{\tt1}}\def\p{\partial}
\def\A{A^{-1}}
\def\B{A^{-T}}
\def\L{\left}\def\R{\right}
\def\LR#1{\L(#1\R)}
\def\BR#1{\Big(#1\Big)}
\def\trace#1{\operatorname{Tr}\LR{#1}}
\def\qiq{\quad\implies\quad}
\def\grad#1#2{\frac{\p #1}{\p #2}}
\def\c#1{\color{red}{#1}}
$Use a colon to denote the Frobenius product, which is a concise notation for the trace, i.e.
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \big\|A\big\|^2_F \\
}$$
This is also called the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.
The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\ }$$
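These rearrangement rules can be spot-checked with small integer matrices, where the arithmetic is exact. A minimal plain-Python sketch; the helper names (`frob`, `matmul`, `transpose`, `trace`) and the example matrices are illustrative assumptions, with shapes chosen so that every product is defined ($X,Y$ are $2\times3$; for $C:AB$, $A$ is $2\times3$, $B$ is $3\times2$, $C$ is $2\times2$).

```python
# Spot-check the Frobenius product definition and its rearrangement rules.
# Integer entries make every comparison exact.

def frob(X, Y):
    """Frobenius product X:Y, the sum of entrywise products."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

X = [[1, 2, 3], [4, 5, 6]]
Y = [[-1, 0, 2], [3, -2, 1]]
A = [[1, 2, 0], [0, 1, -1]]
B = [[2, 1], [0, 3], [1, -1]]
C = [[1, -2], [4, 0]]

checks = {
    "X:Y = Tr(X^T Y)": frob(X, Y) == trace(matmul(transpose(X), Y)),
    "X:Y = Y:X": frob(X, Y) == frob(Y, X),
    "X:Y = X^T:Y^T": frob(X, Y) == frob(transpose(X), transpose(Y)),
    "C:AB = CB^T:A": frob(C, matmul(A, B)) == frob(matmul(C, transpose(B)), A),
    "C:AB = A^TC:B": frob(C, matmul(A, B)) == frob(matmul(transpose(A), C), B),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```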
Introduce the scalar variable $$\eqalign{ \l \;=\; {a^T\B b} \;=\; {b^T\A a} \;=\; {ba^T:\A} }$$ whose differential is $$\eqalign{ d\l &= {ba^T:\c{d\A}} \\ &= ba^T:\c{\LR{-\A\;dA\;\A}} \\ &= -\LR{\B ba^T\B}:dA \\ }$$
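A numerical check of this step, as a plain-Python sketch: it evaluates the three expressions for $\lambda$ and compares the predicted $d\lambda = -(A^{-T}ba^TA^{-T}):dA$ with a central difference along one direction $H$. The values of `A`, `a`, `b`, `H`, `eps` and all helper names are illustrative assumptions, not part of the answer.

```python
# Check lambda = a^T A^{-T} b = b^T A^{-1} a = ba^T : A^{-1}
# and d(lambda) = -(A^{-T} b a^T A^{-T}) : dA along one direction H.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inv(M):
    """Inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def outer(u, v):
    return [[x * y for y in v] for x in u]

def frob(X, Y):
    """Frobenius product X:Y."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def shift(X, t, H):
    """Entrywise X + t*H."""
    return [[x + t * h for x, h in zip(rx, rh)] for rx, rh in zip(X, H)]

A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
a = [1.0, 2.0, -1.0]
b = [0.5, -1.0, 2.0]
H = [[0.3, -0.2, 0.1], [0.0, 0.5, -0.4], [0.2, 0.1, -0.3]]
eps = 1e-6

Ai = inv(A)
AiT = transpose(Ai)  # A^{-T}

lam_forms = [
    dot(a, matvec(AiT, b)),  # a^T A^{-T} b
    dot(b, matvec(Ai, a)),   # b^T A^{-1} a
    frob(outer(b, a), Ai),   # ba^T : A^{-1}
]

G = matmul(matmul(AiT, outer(b, a)), AiT)  # A^{-T} b a^T A^{-T}
dlam_formula = -frob(G, H)

def lam(M):
    """lambda as a function of the matrix: b^T M^{-1} a."""
    return dot(b, matvec(inv(M), a))

dlam_fd = (lam(shift(A, eps, H)) - lam(shift(A, -eps, H))) / (2 * eps)
print("lambda:", lam_forms)
print("d(lambda) formula vs finite difference:", dlam_formula, dlam_fd)
```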
Use the above notation to write the function, then calculate its differential and gradient. $$\eqalign{ f &= \l^2 \\ df &= 2\l\;\c{d\l} \\ &= -2\l \c{\LR{\B ba^T\B}:dA} \\ \grad{f}{A} &= -2\l \LR{\B ba^T\B} \\ &= -2 \LR{b^T\A a} \LR{\B ba^T\B} \\\\ }$$
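The closed-form gradient can be compared entry by entry with central finite differences of $f(A)=\big(b^TA^{-1}a\big)^2$. This is a plain-Python sketch under assumed illustrative values for `A`, `a`, `b` and `eps`; the helper names are hypothetical.

```python
# Entrywise finite-difference check of grad f = -2 (b^T A^{-1} a) A^{-T} b a^T A^{-T}.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inv(M):
    """Inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def outer(u, v):
    return [[x * y for y in v] for x in u]

A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
a = [1.0, 2.0, -1.0]
b = [0.5, -1.0, 2.0]
eps = 1e-6
n = len(A)

def f(M):
    """f = (b^T M^{-1} a)^2, i.e. a^T M^{-T} b b^T M^{-1} a."""
    val = dot(b, matvec(inv(M), a))
    return val * val

Ai = inv(A)
AiT = transpose(Ai)
lam = dot(b, matvec(Ai, a))
G = matmul(matmul(AiT, outer(b, a)), AiT)               # A^{-T} b a^T A^{-T}
grad_formula = [[-2 * lam * g for g in row] for row in G]

def perturbed(i, j, t):
    """Copy of A with t added to entry (i, j)."""
    M = [row[:] for row in A]
    M[i][j] += t
    return M

grad_fd = [[(f(perturbed(i, j, eps)) - f(perturbed(i, j, -eps))) / (2 * eps)
            for j in range(n)] for i in range(n)]
max_err = max(abs(x - y) for rx, ry in zip(grad_formula, grad_fd) for x, y in zip(rx, ry))
print(f"max |analytic - finite difference| = {max_err:.2e}")
```

Using a non-symmetric `A` matters here: for a symmetric matrix $A^{-T}=A^{-1}$, so a misplaced transpose would go unnoticed.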
Hint
Define $\phi_1 : A \mapsto A^{-1}$, $\phi_2 : X \mapsto b^T X a$ and $\phi_3: x \mapsto x^T x$. Note that your map $\phi$ is $\phi = \phi_3 \circ \phi_2 \circ \phi_1$.
You can then use the chain rule $\phi^\prime(A) = \phi_3^\prime\big(\phi_2(\phi_1(A))\big) \cdot \phi_2^\prime\big(\phi_1(A)\big) \cdot \phi_1^\prime(A)$, based on $\phi_1^\prime(A).H =-A^{-1}HA^{-1}$, $\phi_2^\prime(X).K = b^T K a$ (since $\phi_2$ is linear) and $\phi_3^\prime(x).h = 2x^T h$.
You’ll finally get:
$$\frac{\partial \phi}{\partial A}.H = -2 (b^TA^{-1}a)^Tb^TA^{-1}HA^{-1}a =-2a^T\left(A^{-1}\right)^T bb^T A^{-1}HA^{-1}a$$
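The composition can be mirrored directly in code: each $\phi_k$ and its derivative is a small function, and the chain rule is just nested calls. This plain-Python sketch compares that composition, the closed form above, and a central finite difference along a direction $H$. The values of `A`, `a`, `b`, `H`, `eps` and the function names are illustrative assumptions.

```python
# Chain rule phi = phi3 o phi2 o phi1, checked against the closed form and a finite difference.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv(M):
    """Inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def shift(X, t, H):
    """Entrywise X + t*H."""
    return [[x + t * h for x, h in zip(rx, rh)] for rx, rh in zip(X, H)]

A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
a = [1.0, 2.0, -1.0]
b = [0.5, -1.0, 2.0]
H = [[0.3, -0.2, 0.1], [0.0, 0.5, -0.4], [0.2, 0.1, -0.3]]
eps = 1e-6

def phi1(M): return inv(M)                       # M -> M^{-1}
def phi2(X): return dot(b, matvec(X, a))         # X -> b^T X a
def phi3(x): return x * x                        # x -> x^T x for a scalar x
def phi(M): return phi3(phi2(phi1(M)))

def dphi1(M, K):
    """phi1'(M).K = -M^{-1} K M^{-1}"""
    Mi = inv(M)
    return [[-v for v in row] for row in matmul(matmul(Mi, K), Mi)]

def dphi2(X, K):
    """phi2 is linear, so phi2'(X).K = b^T K a"""
    return dot(b, matvec(K, a))

def dphi3(x, h):
    """phi3'(x).h = 2 x h for a scalar x"""
    return 2 * x * h

X1 = phi1(A)
x2 = phi2(X1)
chain = dphi3(x2, dphi2(X1, dphi1(A, H)))  # phi3'(x2) . phi2'(X1) . phi1'(A) . H

# closed form: -2 (b^T A^{-1} a) b^T A^{-1} H A^{-1} a
closed = -2 * x2 * dot(b, matvec(matmul(matmul(X1, H), X1), a))

fd = (phi(shift(A, eps, H)) - phi(shift(A, -eps, H))) / (2 * eps)
print(chain, closed, fd)
```

The same number equals $-2\lambda\,(A^{-T}ba^TA^{-T}):H$ from the Frobenius-product answer, since $b^TA^{-1}HA^{-1}a=(A^{-T}ba^TA^{-T}):H$.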