The Gradient of a Matrix Product When the Variable Matrix Is Not Rightmost


$A$, $B$, $C$ are all matrices. How do I calculate the gradients of $$ f(A) = ABC $$ and $$ g(A) = BAC? $$

I know that when the unknown matrix is rightmost, the gradient of $$ f(A) = BCA $$ is $$ \nabla f(A)=(BC)^T, $$ but I cannot figure out how to get the gradient in the two cases listed above.


There are 2 best solutions below


Suppose $f$ is differentiable. Then the gradient of $f$ at a point is the same as its total derivative at that point, evaluated in any direction $H$.

Here $f$ is linear in $A$, so $f(A+H)-f(A)=HBC$ and therefore $Df(A)(H)=HBC$

and hence $\nabla f(A)(H)=HBC.$
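The directional derivative above can be checked numerically. The following is a minimal sketch, assuming random $3\times 3$ matrices and NumPy; the names `f`, `g`, `err_f`, `err_g` and the step size `t` are chosen here for illustration only. It also checks $g(A)=BAC$, for which the same linearity argument suggests $Dg(A)(H)=BHC$ (this case is not worked out in the answer itself).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # assumed size; any square size works the same way
A, B, C, H = (rng.standard_normal((n, n)) for _ in range(4))

def f(A):
    # f(A) = ABC, the first function from the question
    return A @ B @ C

def g(A):
    # g(A) = BAC, the second function from the question
    return B @ A @ C

t = 1e-3  # step size; both maps are linear in A, so any small t works

# Finite-difference directional derivatives in the direction H
fd_f = (f(A + t * H) - f(A)) / t
fd_g = (g(A + t * H) - g(A)) / t

# Compare with the closed forms H B C and B H C
err_f = np.max(np.abs(fd_f - H @ B @ C))
err_g = np.max(np.abs(fd_g - B @ H @ C))
print(err_f, err_g)  # both should be at round-off level
```

Because both maps are linear in $A$, the finite difference agrees with the closed form up to floating-point round-off, not just to first order in `t`.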


We assume that the matrices have dimension $n\times n$.

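The OP's formula is wrong as stated; it holds for the trace. If $\phi(A)=\operatorname{tr}(BCA)=\operatorname{tr}(ABC)$, then $D\phi_A(H)=\operatorname{tr}(HBC)=\langle H,(BC)^T\rangle$ (using the scalar product $\langle X,Y\rangle=\operatorname{tr}(X^TY)$) and, by duality, $\nabla(\phi)(A)=(BC)^T$.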
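As a sanity check of the scalar case, here is a small sketch, assuming random $3\times 3$ matrices and NumPy. The helper `numerical_gradient` is a hypothetical name introduced here; it approximates each partial derivative $\partial\phi/\partial A_{k,l}$ by a central difference and the result is compared with $(BC)^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3  # assumed size
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

def phi(A):
    # phi(A) = tr(BCA), a scalar function of the matrix A
    return np.trace(B @ C @ A)

def numerical_gradient(func, A, t=1e-4):
    """Central-difference gradient of a scalar function of a matrix."""
    G = np.zeros_like(A)
    for k in range(A.shape[0]):
        for l in range(A.shape[1]):
            E = np.zeros_like(A)
            E[k, l] = 1.0  # perturb only the (k, l) entry
            G[k, l] = (func(A + t * E) - func(A - t * E)) / (2 * t)
    return G

grad_fd = numerical_gradient(phi, A)
grad_exact = (B @ C).T  # the gradient claimed in the text
max_err = np.max(np.abs(grad_fd - grad_exact))
print(max_err)  # should be at round-off level
```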

As Rodrigo wrote, your first function $f$ takes values in $\mathbb{R}^{n^2}$, not in $\mathbb{R}$; as Gobinda wrote, $Df_A(H)=HBC$, but $f$ has $n^2$ gradient functions (together they form a tensor).

To see that, it suffices to consider the coordinate functions $f_{i,j}(A)=(ABC)_{i,j}=\operatorname{tr}(e_i^TABCe_j)=\operatorname{tr}(BCe_je_i^TA)$. Then $D(f_{i,j})_A(H)=\operatorname{tr}(BCe_je_i^TH)$ and

$\nabla(f_{i,j})(A)=(BCe_je_i^T)^T=e_ie_j^T(BC)^T=E_{i,j}(BC)^T$.

In the same way, if $f(A)=BCA$, then $Df_A(H)=BCH$ and the tensor is $\nabla(f_{i,j})(A)=(BC)^TE_{i,j}$.
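Both tensor formulas can be checked entry by entry. The sketch below assumes random $3\times 3$ matrices and NumPy; `unit` and `numerical_gradient` are hypothetical helper names introduced here. For every pair $(i,j)$ it compares a central-difference gradient of the coordinate function with $E_{i,j}(BC)^T$ for $f(A)=ABC$ and with $(BC)^TE_{i,j}$ for $f(A)=BCA$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3  # assumed size
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
BC = B @ C

def unit(i, j):
    # E_{i,j} = e_i e_j^T: a single 1 in row i, column j
    E = np.zeros((n, n))
    E[i, j] = 1.0
    return E

def numerical_gradient(func, A, t=1e-4):
    """Central-difference gradient of a scalar function of a matrix."""
    G = np.zeros_like(A)
    for k in range(n):
        for l in range(n):
            step = t * unit(k, l)
            G[k, l] = (func(A + step) - func(A - step)) / (2 * t)
    return G

max_err_left = 0.0   # worst error for f(A) = ABC
max_err_right = 0.0  # worst error for f(A) = BCA
for i in range(n):
    for j in range(n):
        # Gradients of the (i, j) coordinate functions
        grad_left = numerical_gradient(lambda X: (X @ BC)[i, j], A)
        grad_right = numerical_gradient(lambda X: (BC @ X)[i, j], A)
        max_err_left = max(max_err_left,
                           np.max(np.abs(grad_left - unit(i, j) @ BC.T)))
        max_err_right = max(max_err_right,
                            np.max(np.abs(grad_right - BC.T @ unit(i, j))))
print(max_err_left, max_err_right)  # both should be at round-off level
```

Note that $E_{i,j}(BC)^T$ is nonzero only in row $i$, while $(BC)^TE_{i,j}$ is nonzero only in column $j$, which matches the fact that $(ABC)_{i,j}$ depends only on row $i$ of $A$ and $(BCA)_{i,j}$ only on column $j$.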