Gradient of $\operatorname{trace}(ABA^TC)$ with respect to a matrix $A$


Let $A$, $B$, $C$ be $n \times n$ matrices. I was trying to find $\nabla_A \operatorname{trace}(ABA^TC)$.

This answer: "Proof for the funky trace derivative: $d(\operatorname{trace}(ABA'C))$?"

suggested: $$ \nabla_A \operatorname{trace}( ABA^{T}C ) = CAB + C^T AB^T $$

with the implication that $$\nabla_A AB = B^T$$

Can somebody show me why?

I also have my own proof, based on the hint from that link (using the chain rule).

First, let

$$ H(X,Y) = \operatorname{trace}(XY^TC) \qquad\qquad (1) $$
$$ f(A) = AB \qquad\qquad (2) $$
$$ g(A) = \operatorname{trace}(ABA^TC) \qquad\qquad (3) $$

Then $g(A)$ can be rewritten as:

$$ g(A) = H(f(A),A) $$

We know the chain rule:

$$ \nabla_A g(A) = \nabla_X H(X,Y)\cdot \nabla_A f(A) + \nabla_Y H(X,Y)\cdot \nabla_A A $$

To simplify this equation, we need:

$$ \nabla_A \operatorname{trace}(AB) = B^T \qquad\qquad (4) $$
$$ \operatorname{trace}(AB) = \operatorname{trace}(BA) \qquad\qquad (5) $$
$$ \nabla_{A^T} f(A) = [\nabla_A f(A)]^T \qquad\qquad (6) $$

With (4), the first term $\nabla_X H(X,Y)\cdot \nabla_A f(A)$ can be written as:

$$ \nabla_X \operatorname{trace}(XY^TC) \cdot \nabla_A f(A) = C^TY \cdot \nabla_A AB = C^TA \cdot \nabla_A AB $$

With (5), the second term can be written as:

$$ \nabla_Y H(X,Y) = \nabla_Y \operatorname{trace}(XY^TC) = \nabla_Y \operatorname{trace}(Y^TCX) $$

With (6):

$$ \nabla_Y \operatorname{trace}(Y^TCX) = [\nabla_{Y^T} \operatorname{trace}(Y^TCX)]^T $$

With (4):

$$ [\nabla_{Y^T} \operatorname{trace}(Y^TCX)]^T = CX = CAB $$

Now I get

$$ \nabla_A \operatorname{trace}(ABA^{T}C) = C^T A \cdot \nabla_A AB + CAB $$

But I'm not sure that $\nabla_A AB = B^T$. Can somebody show me why, or give me another proof?

Thank you for your honest suggestions!

BEST ANSWER

The problem is much easier if you use the Frobenius inner product, $A:B = {\rm tr}(A^TB)$, instead of the trace.

Write the objective function and find its differential:

$$\eqalign{ f &= {\rm tr}(ABA^TC) \cr &= I:ABA^TC \cr\cr df &= I:(dA)BA^TC + I:AB(dA^T)C \cr &= C^TAB^T:dA + B^TA^TC^T:dA^T \cr &= C^TAB^T:dA + CAB:dA \cr &= (C^TAB^T + CAB):dA \cr }$$

where some of the expressions were rearranged using these mixed product rules:

$$\eqalign{ {\rm tr}(A^TBC) &= A:BC \cr &= AC^T:B \cr &= B^TA:C \cr &= A^T:(BC)^T \cr }$$

which are derived from the cyclic property of the trace function.
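As a quick sanity check, the mixed product rules can be tested numerically. The sketch below assumes NumPy and random real square matrices. The helper name `frob` is made up for illustration and implements $X:Y = {\rm tr}(X^TY)$ as a sum of elementwise products.

```python
import numpy as np

def frob(X, Y):
    """Frobenius inner product X:Y = tr(X^T Y), computed elementwise."""
    return float(np.sum(X * Y))

# Random test matrices (the size n = 4 is an arbitrary choice).
rng = np.random.default_rng(0)
n = 4
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# Left-hand side of the rules: tr(A^T B C).
lhs = float(np.trace(A.T @ B @ C))

# The four equivalent forms listed in the mixed product rules.
forms = [
    frob(A, B @ C),          # A : BC
    frob(A @ C.T, B),        # AC^T : B
    frob(B.T @ A, C),        # B^T A : C
    frob(A.T, (B @ C).T),    # A^T : (BC)^T
]

rules_hold = all(np.isclose(lhs, v) for v in forms)
print(rules_hold)
```

Agreement on random inputs is not a proof, but it catches a misplaced transpose quickly.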

Anyway, since $df=\big(\frac{\partial f}{\partial A}:dA\big),\,$ the gradient of the function must be $$\eqalign{ \frac{\partial f}{\partial A} &= C^TAB^T + CAB \cr }$$
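The closed form can also be compared against a finite-difference gradient. This is a minimal sketch, assuming NumPy and random real $n \times n$ matrices. The function names `f`, `grad_formula` and `grad_fd` and the step size `eps` are illustrative choices, not part of the derivation.

```python
import numpy as np

def f(A, B, C):
    """Objective f(A) = tr(A B A^T C)."""
    return float(np.trace(A @ B @ A.T @ C))

def grad_formula(A, B, C):
    """Closed-form gradient from the derivation: C^T A B^T + C A B."""
    return C.T @ A @ B.T + C @ A @ B

def grad_fd(A, B, C, eps=1e-6):
    """Central-difference approximation of df/dA, one entry at a time."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E, B, C) - f(A - E, B, C)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
n = 4
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# Largest entrywise gap between the formula and the numerical gradient.
max_err = float(np.max(np.abs(grad_formula(A, B, C) - grad_fd(A, B, C))))
print(max_err)
```

Because $f$ is quadratic in $A$, the central difference has no truncation error, so only floating-point roundoff remains and `max_err` should be tiny. This also suggests how to read the first term in the question: $\nabla_X H = C^TA$ pairs with $dX = (dA)B$, and $C^TA:(dA)B = C^TAB^T:dA$, so "$C^TA\cdot\nabla_A AB$" acts as right-multiplication by $B^T$ rather than as a matrix-valued derivative of $AB$.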