Question about Matrix Derivative Rule


Let ∇A f denote the derivative of a scalar function f with respect to the matrix A, and let X^T denote the transpose of a matrix X. Then the following two rules hold.

1) ∇A (trace of AB) = B^T

2) ∇A (trace of AB A^T C) = CAB + C^T A B^T

Both rules are mathematically correct, but I was wondering how they can both hold at the same time.

For instance, from 1), we can say that

∇A (trace of AB A^T C) = ∇A (trace of A (B A^T C) ) = (B A^T C)^T = C^T A B^T

However, the answer is CAB + C^T A B^T, not C^T A B^T.

Is there something wrong with the way I calculated it? I only used rule 1.

Best answer

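Short answer: You must apply the first rule to each of the two occurrences of $A$ in the second expression and add the results. Your calculation treats $BA^TC$ as a constant matrix, but it contains $A^T$ and therefore also depends on $A$.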

An approach from first principles is to write the differential form of the second formula $$\eqalign{ d\,{\rm tr}(ABA^TC) &= {\rm tr}(dA\,BA^TC) + {\rm tr}(AB\,dA^T\,C) \cr &= {\rm tr}(dA^T\,C^TAB^T) + {\rm tr}(dA^T\,CAB) \cr }$$ where the second line utilizes the transpositional and cyclic properties of the trace.

Since the differential has the form $d\,{\rm tr}(ABA^TC) = {\rm tr}(dA^T\,G)$, the gradient is the matrix $G$ multiplying $dA^T$: $$\nabla_A\,{\rm tr}(ABA^TC) = C^TAB^T + CAB$$
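As a sanity check on the derivation, here is a minimal numerical sketch in Python with NumPy. It is not part of the original answer: the matrix sizes, the random seed, the step size `eps`, and the helper names `trace_form` and `numerical_gradient` are all assumptions chosen for illustration. It compares a central finite-difference gradient of ${\rm tr}(ABA^TC)$ against both the full formula $C^TAB^T + CAB$ and the single-term result $C^TAB^T$ obtained in the question.

```python
import numpy as np


def trace_form(A, B, C):
    """Scalar function f(A) = tr(A B A^T C)."""
    return np.trace(A @ B @ A.T @ C)


def numerical_gradient(func, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function of a matrix."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps  # perturb a single entry of A
            grad[i, j] = (func(A + E) - func(A - E)) / (2 * eps)
    return grad


# Illustrative sizes (assumed): A is m x n, B is n x n, C is m x m.
rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, m))

numeric = numerical_gradient(lambda M: trace_form(M, B, C), A)

# Rule 2: contributions from both occurrences of A.
full_formula = C.T @ A @ B.T + C @ A @ B
# Rule 1 applied to the first occurrence only, as in the question.
single_term = C.T @ A @ B.T

full_matches = np.allclose(numeric, full_formula, atol=1e-5)
single_matches = np.allclose(numeric, single_term, atol=1e-5)

print("full formula matches finite differences:", full_matches)
print("single term matches finite differences:", single_matches)
```

With generic random matrices the full formula should agree with the finite-difference gradient, while the single-term expression should not, because it omits the $CAB$ contribution from the second occurrence of $A$.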