Gradient of $X \mapsto \mbox{Tr}(AX)$


I know that the gradient of $X \mapsto \mbox{Tr}(XA)$ is $A^T$. How does this change if $A$ and $X$ are swapped? Is the gradient of $X \mapsto \mbox{Tr}(AX)$ the same?
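Because $\mbox{Tr}(AX) = \mbox{Tr}(XA)$ by the cyclic property of the trace, both maps should have gradient $A^T$. A minimal Python/NumPy sketch can check this numerically with central differences. `numerical_gradient` is a hypothetical helper written here for illustration (not a NumPy function), and the matrix size and random seed are arbitrary choices.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at X.
    Entry (i, j) approximates df/dX[i, j]."""
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = eps  # perturb a single entry of X
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))

grad_AX = numerical_gradient(lambda Y: np.trace(A @ Y), X)
grad_XA = numerical_gradient(lambda Y: np.trace(Y @ A), X)

print(np.allclose(grad_AX, A.T))  # True
print(np.allclose(grad_XA, A.T))  # True
```

Since both functions are linear in $X$, the central difference is exact up to floating-point rounding, so the comparison with `np.allclose` is tight.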

Also, how does this extend to products of more matrices? Can we just treat everything in front of our "$X$" as $A$? For example, for $X \mapsto \mbox{Tr}\left(U^T V X\right)$, can we treat this like the case above, with $U^T V$ playing the role of the "$A$" matrix?
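If $U^T V$ does play the role of $A$, the gradient of $X \mapsto \mbox{Tr}(U^T V X)$ should be $(U^T V)^T = V^T U$. The Python/NumPy sketch below checks this with non-square shapes chosen here so that $U^T V X$ is square (an assumption for illustration: $U$ is $4\times 2$, $V$ is $4\times 3$, $X$ is $3\times 2$). `numerical_gradient` is the same hypothetical central-difference helper, not a library function.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at X."""
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = eps  # perturb a single entry of X
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
U = rng.standard_normal((4, 2))  # m x n
V = rng.standard_normal((4, 3))  # m x p
X = rng.standard_normal((3, 2))  # p x n, so U^T V X is n x n
A = U.T @ V                      # the combined "A" matrix, n x p

grad = numerical_gradient(lambda Y: np.trace(U.T @ V @ Y), X)

print(np.allclose(grad, A.T))      # True
print(np.allclose(grad, V.T @ U))  # True, since (U^T V)^T = V^T U
```

Using non-square factors makes the transpose visible: the gradient has the shape of $X$ ($3\times 2$), which matches $A^T$ but not $A$.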


Theorem: ${\mathrm{d} f({X})= \text{trace}(M^T \mathrm{d} {X}) \iff \frac{\partial f}{\partial {X}} = M}$
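The theorem rests on the fact that $\text{trace}(M^T H)$ is the entrywise (Frobenius) inner product of $M$ and $H$. Reading $\mathrm{d} f(X) = \text{trace}(M^T \mathrm{d} X)$ as $f(X+H) = f(X) + \text{trace}(M^T H) + o(\|H\|)$, and writing $E_{ij}$ (notation introduced here) for the matrix with a $1$ in entry $(i,j)$ and zeros elsewhere, a short derivation is:

$$
\begin{aligned}
\text{trace}(M^T H) &= \sum_{i,j} (M^T)_{ji} H_{ij} = \sum_{i,j} M_{ij} H_{ij},\\
\frac{\partial f}{\partial X_{ij}} &= \lim_{t \to 0} \frac{f(X + t E_{ij}) - f(X)}{t} = \text{trace}(M^T E_{ij}) = M_{ij}.
\end{aligned}
$$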


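In your case, consider the more general map $X \mapsto \text{trace}(AXB)$:

$$\mathrm d \ \text{trace}(AXB) = \text{trace}(\mathrm d (AX B)) = \text{trace}(A \ \mathrm d X\ B) = \text{trace}(B A \ \mathrm d X),$$ where the last step uses the cyclic property of the trace. Thus we identify $(BA)^T = A^T B^T$ as the derivative. Taking $B = I$ gives $A^T$ for $\text{trace}(AX)$, the same as for $\text{trace}(XA)$ (indeed $\text{trace}(AX) = \text{trace}(XA)$). Taking $A = U^T V$ and $B = I$ gives $(U^T V)^T = V^T U$ for $\text{trace}(U^T V X)$, so yes, everything in front of $X$ plays the role of $A$.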
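A numerical sanity check of this derivation, sketched in Python/NumPy, compares the change in $f(X) = \text{trace}(AXB)$ along a random direction $H$ with the first-order term $\text{trace}(M^T H)$ for $M = A^T B^T$. The shapes, seed, step size `t`, and direction `H` are arbitrary choices made here for illustration; the shapes are picked so that $AXB$ is square.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))  # m x p
X = rng.standard_normal((3, 4))  # p x q
B = rng.standard_normal((4, 2))  # q x m, so A X B is m x m
M = A.T @ B.T                    # claimed gradient, same shape as X

def f(Y):
    return np.trace(A @ Y @ B)

# Directional difference quotient along H versus trace(M^T H).
H = rng.standard_normal(X.shape)
t = 1e-3
lhs = (f(X + t * H) - f(X)) / t
rhs = np.trace(M.T @ H)

print(np.isclose(lhs, rhs))  # True
```

Because $f$ is linear in $X$, the difference quotient equals $\text{trace}(AHB) = \text{trace}(BAH)$ exactly up to rounding, which is the same as $\text{trace}(M^T H)$ since $M^T = BA$.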