Given that $A$ and $B$ are matrices, I know the correct derivative is $\frac{\partial tr(ABA^{T})}{\partial A}=A(B+B^{T})$.
However, I don't know why my own solution is wrong.
Here is my solution:
" First we have, $\frac{\partial tr(ABA^{T})}{\partial A}=\frac{\partial tr(A^{T}AB)}{\partial A}$.
According to the chain rule, $\frac{\partial tr(A^{T}AB)}{\partial A}=\frac{\partial tr(A^{T}AB)}{\partial (A^{T}A)}\cdot \frac{\partial A^{T}A}{\partial A}$
As $\frac{\partial tr(AB)}{\partial A}=B^{T}$, hence $\frac{\partial tr(A^{T}AB)}{\partial (A^{T}A)}=B^{T}$
And I know that $\frac{\partial A^{T}A}{\partial A}$ is a supermatrix.
So the result cannot be $A(B+B^{T})$. "
Could anyone tell me which steps are wrong?
2026-03-30 06:09:20.1774850960
Why is my solution to $\frac{\partial tr(ABA^{T})}{\partial A}$ wrong?
125 views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
There are 2 best solutions below
Let's write the expression in terms of the Frobenius product, $X:Y = \text{Tr}(X^TY)$, for simplicity:
$$\text{Tr}(ABA^T) = AB:A$$
Then, we have:
\begin{equation} \begin{split} d\text{Tr}(ABA^T) & = d(AB):A + AB:dA \\ & = ((dA)B + A(dB)):A + AB:dA \\ & = (dA)B:A + AB:dA \\ & = A:(dA)B + AB:dA \\ & = AB^T:(dA) + AB:dA \\ & = (AB^T + AB):dA \\ & = A(B^T + B):dA \\ \end{split} \end{equation}
Finally, we get:
$$ \frac{\partial \text{Tr}(ABA^T)}{\partial A} = A(B + B^T)$$
Above, I used $dB = 0$ (since $B$ is constant), the symmetry $A:B = B:A$, and the following properties:
$$ A:BC = B^TA:C = AC^T:B$$
and
$$(A + B):C = A:C + B:C$$
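The result $A(B + B^T)$ can also be sanity-checked numerically. The following is a minimal sketch, not part of the original answer: it assumes NumPy, arbitrary shapes ($A$ is $3\times 4$, $B$ is $4\times 4$), and a hypothetical helper `numerical_gradient` that approximates each partial derivative by central differences.

```python
import numpy as np

# Random test matrices; the shapes are an arbitrary choice for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 4))  # B must be square so that A B A^T is defined


def f(A):
    """Scalar function f(A) = Tr(A B A^T)."""
    return np.trace(A @ B @ A.T)


def numerical_gradient(func, A, eps=1e-6):
    """Hypothetical helper: entrywise central-difference gradient of func at A."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps  # perturb a single entry of A
            grad[i, j] = (func(A + E) - func(A - E)) / (2 * eps)
    return grad


analytic = A @ (B + B.T)             # closed form derived above
numeric = numerical_gradient(f, A)   # finite-difference estimate
max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)
```

Because $f$ is quadratic in $A$, the central difference has no truncation error, so the printed discrepancy should be at the level of floating-point roundoff.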
One good way to find derivatives of scalar functions of matrices is to use a first-order approximation. Let $f(X) = \text{Tr}(XBX^\top)$ be our function and let $Z = X+\Delta X$. Then
$$f(Z) \approx f(X)+\langle\Delta X,\nabla f(X)\rangle,$$
where $\langle X,Y\rangle = \text{Tr}(X^\top Y) = \text{Tr}(Y^\top X)$.

Now, let's calculate $f(Z)$ directly:
$$f(Z) = \text{Tr}((X+\Delta X)B(X+\Delta X)^\top)=\text{Tr}(XBX^\top)+\text{Tr}(XB\Delta X^\top)+\text{Tr}(\Delta XBX^\top)+\text{Tr}(\Delta XB\Delta X^\top).$$
The last term is quadratic in $\Delta X$ and can be ignored in a first-order approximation. For the two middle terms, the cyclic property of the trace and its invariance under transposition give $\text{Tr}(XB\Delta X^\top) = \text{Tr}(\Delta X^\top XB)$ and $\text{Tr}(\Delta XBX^\top) = \text{Tr}(\Delta X^\top XB^\top)$. Hence,
$$f(Z) \approx f(X) + \text{Tr}(\Delta X^\top X(B+B^\top)) = f(X)+\langle\Delta X,X(B+B^\top)\rangle.$$
Comparing with the definition of the first-order approximation above, we conclude that $\nabla f(X) = X(B+B^\top)$.
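The expansion above can be checked numerically: after subtracting $f(X)$ and the linear term, what remains should be exactly the quadratic term $\text{Tr}(\Delta X B \Delta X^\top)$. This is a minimal sketch assuming NumPy; the shapes, the perturbation scale `1e-3`, and the helper name `inner` are illustrative choices, not part of the original answer.

```python
import numpy as np

# Random test data; shapes and perturbation size are arbitrary illustrative choices.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 4))
dX = 1e-3 * rng.standard_normal((3, 4))  # small perturbation Delta X


def f(X):
    """Scalar function f(X) = Tr(X B X^T)."""
    return np.trace(X @ B @ X.T)


def inner(P, Q):
    """Trace inner product <P, Q> = Tr(P^T Q)."""
    return np.trace(P.T @ Q)


grad = X @ (B + B.T)  # gradient obtained from the first-order expansion

# What is left of f(X + dX) after removing f(X) and the linear term ...
remainder = f(X + dX) - f(X) - inner(dX, grad)
# ... should match the quadratic term that the approximation drops.
quadratic_term = np.trace(dX @ B @ dX.T)

print(remainder, quadratic_term)
```

The two printed numbers should agree up to floating-point roundoff, and both shrink quadratically as the scale of `dX` is reduced, which is why the term can be dropped at first order.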