Derivative of trace-based linear scalar field


I'm just getting into matrix calculus, so this question might be really easy for you guys out there. I'm trying to understand some of the simpler derivations in The Matrix Cookbook. I've been looking at derivatives of the trace predominantly. The following formulas are given:

$$ \frac{\delta}{\delta X} {\rm Tr}(XA) = A^T $$

$$ \frac{\delta}{\delta X} {\rm Tr}(AX^T) = A $$

I can't completely reproduce these examples, though. When I take the derivative manually for a small example, I always get $A$.

\begin{align} \frac{\delta}{\delta X} {\rm Tr}\left( \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix} \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} \right) &= \frac{\delta}{\delta X} {\rm Tr}\left( \begin{bmatrix} a_1x_1 + a_3x_2 & a_2x_1 + a_4x_2 \\ a_1x_3 + a_3x_4 & a_2x_3 + a_4x_4 \end{bmatrix}\right) \\ &= {\rm Tr}\left( \begin{bmatrix} \begin{bmatrix} a_1 & a_2 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} a_3 & a_4 \\ 0 & 0 \end{bmatrix} \\ \begin{bmatrix} 0 & 0\\ a_1 & a_2 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ a_3 & a_4 \end{bmatrix} \end{bmatrix} \right)\\ &= \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} \end{align}

\begin{align} \frac{\delta}{\delta X} {\rm Tr}\left( \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} \begin{bmatrix} x_1 & x_3 \\ x_2 & x_4 \end{bmatrix} \right) &= \frac{\delta}{\delta X} {\rm Tr}\left( \begin{bmatrix} a_1x_1 + a_2x_2 & a_1x_3 + a_2x_4 \\ a_3x_1 + a_4x_2 & a_3x_3 + a_4x_4 \end{bmatrix}\right) \\ &= {\rm Tr}\left( \begin{bmatrix} \begin{bmatrix} a_1 & 0 \\ a_3 & 0 \end{bmatrix} & \begin{bmatrix} 0 & a_2 \\ 0 & a_4 \end{bmatrix} \\ \begin{bmatrix} a_1 & 0\\ a_3 & 0 \end{bmatrix} & \begin{bmatrix} 0 & a_2 \\ 0 & a_4 \end{bmatrix} \end{bmatrix} \right)\\ &= \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} \end{align}

I'm sorry if the notation is wrong and thanks for your help.


There are 2 best solutions below

3
On BEST ANSWER

$\def\p{\partial} \def\m#1{ \left[\begin{array}{r}#1\end{array}\right] }$ Before you start differentiating, you must reduce the trace to a scalar expression.

So here's how you should perform the calculation for the first example: $$\eqalign{ t &= {\rm Tr}(XA) \\&= a_1x_1+a_3x_2 + a_2x_3+a_4x_4 \\\\ \frac{\p t}{\p X} &= \m{ \frac{\p t}{\p x_1}&\frac{\p t}{\p x_2} \\ \frac{\p t}{\p x_3}&\frac{\p t}{\p x_4} } = \m{ a_1&a_3 \\ a_2&a_4 } = A^T }$$
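The same recipe should carry over to the second formula from the question. As a sketch, using the question's labelling of the entries of $X$ and $A$ (the name $s$ is introduced here only for illustration), reduce the trace to a scalar first and then differentiate entry by entry:

$$\begin{aligned} s &= {\rm Tr}(AX^T) \\ &= a_1x_1 + a_2x_2 + a_3x_3 + a_4x_4 \\[4pt] \frac{\partial s}{\partial X} &= \begin{bmatrix} \frac{\partial s}{\partial x_1} & \frac{\partial s}{\partial x_2} \\ \frac{\partial s}{\partial x_3} & \frac{\partial s}{\partial x_4} \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} = A \end{aligned}$$

Here each $x_k$ is multiplied by the $a_k$ with the same index, which is why no transpose appears in this case.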

0
On

Solving the problem for a small case isn't helpful at all. In general, the basic idea is to expand the product $XA$ in terms of the elements $X_{ij}$ and $A_{ij}$ of the matrices $X$ and $A$. The trace then becomes a double sum over $i,j = 1,2,\cdots,n$, namely $\text{tr}(XA) = \sum_{i}(XA)_{ii} = \sum_{i}\sum_{j} X_{ij}A_{ji}$.

Let $F(X) = \text{tr}(XA)$. Now you can partially differentiate this sum w.r.t. $X_{ij}$; the only term containing $X_{ij}$ is $X_{ij}A_{ji}$, so you obtain $A_{ji}$. Therefore, the Jacobian matrix corresponding to the linear transformation $DF$ has $A_{ji}$ in its $(i,j)^\text{th}$ position, and this holds for every pair $(i,j)$. Thus, $DF_X = A^T$.
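The entrywise argument can also be checked numerically. The following is a minimal sketch in plain Python, not part of the original answer: it estimates each partial derivative by a central difference and compares the result with $A^T$ for $\text{tr}(XA)$ and with $A$ for $\text{tr}(AX^T)$. The helper names (`trace_of_product`, `numerical_gradient`, `max_abs_diff`), the size `n = 3`, and the random test matrices are all assumptions chosen for illustration.

```python
import random

def trace_of_product(P, Q):
    """Return Tr(PQ) for square matrices stored as nested lists."""
    n = len(P)
    return sum(P[i][k] * Q[k][i] for i in range(n) for k in range(n))

def transpose(M):
    """Return the transpose of a nested-list matrix."""
    return [list(row) for row in zip(*M)]

def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of G with G[i][j] = df/dX[i][j]."""
    rows, cols = len(X), len(X[0])
    G = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

# Hypothetical test data: random, generally non-symmetric 3x3 matrices.
random.seed(0)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

# Gradient of Tr(XA) with respect to X, expected to match A^T.
grad_xa = numerical_gradient(lambda M: trace_of_product(M, A), X)
# Gradient of Tr(A X^T) with respect to X, expected to match A.
grad_axt = numerical_gradient(lambda M: trace_of_product(A, transpose(M)), X)

err_xa = max_abs_diff(grad_xa, transpose(A))
err_axt = max_abs_diff(grad_axt, A)
print("max |d Tr(XA)/dX - A^T| =", err_xa)
print("max |d Tr(AX^T)/dX - A| =", err_axt)
```

Because both functions are linear in $X$, the central difference has no truncation error, so both printed errors should be at the level of floating-point rounding.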


Note : If you want a clearly written proof, you may refer to this link. Thank you.