Derivative of a trace w.r.t. a matrix's elements

I am checking out the Matrix Cookbook, and I am puzzled by the following:

Assume $F(X)$ to be a differentiable function of each of the elements of $X$. It then holds that $$\dfrac{\partial\operatorname{Tr}(F(X))}{\partial X} = f(X)^T,$$ where $f(\cdot)$ is the scalar derivative of $F(\cdot)$.

However, I fail to grasp how this makes any sense. For instance, let's assume that we have the following matrix function: $$F(x) = \begin{pmatrix} 0 & x \\ 0 & 0 \\ \end{pmatrix}$$ Its trace is zero $\forall\,x \in\mathbb{R}$. Furthermore, as $\operatorname{Tr}(F(x))\colon \mathbb{R}^{2\times 2} \to \mathbb{R}$, I would expect the derivative of $\operatorname{Tr}(F(x))$ to be zero as well. But according to the equation given above, it is $\begin{pmatrix} 0 & 0 \\ 1 & 0\end{pmatrix}$.

Best answer

Part of the problem is the confusion of the matrix $X$ with the scalar parameter $x$. If we use $\alpha$ for the scalar parameter, then $$X = \left[\begin{array}{cc} 0 & \alpha \\ 0 & 0 \\ \end{array}\right] = N\alpha$$ where $N$ is a $2\times 2$ nilpotent matrix.

As concrete examples of the functions, let's use $$\eqalign{ F(X) &= X^2 \cr f(X) &= 2X \cr }$$ Then what the Cookbook is saying is $$\eqalign{ \frac{\partial\,{\rm tr}(X^2)}{\partial X} &= 2X^T \cr }$$ This is different from saying that $$\eqalign{ \frac{\partial\,{\rm tr}(X^2)}{\partial\alpha} &= 2X^T \cr }$$ which cannot hold, since the LHS is a scalar quantity while the RHS is a matrix.
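The Cookbook identity for this particular choice of $F$ can be checked numerically. The following is only a sketch, assuming NumPy is available; the helper names `trace_of_square` and `numerical_gradient` are made up for this illustration, and the test matrix is an arbitrary random $3\times 3$ matrix. It compares a central finite-difference gradient of ${\rm tr}(X^2)$, taken with respect to each element of $X$, against $2X^T$.

```python
import numpy as np

def trace_of_square(X):
    """Scalar function tr(X^2) of a square matrix X."""
    return np.trace(X @ X)

def numerical_gradient(func, X, step=1e-6):
    """Central finite differences of func with respect to each entry of X."""
    grad = np.zeros_like(X, dtype=float)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X, dtype=float)
            E[i, j] = step  # perturb only the (i, j) entry
            grad[i, j] = (func(X + E) - func(X - E)) / (2 * step)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))  # arbitrary (non-symmetric) test matrix

numeric = numerical_gradient(trace_of_square, X)
analytic = 2 * X.T  # the Cookbook result for F(X) = X^2

max_abs_error = np.max(np.abs(numeric - analytic))
print(max_abs_error)
```

The transpose matters here: for a non-symmetric $X$ the numerical gradient agrees with $2X^T$ but not with $2X$.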

To find an expression for the derivative with respect to $\alpha$, we can use the Cookbook result to write down the differential, and then perform the change of variables $X = N\alpha$, $dX = N\,d\alpha$: $$\eqalign{ d\,{\rm tr}(X^2) &= 2X^T:dX \cr &= 2\alpha N^T:N\,d\alpha \cr\cr \frac{\partial {\rm tr}(X^2)}{\partial\alpha} &= 2\alpha N^T:N \cr &= 2\alpha\,{\rm tr}(N^2) \cr &= 0 \cr }$$ since $N$ is nilpotent ($N^2 = 0$).
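As a rough numerical illustration of this chain-rule step (a sketch assuming NumPy; the value `alpha = 1.7` and the function name `trace_of_square_along_N` are arbitrary choices for the example), the matrix-valued gradient $2X^T$ is non-zero, yet contracting it with $dX/d\alpha = N$ gives a scalar derivative of zero, which matches a direct finite difference in $\alpha$.

```python
import numpy as np

# The 2x2 nilpotent matrix N from the answer, so that X = alpha * N.
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])

def trace_of_square_along_N(alpha):
    """tr(X^2) restricted to the one-parameter family X = alpha * N."""
    X = alpha * N
    return np.trace(X @ X)

alpha = 1.7  # arbitrary value of the scalar parameter
X = alpha * N

matrix_gradient = 2 * X.T  # Cookbook result: a 2x2 matrix, not zero
dX_dalpha = N              # derivative of X = alpha * N w.r.t. alpha

# Chain rule: contract the matrix gradient with dX/dalpha
# using the Frobenius product (sum of elementwise products).
chain_rule_derivative = np.sum(matrix_gradient * dX_dalpha)

# Independent check: central finite difference directly in alpha.
step = 1e-6
finite_difference = (
    trace_of_square_along_N(alpha + step)
    - trace_of_square_along_N(alpha - step)
) / (2 * step)

print(matrix_gradient)
print(chain_rule_derivative, finite_difference)
```

The non-zero entry of `matrix_gradient` sits in the lower-left position, which is exactly the entry that `N` does not touch, so the contraction vanishes.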

Note that colons were used in some of the steps above to denote the trace/Frobenius product, i.e. $$A:B = {\rm tr}(A^TB)$$
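For readers who want to experiment with this product, here is a small sketch (assuming NumPy; `frobenius_product` is a name invented for this example) showing that ${\rm tr}(A^TB)$ equals the sum of the elementwise products of $A$ and $B$, and that $N^T:N = {\rm tr}(N^2) = 0$ for the nilpotent $N$ used above.

```python
import numpy as np

def frobenius_product(A, B):
    """Trace/Frobenius product A:B = tr(A^T B)."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))  # arbitrary test matrices of equal shape
B = rng.standard_normal((2, 3))

via_trace = frobenius_product(A, B)
via_elementwise_sum = np.sum(A * B)  # same quantity, computed entry by entry

# The nilpotent matrix from the answer: N^T : N = tr(N N) = tr(N^2).
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
n_transpose_colon_n = frobenius_product(N.T, N)

print(via_trace, via_elementwise_sum, n_transpose_colon_n)
```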