Gradient of $\mbox{tr} \left (X^T X \right)$


My goal is to compute

$$\frac{\mathrm{d} \operatorname{tr}\left(\mathbf{X}^{T} \mathbf{X}\right)}{\mathrm{d} \mathbf{X}}$$

Following the common approach to vector/matrix differentiation, I performed entry-wise differentiation as follows.

$$ d_{i j}=\frac{\partial \operatorname{tr}\left(\mathbf{X}^{T} \mathbf{X}\right)}{\partial x_{ij}}=\frac{\partial \sum_{k, l} x_{k l}^{2}}{\partial x_{ij}}=2 x_{ij} \rightarrow D=2X $$

However, the answer is $D=2X^T$ with following computation.

$$ d_{i j}=\frac{\partial \operatorname{tr}\left(\mathbf{X}^{T} \mathbf{X}\right)}{\partial x_{j i}}=\frac{\partial \sum_{k, l} x_{k l}^{2}}{\partial x_{j i}}=2 x_{j i} \rightarrow D=2X^T $$

I still can't understand why $d_{ij}$ is defined via $\partial x_{ji}$, with the order of $i$ and $j$ switched.

Can anyone help me to understand the reason and know-how not to make a mistake afterward?
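As a quick sanity check of the entry-wise computation (assuming NumPy is available), a central-difference approximation of $\partial \operatorname{tr}(\mathbf{X}^T\mathbf{X})/\partial x_{ij}$ can be compared against $2\mathbf{X}$; using a non-square $\mathbf{X}$ also makes clear that $2\mathbf{X}^T$ would not even have the right shape:

```python
import numpy as np

def f(X):
    # tr(X^T X) = sum of squared entries (squared Frobenius norm)
    return np.trace(X.T @ X)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))  # deliberately non-square
h = 1e-6

# Entry-wise central differences: D[i, j] ≈ ∂f/∂x_{ij}
D = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = h
        D[i, j] = (f(X + E) - f(X - E)) / (2 * h)

# The entry-wise derivative matrix matches 2X; 2X^T has shape (4, 3)
# and could not even be compared entry-by-entry with D.
print(np.allclose(D, 2 * X, atol=1e-5))  # True
```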


There are 3 answers below.

Answer 1:

Observe that matrices form a vector space with the Frobenius inner product $A:B := A_{ij}B_{ij} = \operatorname{tr}(A^TB)$ (summation over repeated indices implied).

Let $L(A)=\operatorname{tr}(A^TA)$ and compute
\begin{align}
L(A+\delta B) &= \operatorname{tr}\big((A+\delta B)^T(A+\delta B)\big)\\
&=\operatorname{tr}(A^TA)+\delta \operatorname{tr}(B^TA)+\delta \operatorname{tr}(A^TB)+\delta^2 \operatorname{tr}(B^TB).
\end{align}
Differentiating with respect to $\delta$ and evaluating at $\delta=0$ gives
\begin{align}
\frac{dL}{dA}:B &=\frac{dL(A+\delta B)}{d\delta}\Big|_{\delta=0} = \operatorname{tr}(B^TA)+\operatorname{tr}(A^TB) \\
&= B_{ij}A_{ij}+A_{ij}B_{ij} = 2A_{ij}B_{ij} = 2\operatorname{tr}(A^TB) = 2A:B \quad \text{for all } B.
\end{align}
Hence, $\frac{dL}{dA}=2A.$
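This perturbation argument is easy to check numerically (assuming NumPy): the derivative of $L(A+\delta B)$ with respect to $\delta$ at $\delta = 0$ should equal $2\operatorname{tr}(A^TB)$ for arbitrary $A$ and $B$, which identifies the gradient as $2A$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

def L(M):
    return np.trace(M.T @ M)

d = 1e-6
# Central-difference derivative of L(A + d*B) with respect to d, at d = 0
directional = (L(A + d * B) - L(A - d * B)) / (2 * d)

# Should equal 2 tr(A^T B) = <B, 2A>, identifying the gradient as 2A
print(np.isclose(directional, 2 * np.trace(A.T @ B), atol=1e-5))  # True
```

Because $L$ is quadratic in $\delta$, the central difference cancels the $\delta^2$ term exactly, so agreement holds up to floating-point rounding.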

Answer 2:

Getting all the indices right in multidimensional derivatives can be tricky. Let $D= \frac{d}{d\mathbf{X}} \operatorname{tr}(\mathbf{X}^T\mathbf{X})$; in the "denominator layout" convention, $d_{ij}= \frac{\partial}{\partial x_{ij}} \operatorname{tr}(\mathbf{X}^T\mathbf{X})$. This means that your second computation actually gives $d_{ji}$, not $d_{ij}$, so both computations yield the same result, $D=2\mathbf{X}$.

Answer 3:

Let scalar field $f : \Bbb R^{m \times n} \to \Bbb R_0^+$ be defined by

$$f ({\bf X}) := \mbox{tr} \left( {\bf X}^\top {\bf X} \right) =: \| {\bf X} \|_{\text{F}}^2$$

The directional derivative of $f$ at $\bf X$ in the direction of $\bf V$ is

$$\lim_{h \to 0} \frac{f ({\bf X} + h {\bf V}) - f ({\bf X})}{h} = \lim_{h \to 0} \frac{h \left( \langle {\bf V} , {\bf X} \rangle + \langle {\bf X} , {\bf V} \rangle \right) + h^2 \langle {\bf V} , {\bf V} \rangle}{h} = \langle {\bf V} , {\bf X} \rangle + \langle {\bf X} , {\bf V} \rangle = \langle {\bf V} , \color{blue}{2 {\bf X}} \rangle$$

where $\langle \cdot, \cdot \rangle$ denotes the Frobenius inner product. Thus, the gradient of $f$ is

$$\nabla_{{\bf X}} f({\bf X}) = \color{blue}{2 {\bf X}}$$
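Both the identity $f(\mathbf{X}) = \|\mathbf{X}\|_{\text{F}}^2$ and the directional-derivative characterization above can be verified numerically (assuming NumPy) with a random direction $\mathbf{V}$ and a non-square $\mathbf{X}$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 5))
V = rng.standard_normal((2, 5))

def f(M):
    # tr(M^T M), the squared Frobenius norm of M
    return np.trace(M.T @ M)

# Sanity check of the identity f(X) = ||X||_F^2
print(np.isclose(f(X), np.linalg.norm(X, 'fro') ** 2))  # True

h = 1e-6
# Directional derivative of f at X in the direction V (central difference)
directional = (f(X + h * V) - f(X - h * V)) / (2 * h)

# Equals <V, 2X> in the Frobenius inner product, so grad f(X) = 2X
print(np.isclose(directional, np.sum(V * 2 * X), atol=1e-5))  # True
```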