My goal is to compute
$$\frac{\mathrm{d} \operatorname{tr}\left(\mathbf{X}^{T} \mathbf{X}\right)}{\mathrm{d} \mathbf{X}}$$
Following the common approach to vector/matrix differentiation, I performed entry-wise differentiation as follows.
$$ d_{i j}=\frac{\partial \operatorname{tr}\left(\mathbf{X}^{T} \mathbf{X}\right)}{\partial x_{ij}}=\frac{\partial \sum_{k, l} x_{k l}^{2}}{\partial x_{ij}}=2 x_{ij} \rightarrow D=2X $$
However, the given answer is $D=2X^T$, obtained by the following computation.
$$ d_{i j}=\frac{\partial \operatorname{tr}\left(\mathbf{X}^{T} \mathbf{X}\right)}{\partial x_{j i}}=\frac{\partial \sum_{k, l} x_{k l}^{2}}{\partial x_{j i}}=2 x_{j i} \rightarrow D=2X^T $$
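Whichever layout convention one adopts, the entrywise partials themselves are unambiguous: since $\operatorname{tr}(X^TX)=\sum_{k,l}x_{kl}^2$, we must have $\partial f/\partial x_{ij}=2x_{ij}$. A quick finite-difference check in plain Python (the helper names `trace_xtx` and `numeric_grad` are just for this illustration) confirms this:

```python
def trace_xtx(X):
    # tr(X^T X) = sum of squares of all entries of X
    return sum(x * x for row in X for x in row)

def numeric_grad(X, h=1e-6):
    # Central finite difference of f(X) = tr(X^T X) w.r.t. each entry x_ij
    G = [row[:] for row in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            orig = X[i][j]
            X[i][j] = orig + h
            fp = trace_xtx(X)
            X[i][j] = orig - h
            fm = trace_xtx(X)
            X[i][j] = orig
            G[i][j] = (fp - fm) / (2 * h)
    return G

X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
G = numeric_grad(X)
# Each G[i][j] matches 2 * X[i][j]: the partials are 2 x_ij.
```

The disagreement between $D=2X$ and $D=2X^T$ is therefore purely about how these scalars $2x_{ij}$ are arranged into a matrix (denominator vs. numerator layout), not about the partials themselves.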
I still can't understand why $d_{ij}$ is defined as a partial derivative with respect to $x_{ji}$, i.e. why the order of $i$ and $j$ is switched.
Can anyone help me understand the reason, and how to avoid making this mistake in the future?
Observe that the matrices form a vector space with inner product $A:B := A_{ij}B_{ij} = \operatorname{tr}(A^TB)$ (summation over repeated indices implied).
Let $L(A)=\operatorname{tr}(A^TA)$ and compute
\begin{align}
L(A+\delta B) &= \operatorname{tr}\big((A+\delta B)^T(A+\delta B)\big)\\
&=\operatorname{tr}(A^TA)+\delta \operatorname{tr}(B^TA)+\delta \operatorname{tr}(A^TB)+\delta^2 \operatorname{tr}(B^TB).
\end{align}
Differentiating with respect to $\delta$ and evaluating at $\delta=0$ gives
\begin{align}
\frac{dL}{dA}:B &=\frac{dL(A+\delta B)}{d\delta}\Big|_{\delta=0} = \operatorname{tr}(B^TA)+\operatorname{tr}(A^TB) \\
&= B_{ij}A_{ij}+A_{ij}B_{ij} = 2A_{ij}B_{ij} = 2\operatorname{tr}(A^TB) = 2A:B \quad \text{for all } B.
\end{align}
Hence, $\frac{dL}{dA}=2A.$
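The directional-derivative identity above is easy to verify numerically: for any direction $B$, the slope of $\delta \mapsto L(A+\delta B)$ at $\delta=0$ should equal $2\operatorname{tr}(A^TB)$. A sketch in plain Python (the helpers `frob` and `L` are named only for this check):

```python
def frob(A, B):
    # Frobenius inner product: A:B = sum_ij a_ij b_ij = tr(A^T B)
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def L(A):
    # L(A) = tr(A^T A) = A:A
    return frob(A, A)

A = [[1.0, -2.0], [0.5, 3.0]]
B = [[0.3, 1.0], [-1.0, 2.0]]

# Central difference of delta -> L(A + delta * B) at delta = 0
h = 1e-6
Ap = [[a + h * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
Am = [[a - h * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
lhs = (L(Ap) - L(Am)) / (2 * h)

# The claimed derivative: (dL/dA):B = 2A:B = 2 tr(A^T B)
rhs = 2 * frob(A, B)
# lhs and rhs agree up to floating-point error, consistent with dL/dA = 2A.
```

Since this holds for every direction $B$, the gradient with respect to the Frobenius inner product is indeed $2A$, with no transpose; the $2X^T$ form only appears when a layout convention transposes the arrangement of the partials.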