Conversion from differential to derivative for trace of a matrix

Question


906 Views Asked by Bumbble Comm At 10 May 2026 - 6:43

I am studying the matrix calculus page on Wikipedia and I have a question. In the table 'Identities: scalar-by-matrix $\frac{\partial y}{\partial \mathbf{X}}$', the following is shown:

(1) $\frac{\partial \operatorname{tr}(\mathbf{AX})}{\partial \mathbf{X}} = \frac{\partial \operatorname{tr}(\mathbf{XA})}{\partial \mathbf{X}} =\mathbf{A}^\top$

This can be easily proved by showing: $\frac{\partial}{\partial \mathbf{X}_{ij}} \operatorname{tr}(\mathbf{AX}) = \frac{\partial}{\partial \mathbf{X}_{ij}} \left(\sum_r \sum_s a_{rs}x_{sr}\right) =a_{ji}$

(2) In another section, 'Conversion from differential to derivative form', it is shown that the canonical differential form $dy = \operatorname{tr}(\mathbf{A}\,d\mathbf{X})$ is equivalent to the derivative form $\frac{dy}{d\mathbf{X}} = \mathbf{A}$.

Can anyone please help me understand this conversion? How is it related to formula (1)? I thought that if they are related, then $\frac{dy}{d\mathbf{X}} = \mathbf{A}^\top$.

I found this proof in the following note, but I don't understand what happens in the last part (shown inside the red box). I assumed it would be $a_{ji}$. Why is the derivative written with respect to $x_{ji}$?

I would appreciate any insights on this.

Thank you.


**user91684** · Accepted Answer

Let $f:X\in M_n\rightarrow tr(AX)$. Your first equality concerns the gradient of $f$. The derivative of $f$ at $A$ is the following linear map:

$Df_A:H\in M_n\rightarrow tr(AH)\in \mathbb{R}$ (that is $Df_A=f$, because $f$ is linear!).

Note that $tr(AH)=\langle A^T,H\rangle$, where $\langle U,V\rangle=tr(U^TV)$ is the scalar product on $M_n$. The gradient is defined by duality: $\langle\nabla(f)(A),H\rangle=Df_A(H)$, that is, $\nabla(f)(A)=A^T$. In particular, $\dfrac{\partial f}{\partial X_{i,j}}(A)=tr(AE_{i,j})=a_{j,i}=\nabla(f)(A)_{i,j}$.
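The entrywise identity $\dfrac{\partial f}{\partial X_{i,j}}=a_{j,i}$ can be checked numerically. The following is only a minimal sketch in plain Python: the $2\times 2$ matrices `A` and `X`, the step `eps`, and all helper names are illustrative assumptions, not part of the original argument.

```python
# Finite-difference check that d tr(AX) / dX_ij equals a_ji.
# A, X, eps and the helper names are arbitrary choices for illustration.

def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def f(A, X):
    """The function X -> tr(AX)."""
    return trace(matmul(A, X))

def numerical_gradient(A, X, eps=1e-3):
    """Matrix whose (i, j) entry approximates d f / d X_ij."""
    n = len(X)
    grad = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            X_shifted = [row[:] for row in X]
            X_shifted[i][j] += eps  # perturb only the (i, j) entry
            grad[i][j] = (f(A, X_shifted) - f(A, X)) / eps
    return grad

A = [[1.0, 2.0], [3.0, 4.0]]
X = [[0.5, -1.0], [2.0, 0.25]]

grad = numerical_gradient(A, X)
A_T = [[A[j][i] for j in range(2)] for i in range(2)]

# Largest entrywise gap between the numerical gradient and A^T.
max_gap = max(abs(grad[i][j] - A_T[i][j]) for i in range(2) for j in range(2))
print(grad, max_gap)
```

Because $f$ is linear, the difference quotient is exact up to floating-point rounding, so `grad` should agree with `A_T` rather than with `A`.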

Now, the matrix associated to the derivative $Df_A$ (or the differential) is a row with $n^2$ entries.

For example, if $n=2$: $tr(AX)=a_{1,1}x_{1,1}+a_{1,2} x_{2,1}+a_{2,1}x_{1,2}+a_{2,2}x_{2,2}$. If we stack the matrix $X$ column by column, $[x_{1,1},x_{2,1},x_{1,2},x_{2,2}]$, then we obtain for the derivative $[a_{1,1},a_{1,2},a_{2,1},a_{2,2}]$. If we stack$^{-1}$ (that is, unstack) this last vector row by row, then we obtain $A$; yet, if we stack$^{-1}$ it column by column, then we obtain $A^T$. You choose the convention you want.
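The stacking example for $n=2$ can be reproduced mechanically. This is a small sketch in plain Python; the numeric entries of `A` and the helper names `stack_columns`, `unstack_rows` and `unstack_columns` are hypothetical, chosen only to make the two conventions visible.

```python
# Two ways of reshaping the derivative row of X -> tr(AX) back into a matrix.
# Entries of A and all helper names are illustrative assumptions.

def stack_columns(M):
    """Stack a square matrix column by column into a flat list."""
    n = len(M)
    return [M[i][j] for j in range(n) for i in range(n)]

def unstack_rows(v, n):
    """Inverse of row-by-row stacking: fill the matrix one row at a time."""
    return [[v[r * n + c] for c in range(n)] for r in range(n)]

def unstack_columns(v, n):
    """Inverse of column-by-column stacking: fill one column at a time."""
    return [[v[c * n + r] for c in range(n)] for r in range(n)]

n = 2
A = [[1, 2], [3, 4]]  # plays the role of [[a11, a12], [a21, a22]]

# Index pairs (i, j) of X in column-by-column order: x11, x21, x12, x22.
positions = stack_columns([[(i, j) for j in range(n)] for i in range(n)])

# The partial derivative of tr(AX) with respect to x_ij is a_ji.
derivative_row = [A[j][i] for (i, j) in positions]

as_rows = unstack_rows(derivative_row, n)        # reproduces A
as_columns = unstack_columns(derivative_row, n)  # reproduces A^T

print(derivative_row, as_rows, as_columns)
```

The same row of partial derivatives yields $A$ or $A^T$ depending only on how it is reshaped, which is the whole source of the apparent discrepancy.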

The author confuses this with the case $f:x\in\mathbb{R}^n\rightarrow \mathbb{R}$. The $1\times n$ matrix associated to $Df_A$ is $U=[\dfrac{\partial f}{\partial x_{1}},\cdots,\dfrac{\partial f}{\partial x_{n}}]$. The gradient of $f$ is the vector $V=[\dfrac{\partial f}{\partial x_{1}},\cdots,\dfrac{\partial f}{\partial x_{n}}]^T$ because (for $h\in\mathbb{R}^n$) $\langle V,h\rangle=tr(V^Th)=V^Th$ and $Df(h)=Uh$, and then $V=U^T$.

EDIT. Answer to Crimson. Formula (1), which gives the gradient, is correct.

Strictly speaking, formula (2) is not correct. Indeed, $Df_A(H)=tr(AH)=tr((A\otimes I_n)(H))$ when we stack the matrices row by row (cf. https://en.wikipedia.org/wiki/Kronecker_product). Thus $Df_A$ is the composition $tr\circ (A\otimes I_n)$. If we stack the variable column by column and stack$^{-1}$ the image row by row, then we find the matrix $A$. Clearly, it's not fine dining!
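The Kronecker-product identity used here, namely that with row-by-row stacking the map $H\mapsto AH$ is represented by $A\otimes I_n$, can also be verified on a small case. The sketch below uses plain Python; the matrices `A` and `H` and the helpers `kron`, `stack_rows` and `unstack_rows` are assumptions made for illustration.

```python
# Check that (A kron I_n) applied to row-stacked H gives row-stacked AH,
# so that tr(AH) is the trace of the unstacked image.
# A, H and all helper names are illustrative assumptions.

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def kron(P, Q):
    """Kronecker product: block (i, j) of the result is P[i][j] * Q."""
    return [[P[i][j] * Q[k][l]
             for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]

def stack_rows(M):
    """Stack a matrix row by row into a flat list."""
    return [x for row in M for x in row]

def unstack_rows(v, n):
    return [[v[r * n + c] for c in range(n)] for r in range(n)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

n = 2
A = [[1, 2], [3, 4]]
H = [[5, 6], [7, 8]]
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

K = kron(A, identity)  # an n^2 x n^2 matrix
h = stack_rows(H)
image = [sum(K[r][c] * h[c] for c in range(n * n)) for r in range(n * n)]

direct = trace(matmul(A, H))                 # tr(AH) computed directly
via_kron = trace(unstack_rows(image, n))     # trace of the unstacked image

print(direct, via_kron)
```

Both routes give the same number, and `image` coincides with the row-stacked product $AH$.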
