How did they get the final result here?

56 Views Asked by Bumbble Comm At 26 Mar 2026 - 12:13

I am trying to understand the answer to this question. How do you get the following?

$$\nabla_{\mathrm W}\left(\mbox{tr} \left( \mathrm W^{\top} \mathrm X^{\top} \mathrm X \mathrm W - \mathrm Y^{\top} \mathrm X \mathrm W - \mathrm W^{\top} \mathrm X^{\top} \mathrm Y + \mathrm Y^{\top} \mathrm Y \right)\right)$$ $$= 2 \, \mathrm X^{\top} \mathrm X \mathrm W - 2 \, \mathrm X^{\top} \mathrm Y$$

Specifically, I want to know what kind of magic happens to these:

$$-\mathrm Y^{\top} \mathrm X \mathrm W - \mathrm W^{\top} \mathrm X^{\top} \mathrm Y$$

Thank you so much.

There are 3 best solutions below

user2468 On 13 Jun 2021 - 2:36

The trace of a matrix is equal to the trace of its transpose. So

$$\operatorname{tr}(Y^T XW+ W^TX^TY)= 2\operatorname{tr}(Y^TXW)$$ and

$$W \mapsto 2\operatorname{tr}(Y^TXW)$$ is linear in $W$, so its derivative is the map itself, and the corresponding gradient is $2X^TY$. See derivative in product in trace if required.
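To spell that step out entrywise, here is a short sketch. The shorthand $M := X^T Y$ is introduced only for this illustration and is not part of the original answer.

$$\operatorname{tr}(Y^T X W) = \operatorname{tr}(M^T W) = \sum_{i,j} M_{ij} W_{ij}, \qquad \frac{\partial}{\partial W_{ij}} \operatorname{tr}(M^T W) = M_{ij}$$

So $\nabla_W \operatorname{tr}(Y^T X W) = X^T Y$, and the two terms $-Y^T X W - W^T X^T Y$ from the question together contribute $-2X^T Y$.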

Bumbble Comm On 13 Jun 2021 - 2:46

For the other part, $\operatorname{tr}(AB)=\operatorname{tr}(BA)$, so $$\operatorname{tr}(W^{\top}X^{\top} X W)=\operatorname{tr}(X^{\top} X W W^{\top})$$ and the author seems to be differentiating inside the trace, pulling the constant factor out: $$\nabla_W(\operatorname{tr}(X^{\top} X W W^{\top}))=X^{\top} X \nabla_W(\operatorname{tr}(W W^{\top}))=2X^{\top} X W$$ I guess this step needs some justification even though the result is true. In general $\nabla_W \operatorname{tr}(W^{\top} A W) = (A + A^{\top}) W$, which reduces to $2AW$ here only because $A = X^{\top} X$ is symmetric.
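A quick numerical sanity check can show why the "pull the constant out" shortcut deserves caution. The sketch below assumes NumPy is available. The helper names `numerical_gradient` and `quad_gradient_check`, the matrix sizes, and the random seed are all made up for illustration. It compares a finite-difference gradient of $\operatorname{tr}(W^{\top} A W)$ against both $(A + A^{\top})W$ and the shortcut $2AW$, once for a non-symmetric $A$ and once for $A = X^{\top}X$.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for reproducibility


def numerical_gradient(f, W, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at matrix W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        grad[idx] = (f(W + step) - f(W - step)) / (2 * eps)
    return grad


d, k = 4, 3                               # illustrative sizes
W = rng.standard_normal((d, k))
A_general = rng.standard_normal((d, d))   # generic, not symmetric
X = rng.standard_normal((5, d))
A_sym = X.T @ X                           # symmetric, as in the question


def quad_gradient_check(A):
    """Return (matches (A + A^T) W, matches the shortcut 2 A W)."""
    f = lambda W_: np.trace(W_.T @ A @ W_)
    num = numerical_gradient(f, W)
    matches_general = np.allclose(num, (A + A.T) @ W, atol=1e-5)
    matches_shortcut = np.allclose(num, 2 * A @ W, atol=1e-5)
    return matches_general, matches_shortcut


general_ok, general_shortcut_ok = quad_gradient_check(A_general)
sym_ok, sym_shortcut_ok = quad_gradient_check(A_sym)
print(general_ok, general_shortcut_ok, sym_ok, sym_shortcut_ok)
```

Under these assumptions the shortcut agrees with the finite-difference gradient only in the symmetric case, which is the case that matters for $X^{\top}X$.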

Bumbble Comm On 13 Jun 2021 - 4:09

Alternative approach

The Frobenius product, denoted by a colon, can be defined as \begin{align} A: BC := {\rm Tr}\left( A^T B C \right) \end{align}

We will use the cyclic property of the trace, e.g., \begin{align} A: BCD = B^T A: CD = B^TAD^T: C \end{align}
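These rearrangement rules are easy to check numerically. The following is a minimal sketch assuming NumPy. The helper `frob`, the matrix shapes, and the seed are hypothetical choices, picked only so that every product is well defined.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed


def frob(P, Q):
    """Frobenius product P : Q = tr(P^T Q), computed as an elementwise sum."""
    return np.sum(P * Q)


# Shapes chosen so that B @ C @ D has the same shape as A.
m, n, p, q = 3, 4, 5, 2
A = rng.standard_normal((m, q))
B = rng.standard_normal((m, n))
C = rng.standard_normal((n, p))
D = rng.standard_normal((p, q))

lhs = frob(A, B @ C @ D)          # A : BCD
mid = frob(B.T @ A, C @ D)        # B^T A : CD
rhs = frob(B.T @ A @ D.T, C)      # B^T A D^T : C

# The elementwise sum agrees with the trace definition.
trace_form = np.trace(A.T @ (B @ C @ D))
```

All four quantities should agree up to floating-point rounding, which is what lets factors move from one side of the colon to the other as transposes.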

To find the gradient, we will use differentials. To this end, we can rewrite the problem at hand as \begin{align} f &:= {\rm Tr}\left( W^TX^TXW - Y^TXW - W^TX^TY + Y^TY \right) \\ &\equiv XW : XW - Y:XW - XW:Y + Y:Y \end{align}

Compute the differential and then the gradient. \begin{align} df &= XdW : XW + XW : XdW - Y:XdW - XdW:Y \\ &= 2X^TXW : dW - X^TY:dW - X^TY:dW \\ &= \left(2X^TXW - 2X^TY\right):dW \end{align}

The gradient is \begin{align} \frac{\partial f}{\partial W} &= 2X^TXW - 2X^TY . \end{align}
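As a sanity check on the final formula, the gradient can be compared against central finite differences. This is only a sketch assuming NumPy. The dimensions, the seed, and the helper `numerical_gradient` are illustrative and do not come from the original post.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n, d, k = 6, 4, 3               # illustrative sizes: X is n x d, W is d x k, Y is n x k
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, k))
W = rng.standard_normal((d, k))


def f(W_):
    """The scalar from the question: the trace of the expanded expression."""
    return np.trace(
        W_.T @ X.T @ X @ W_ - Y.T @ X @ W_ - W_.T @ X.T @ Y + Y.T @ Y
    )


def numerical_gradient(func, W_, eps=1e-6):
    """Central finite-difference gradient of a scalar function at matrix W_."""
    grad = np.zeros_like(W_)
    for idx in np.ndindex(*W_.shape):
        step = np.zeros_like(W_)
        step[idx] = eps
        grad[idx] = (func(W_ + step) - func(W_ - step)) / (2 * eps)
    return grad


analytic = 2 * X.T @ X @ W - 2 * X.T @ Y   # the closed form derived above
numeric = numerical_gradient(f, W)
max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)
```

Because $f$ is quadratic in $W$, central differences carry no truncation error here, so any remaining discrepancy is floating-point rounding.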
