I came across this matrix calculus calculation:
$$\begin{align} &\nabla_\mathbf{w} \left( \mathbf{w}^T \mathbf{X}^{\text{(train)}T} \mathbf{X}^{\text{(train)}}\mathbf{w} - 2\mathbf{w}^T \mathbf{X}^{\text{(train)}T} \mathbf{y}^{\text{(train)}} + \mathbf{y}^{\text{(train)}T}\mathbf{y}^{\text{(train)}} \right) = 0 \\ &\Rightarrow 2\mathbf{X}^{\text{(train)}T} \mathbf{X}^{\text{(train)}} \mathbf{w} - 2\mathbf{X}^{\text{(train)}T} \mathbf{y}^{\text{(train)}} = 0 \end{align}$$
I'm having difficulty understanding how the authors differentiated the term $\mathbf{w}^T \mathbf{X}^{\text{(train)}T} \mathbf{X}^{\text{(train)}}\mathbf{w}$ in the first equation to get $2\mathbf{X}^{\text{(train)}T} \mathbf{X}^{\text{(train)}} \mathbf{w}$ in the second equation. In particular, I'm unsure how to treat the transposed term $\mathbf{w}^T$. Can someone please explain this? Thank you.
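Let $A$ be a symmetric matrix. In your case $A=\mathbf{X}^{\text{(train)}T}\mathbf{X}^{\text{(train)}}$, which is symmetric because $(\mathbf{X}^T\mathbf{X})^T=\mathbf{X}^T\mathbf{X}$.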
Let's consider the derivative of $w^TAw=\sum_{ij}w_iA_{ij}w_j$.
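Differentiating with respect to $w_k$ (a fresh index, since $i$ is already used as a summation index) and using $A_{ij}=A_{ji}$, we have $2\sum_{j}A_{kj}w_j$, which is the $k$-th entry of $2Aw$.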
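To see where the factor of $2$ comes from, here is a sketch of the intermediate step. It writes $k$ for the index being differentiated and uses the Kronecker delta $\delta_{ik}=\partial w_i/\partial w_k$ (notation introduced here, not taken from the question). Each $w_k$ appears once as the left factor and once as the right factor, so the product rule gives two sums:

$$\frac{\partial}{\partial w_k}\sum_{i,j}w_iA_{ij}w_j=\sum_{i,j}\left(\delta_{ik}A_{ij}w_j+w_iA_{ij}\delta_{jk}\right)=\sum_{j}A_{kj}w_j+\sum_{i}A_{ik}w_i=\left((A+A^T)w\right)_k$$

For a general square $A$ this is the $k$-th entry of $(A+A^T)w$. When $A$ is symmetric, the two sums are equal and the result is $2(Aw)_k$.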
As an example, take $n=2$, where $w^TAw$ expands to
$$A_{11}w_1^2+A_{12}w_1w_2+A_{21}w_1w_2+A_{22}w_2^2$$
The partial derivative with respect to $w_1$ is (using $A_{12}=A_{21}$ in the last step)
$$2A_{11}w_1 + A_{12}w_2+A_{21}w_2=2A_{11}w_1+(A_{12}+A_{21})w_2=2(A_{11}w_1+A_{12}w_2)$$
Similarly, the partial derivative with respect to $w_2$ is
$$ A_{12}w_1+A_{21}w_1+2A_{22}w_2=(A_{12}+A_{21})w_1+ 2A_{22}w_2=2(A_{21}w_1+A_{22}w_2)$$
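That is, we have $$\nabla_w(w^TAw)=2Aw,$$ which for $A=\mathbf{X}^{\text{(train)}T}\mathbf{X}^{\text{(train)}}$ is exactly the term $2\mathbf{X}^{\text{(train)}T}\mathbf{X}^{\text{(train)}}\mathbf{w}$ in your second equation.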
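If you want to convince yourself numerically, the following is a minimal sketch in plain Python that compares $2Aw$ against a central finite-difference approximation of the gradient. The matrix `X`, the vector `w`, the step size `eps`, and all function names are made-up illustrations, not anything from the book.

```python
def quad_form(A, w):
    """Return w^T A w for a square matrix A (list of rows) and a vector w."""
    n = len(w)
    return sum(w[i] * A[i][j] * w[j] for i in range(n) for j in range(n))


def mat_vec(A, w):
    """Return the matrix-vector product A w as a list."""
    return [sum(a * v for a, v in zip(row, w)) for row in A]


def numerical_gradient(f, w, eps=1e-6):
    """Approximate the gradient of f at w with central differences."""
    grad = []
    for k in range(len(w)):
        up = list(w)
        up[k] += eps
        down = list(w)
        down[k] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


# Hypothetical 3x2 data matrix, chosen only for illustration.
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]

# A = X^T X, which is symmetric by construction.
A = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]

# Hypothetical point at which to evaluate the gradient.
w = [0.7, -1.3]

analytic = [2 * v for v in mat_vec(A, w)]  # the claimed gradient 2 A w
numeric = numerical_gradient(lambda v: quad_form(A, v), w)
max_error = max(abs(a - b) for a, b in zip(analytic, numeric))

print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_error)
```

The two gradients should agree up to floating-point rounding. Replacing `A` with a non-symmetric matrix makes the check fail, since the gradient is then $(A+A^T)w$ rather than $2Aw$.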