Matrix gradient differentiation calculation


I came across this matrix calculus calculation:

$$\begin{align} &\nabla_\mathbf{w} \left( \mathbf{w}^T \mathbf{X}^{\text{(train)}T} \mathbf{X}^{\text{(train)}}\mathbf{w} - 2\mathbf{w}^T \mathbf{X}^{\text{(train)}T} \mathbf{y}^{\text{(train)}} + \mathbf{y}^{\text{(train)}T}\mathbf{y}^{\text{(train)}} \right) = 0 \\ &\Rightarrow 2\mathbf{X}^{\text{(train)}T} \mathbf{X}^{\text{(train)}} \mathbf{w} - 2\mathbf{X}^{\text{(train)}T} \mathbf{y}^{\text{(train)}} = 0 \end{align}$$

I'm having difficulty understanding how the authors differentiated the term $\mathbf{w}^T \mathbf{X}^{\text{(train)}T} \mathbf{X}^{\text{(train)}}\mathbf{w}$ in the first equation to obtain $2\mathbf{X}^{\text{(train)}T} \mathbf{X}^{\text{(train)}} \mathbf{w}$ in the second. In particular, I'm unsure how to treat the transposed term $\mathbf{w}^T$. Can someone please explain this? Thank you.
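One way to sanity-check the claimed gradient before working through the algebra is numerically: compare $2\mathbf{X}^T\mathbf{X}\mathbf{w} - 2\mathbf{X}^T\mathbf{y}$ against finite differences of the objective. A minimal sketch using NumPy, with random stand-ins for $\mathbf{X}^{\text{(train)}}$ and $\mathbf{y}^{\text{(train)}}$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # stand-in for X^(train)
y = rng.standard_normal(5)        # stand-in for y^(train)
w = rng.standard_normal(3)

def objective(w):
    # w^T X^T X w - 2 w^T X^T y + y^T y, written as a squared residual
    r = X @ w - y
    return r @ r

# gradient claimed in the book
analytic = 2 * X.T @ X @ w - 2 * X.T @ y

# central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (objective(w + eps * np.eye(3)[i]) - objective(w - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])

print(np.allclose(analytic, numeric, atol=1e-5))
```

The two vectors agree to within the finite-difference error, which confirms the formula numerically even if the derivation is unclear.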

Best answer:

Let $A$ be a symmetric matrix.

Let's consider the derivative of $w^TAw=\sum_{ij}w_iA_{ij}w_j$.

Differentiating with respect to $w_i$, the product rule gives one sum from the left factor and one from the right: $\sum_j A_{ij}w_j + \sum_j A_{ji}w_j$. Since $A$ is symmetric, $A_{ji}=A_{ij}$, so this equals $2\sum_{j}A_{ij}w_j$.

As a concrete example, take $n=2$ and expand $w^TAw$:

$$A_{11}w_1^2+A_{12}w_1w_2+A_{21}w_1w_2+A_{22}w_2^2$$

The partial derivative with respect to $w_1$ is

$$2A_{11}w_1 + A_{12}w_2+A_{21}w_2=2A_{11}w_1+(A_{12}+A_{21})w_2=2(A_{11}w_1+A_{12}w_2)$$

Similarly, the partial derivative with respect to $w_2$ is

$$ A_{12}w_1+A_{21}w_1+2A_{22}w_2=(A_{12}+A_{21})w_1+ 2A_{22}w_2=2(A_{21}w_1+A_{22}w_2)$$

That is, we have $$\nabla_w(w^TAw)=2Aw$$

Applying this with the symmetric matrix $A=\mathbf{X}^{\text{(train)}T}\mathbf{X}^{\text{(train)}}$ yields the term $2\mathbf{X}^{\text{(train)}T}\mathbf{X}^{\text{(train)}}\mathbf{w}$ in the question.
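The identity $\nabla_w(w^TAw)=2Aw$ for symmetric $A$ can also be checked numerically with finite differences. A small sketch (NumPy, random symmetric $A$ chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                 # symmetrize: the formula 2Aw requires A = A^T
w = rng.standard_normal(4)

f = lambda v: v @ A @ v           # the quadratic form w^T A w

analytic = 2 * A @ w

# central finite differences of f, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (f(w + eps * np.eye(4)[i]) - f(w - eps * np.eye(4)[i])) / (2 * eps)
    for i in range(4)
])

print(np.allclose(analytic, numeric, atol=1e-5))
```

For a non-symmetric $A$ the same check would instead match $(A + A^T)w$, which is why symmetry of $A$ (guaranteed here, since $X^TX$ is always symmetric) matters.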