Derivation of the gradient for $E(f_{\vec{w}}) = (X\vec{w} - \vec{y})^{T}\cdot (X\vec{w} - \vec{y})$.


I am trying to derive the following gradient $$\nabla_{\vec{w}}E(f_{\vec{w}}) = 2X^{T}X\vec{w} - 2X^{T}\vec{y}$$ from this formula $$E(f_{\vec{w}}) = (X\vec{w} - \vec{y})^{T}\cdot (X\vec{w} - \vec{y})$$

where $X$ is an $n \times d$ Vandermonde matrix, $d$ is the degree of the polynomial, and $\vec{w}$ is a $d \times 1$ vector.
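As a concrete sketch of this setup (the sample points `x`, coefficients `w`, and targets `y` below are illustrative, not from the question), the Vandermonde matrix and the cost $E(\vec{w})$ can be built with numpy:

```python
import numpy as np

# Hypothetical example: n = 5 sample points and d = 3 polynomial coefficients.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = 3
X = np.vander(x, N=d, increasing=True)  # n x d Vandermonde matrix: columns 1, x, x^2
w = np.array([1.0, -2.0, 0.5])          # d x 1 coefficient vector
y = np.array([1.2, -0.4, -0.9, 0.1, 3.0])

r = X @ w - y   # residual vector Xw - y
E = r @ r       # E(w) = (Xw - y)^T (Xw - y), a scalar
```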

This is what I have tried: \begin{align*} \frac{\partial}{\partial \vec{w}} E(f_{\vec{w}}) &= \frac{\partial}{\partial \vec{w}}\Big((X\vec{w} - \vec{y})^{T}\cdot (X\vec{w} - \vec{y})\Big) \\ &= \frac{\partial}{\partial \vec{w}}\Big( \vec{w}^T X^TX\vec{w} - \vec{w}^T X^{T}\vec{y} - \vec{y}^TX\vec{w} + \vec{y}^T \vec{y}\Big) \\ &= 2X^TX\vec{w} - X^{T}\vec{y} - \underbrace{\vec{y}^TX}_{\text{$\stackrel{?}{=} X^{T}\vec{y}$}} \end{align*} Now I am almost there; however, I cannot make sense of the last term in the last equation. Is it true that $X^{T}\vec{y} = \vec{y}^TX$? If so, how? If not, what is wrong with the derivation?


1 Answer

Best answer

A scalar value equals its own transpose: $$y^TXw = (y^TXw)^T = w^TX^Ty.$$ Use this result to simplify the cost function before finding the gradient: $$\begin{aligned} E &= w^TX^TXw - 2w^TX^Ty + y^Ty \\ \frac{\partial E}{\partial w} &= 2X^TXw - 2X^Ty \end{aligned}$$
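The closed-form gradient $2X^TX\vec{w} - 2X^T\vec{y}$ can be sanity-checked numerically against a central finite-difference estimate; the data below is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3
X = np.vander(rng.standard_normal(n), N=d, increasing=True)
y = rng.standard_normal(n)
w = rng.standard_normal(d)

def E(w):
    """Cost E(w) = (Xw - y)^T (Xw - y)."""
    r = X @ w - y
    return r @ r

# Closed-form gradient from the derivation above.
grad_closed = 2 * X.T @ X @ w - 2 * X.T @ y

# Central finite differences, one coordinate at a time.
eps = 1e-6
grad_fd = np.array([
    (E(w + eps * e) - E(w - eps * e)) / (2 * eps)
    for e in np.eye(d)
])

print(np.allclose(grad_closed, grad_fd, atol=1e-4))  # → True
```

Since $E$ is quadratic in $\vec{w}$, the central-difference estimate matches the closed form essentially to floating-point precision.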