How do I compute the derivative/gradient of a matrix expression with respect to a vector? E.g. $w^T X^T Xw - 2w^T X^T t +t^Tt$?
Intuitively $2w^TX^Tt$ looks like $2xy$, so its derivative with respect to $w$ would be $2X^Tt$.
But what about $w^TX^TXw$?
$X \in \mathbb{R}^{n \times (p+1)}$. $w \in \mathbb{R}^{p+1}$ and $t \in \mathbb{R}^{n}$ are column vectors corresponding to "weights" and "predictors" in the $w^T x=y$ sense.
The gradient is taken with respect to $w$, i.e. $\frac{\partial}{\partial w}$.
$w^TX^Tt = t^TXw$ is linear in $w$: its derivative is the row vector $t^TX$, so its gradient (written as a column vector) is $X^Tt$.
$w^T(X^TX)w$ is quadratic: its gradient is $2X^TXw$.
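Putting the linear and quadratic pieces together gives the gradient of the whole expression. This is a short worked step that writes the gradient as a column vector; the constant term $t^Tt$ does not depend on $w$ and drops out:
$$ \frac{\partial}{\partial w}\left(w^TX^TXw - 2w^TX^Tt + t^Tt\right) = 2X^TXw - 2X^Tt = 2X^T(Xw - t). $$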
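Edit: consider $f(w)=w^T A w$ for a symmetric matrix $A$. Then $$ f(u+v)-f(u) = (u+v)^TA(u+v) - u^TAu = 2u^TAv + v^TAv = 2u^TAv + o(|v|), $$ where symmetry of $A$ gives $v^TAu = u^TAv$. So $f'(u)v = 2u^TAv$, i.e. the gradient at $u$ is $2Au$; with $A = X^TX$ and $u = w$ this is $2X^TXw$.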
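If you want to convince yourself numerically, a finite-difference check is a quick sanity test. The sketch below assumes NumPy is available; the helper names `loss`, `grad`, and `numeric_grad`, the random data, and the sizes `n = 20`, `p = 3` are made up for illustration and are not part of the question.

```python
import numpy as np

def loss(w, X, t):
    """The expression w^T X^T X w - 2 w^T X^T t + t^T t, which equals ||Xw - t||^2."""
    return w @ X.T @ X @ w - 2 * w @ X.T @ t + t @ t

def grad(w, X, t):
    """Closed-form gradient with respect to w: 2 X^T X w - 2 X^T t."""
    return 2 * X.T @ X @ w - 2 * X.T @ t

def numeric_grad(f, w, eps=1e-6):
    """Central finite differences, perturbing one coordinate of w at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Illustrative random problem (assumed sizes, not from the question).
rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p + 1))
t = rng.normal(size=n)
w = rng.normal(size=p + 1)

analytic = grad(w, X, t)
numeric = numeric_grad(lambda v: loss(v, X, t), w)
print(np.allclose(analytic, numeric, atol=1e-4))
```

Because the function is quadratic in $w$, central differences agree with the closed form up to rounding error, so the comparison should pass for any reasonable `eps`.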