Linear Regression.

I have been going through the book The Elements of Statistical Learning, in which I came across the following equation for the linear regression solution: $$RSS(β)=(Y−Xβ)^T(Y−Xβ)$$ Expanding, we have $$RSS(β)=Y^TY−β^TX^TY−Y^TXβ+β^TX^TXβ.$$

Now, since $Y^TXβ$ is a scalar, it equals its own transpose: $$Y^TXβ=(Y^TXβ)^T=β^TX^TY.$$

Hence we have $$RSS(β)=Y^TY−2β^TX^TY+β^TX^TXβ$$ or $$RSS(β)=Y^TY−2Y^TXβ+β^TX^TXβ$$
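
As a sanity check on the algebra, the expansion can be verified numerically. This is only a minimal NumPy sketch under assumptions of my own: the data are random, and the names `N`, `p`, `rss`, `expanded`, `cross_terms_equal` and `expansion_matches` are made up for illustration and do not come from the book.

```python
import numpy as np

# Hypothetical random data: N observations, p coefficients.
rng = np.random.default_rng(0)
N, p = 6, 3
X = rng.normal(size=(N, p))
Y = rng.normal(size=N)
beta = rng.normal(size=p)

def rss(b):
    """Residual sum of squares (Y - Xb)^T (Y - Xb)."""
    r = Y - X @ b
    return r @ r

# The two cross terms are the same scalar: Y^T X beta = beta^T X^T Y.
cross_terms_equal = np.isclose(Y @ X @ beta, beta @ X.T @ Y)

# Expanded form: Y^T Y - 2 beta^T X^T Y + beta^T X^T X beta.
expanded = Y @ Y - 2 * beta @ X.T @ Y + beta @ X.T @ X @ beta
expansion_matches = np.isclose(rss(beta), expanded)

print(cross_terms_equal, expansion_matches)
```

Both flags should come out true for any choice of `X`, `Y` and `beta`, since the identity is purely algebraic.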

The book then says that differentiating $RSS(β)$ with respect to $β$ yields $−2X^T(Y−Xβ)$. How is this differentiation carried out? In particular, how can you differentiate with respect to a vector or matrix to obtain this result? I have been racking my brain over this. Could you please point me to sources where I might find some clue about this operation? Thanks in advance.
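
One way to make the claimed derivative concrete is to read "differentiating with respect to $β$" as collecting the partial derivatives $\partial RSS/\partial β_j$ into a vector, and then to compare that vector with $−2X^T(Y−Xβ)$ numerically. The following is a rough sketch under assumptions, not the book's method: the data are random, and `rss`, `numerical_gradient`, `analytic`, `numeric` and `gradients_agree` are hypothetical names introduced here for illustration.

```python
import numpy as np

# Hypothetical random data: N observations, p coefficients.
rng = np.random.default_rng(1)
N, p = 8, 3
X = rng.normal(size=(N, p))
Y = rng.normal(size=N)
beta = rng.normal(size=p)

def rss(b):
    """Residual sum of squares (Y - Xb)^T (Y - Xb)."""
    r = Y - X @ b
    return r @ r

def numerical_gradient(f, b, h=1e-6):
    """Central-difference estimate of each partial derivative df/db_j."""
    grad = np.zeros_like(b)
    for j in range(b.size):
        e = np.zeros_like(b)
        e[j] = h  # perturb only the j-th coefficient
        grad[j] = (f(b + e) - f(b - e)) / (2 * h)
    return grad

# Formula quoted from the book: -2 X^T (Y - X beta).
analytic = -2 * X.T @ (Y - X @ beta)
numeric = numerical_gradient(rss, beta)

gradients_agree = np.allclose(analytic, numeric, atol=1e-5)
print(gradients_agree)
```

If the two vectors agree, that supports the componentwise reading: the $j$-th entry of $−2X^T(Y−Xβ)$ is the ordinary partial derivative of the scalar $RSS(β)$ with respect to $β_j$.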