Derivation of normal equation for linear least squares in matrix form

The derivation can be found on Wikipedia, but it's not clear how each step follows.

We have $y=X\beta+\epsilon$ and want to minimize $\|\epsilon\|^2=\epsilon^T\epsilon$. We write the objective function as $S(\beta)=\|y-X\beta\|^2=y^Ty-y^TX\beta-\beta^TX^Ty+\beta^TX^TX\beta=y^Ty-2\beta^TX^Ty+\beta^TX^TX\beta$. The two middle terms can be combined by a dimension argument: each is a scalar, and a scalar equals its own transpose, so $y^TX\beta=(y^TX\beta)^T=\beta^TX^Ty$. Now I don't understand how the derivative is taken, since the derivation proceeds to the partial derivative with respect to $\beta$, yielding $-X^Ty+X^TX\beta=0$.

In the last step, what happened to the $2$? And why did $\beta^T$ disappear while $\beta$ remained? I would guess that $-2X^Ty+2(X^TX)\beta=0$. But specifically, how does one take the partial derivative with respect to $\beta$ of $\beta^TX^TX\beta$?

There is 1 best solution below

By Eq. 69 in the Matrix Cookbook (p. 10)

$\frac{\partial}{\partial\beta}(\beta^TX^Ty) = X^Ty.$

By Eq. 81 (p. 11)

$\frac{\partial}{\partial\beta}(\beta^TX^TX\beta) = (X^TX + (X^TX)^T)\beta = 2X^TX\beta.$
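
To see where Eq. 81 comes from and why only $\beta$ survives, here is a sketch in components. The shorthand $A = X^TX$ and the indices $i, j, k$ are introduced here only for illustration. Writing out the quadratic form:

$\beta^TA\beta = \sum_i\sum_j \beta_i A_{ij}\beta_j.$

Differentiating with respect to a single component $\beta_k$ using the product rule:

$\frac{\partial}{\partial\beta_k}\sum_i\sum_j \beta_i A_{ij}\beta_j = \sum_j A_{kj}\beta_j + \sum_i \beta_i A_{ik} = (A\beta)_k + (A^T\beta)_k.$

Stacking the components over $k$ gives the column vector $(A + A^T)\beta$. Each $\beta_k$ enters the quadratic form twice, once through $\beta^T$ and once through $\beta$, and the product rule produces one term for each. The result is written as a column vector by convention, which is why no $\beta^T$ appears in it. Since $A = X^TX$ is symmetric, the two terms are equal, and that is the source of the factor 2.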

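So you are right, there is a factor of 2:

$\frac{\partial}{\partial\beta}(y^Ty - 2\beta^TX^Ty + \beta^TX^TX\beta) = 0 - 2X^Ty + 2X^TX\beta.$

Setting this to zero and dividing both sides by 2 gives $-X^Ty + X^TX\beta = 0$, i.e. the normal equation $X^TX\beta = X^Ty$, so the 2 simply cancels.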
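
If you want to convince yourself numerically, the gradient formula can be compared against finite differences. The following is a minimal sketch assuming NumPy is available; the random data, the shapes, and the helper names `S`, `grad_S` and `numerical_grad` are made up for illustration and are not part of the question.

```python
import numpy as np

def S(beta, X, y):
    """Objective S(beta) = ||y - X beta||^2."""
    r = y - X @ beta
    return r @ r

def grad_S(beta, X, y):
    """Gradient from the formulas above: -2 X^T y + 2 X^T X beta."""
    return -2 * X.T @ y + 2 * X.T @ X @ beta

def numerical_grad(f, beta, h=1e-6):
    """Central finite differences, one component of beta at a time."""
    g = np.zeros_like(beta)
    for k in range(beta.size):
        e = np.zeros_like(beta)
        e[k] = h
        g[k] = (f(beta + e) - f(beta - e)) / (2 * h)
    return g

# Arbitrary illustrative data: 20 observations, 3 coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
beta = rng.normal(size=3)

# Compare the analytic gradient with the finite-difference estimate.
analytic = grad_S(beta, X, y)
numeric = numerical_grad(lambda b: S(b, X, y), beta)
grad_matches = np.allclose(analytic, numeric, atol=1e-4)

# Solve the normal equation X^T X beta = X^T y and compare with lstsq.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
solution_matches = np.allclose(beta_hat, beta_lstsq)

print(grad_matches, solution_matches)
```

Solving the normal equation directly is fine for a small, well-conditioned example like this one; for ill-conditioned $X$, a routine such as `np.linalg.lstsq` is usually the safer choice.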