Ordinary Least Squares Derivative


I have been trying to follow the derivation of the normal equations, but there is one part I do not understand.

So, if we minimize

$L(\mathbf{b})=\mathbf{y}^T\mathbf{y}-(2\mathbf{y}^T\mathbf{X})\mathbf{b}+\mathbf{b}^T(\mathbf{X}^T\mathbf{X})\mathbf{b}$

then $\frac{\partial L(\mathbf{b})}{\partial \mathbf{b}}= \mathbf{0}-2\mathbf{X}^T\mathbf{y}+2(\mathbf{X}^T\mathbf{X})\mathbf{b}$

I would have thought the derivative of $(2\mathbf{y}^T\mathbf{X})\mathbf{b}$ with respect to $\mathbf{b}$ would simply be $(2\mathbf{y}^T\mathbf{X})$. But apparently it is not, and I cannot find the full derivation anywhere. I'd be very grateful for an explanation.


Best answer:

There are two ways of writing it; either way, you must be consistent about where the index of your derivative goes:

$$\frac{dL}{db_p}=\frac{d}{db_p}\left( y_jy_j-2y_iX_{ij}b_j+b_i X_{ki}X_{kj}b_j \right)=0-2y_iX_{ip}+X_{kp}X_{kj}b_j+b_iX_{ki}X_{kp}=-2y_iX_{ip}+2X_{ki}X_{kp}b_i$$

This can be written in one of two ways: $\left[-2y^TX+2b^TX^TX \right]_p$ or $\left[-2X^Ty+2X^TXb \right]_p$. The former is a row vector, the latter a column vector. You probably want your answer to be a column vector, so you go for $$\frac{dL}{d\vec b}=-2X^T\vec y+2X^TX\vec b$$
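As a sanity check, the gradient formula above can be verified numerically with a finite-difference approximation. This is a small sketch using NumPy with arbitrary made-up data (the shapes and values are not from the original post):

```python
import numpy as np

# Arbitrary example data: 5 observations, 3 coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
b = rng.standard_normal(3)

def L(b):
    # Residual sum of squares: equals y^T y - 2 y^T X b + b^T X^T X b.
    r = y - X @ b
    return r @ r

# Analytic gradient from the answer: -2 X^T y + 2 X^T X b.
analytic = -2 * X.T @ y + 2 * X.T @ X @ b

# Central finite differences, one coordinate of b at a time.
eps = 1e-6
numeric = np.array([
    (L(b + eps * e) - L(b - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

Because $L$ is quadratic in $\mathbf{b}$, the central difference is exact up to floating-point rounding, so the two gradients agree to high precision.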

Another answer:

(Since everything I'm going to talk about would be bold, I'm not going to bother.)

Did you try the $2 \times 2$ version to get some insight? $$ y = \begin{pmatrix}y_1 \\ y_2 \end{pmatrix}, X = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22}\end{pmatrix} \text{.} $$ So, $$2 X^T y = \begin{pmatrix}2x_{11}y_1 + 2x_{21}y_2 \\ 2x_{12}y_1 + 2x_{22}y_2 \end{pmatrix}$$ and $$2 y^T X = \begin{pmatrix}2x_{11}y_1 + 2x_{21}y_2 & 2x_{12}y_1 + 2x_{22}y_2 \end{pmatrix} \text{.}$$ They're the same thing, up to transposition.
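The same $2 \times 2$ check can be done in a couple of lines of NumPy; the concrete values here are made up for illustration:

```python
import numpy as np

# Column vector y and 2x2 matrix X, with arbitrary example entries.
y = np.array([[1.0], [2.0]])
X = np.array([[3.0, 4.0],
              [5.0, 6.0]])

col = 2 * X.T @ y   # 2 X^T y: column vector, shape (2, 1)
row = 2 * y.T @ X   # 2 y^T X: row vector, shape (1, 2)

# The two results are transposes of each other.
print(np.array_equal(col, row.T))  # True
```

This makes the answer's point concrete: $2\mathbf{X}^T\mathbf{y}$ and $2\mathbf{y}^T\mathbf{X}$ carry the same entries, just arranged as a column versus a row.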