Naive question re. normal equation for linear regression


The typical normal equation for linear regression is $\theta=(X^TX)^{-1} X^T Y$ such that the gradient of $J(\theta)$ is zero. Why does $X^{-1} Y$ not work? What are the numerical reasons for this?


There are 2 best solutions below

0
On

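$X$ might not be invertible. It might not even be square, for that matter. The normal equation still works when $X$ is not square, as long as $X^TX$ is invertible (that is, $X$ has full column rank). And if $X$ is square and invertible, it reduces to what you propose: $(X^TX)^{-1}X^TY = X^{-1}X^{-T}X^TY = X^{-1}IY = X^{-1}Y$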
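A quick numerical check of both cases, as a minimal sketch assuming NumPy is available. The matrices and the helper name `normal_equation` are made up for illustration; the helper solves $(X^TX)\theta = X^TY$ rather than forming the inverse explicitly.

```python
import numpy as np

def normal_equation(X, Y):
    # Solve (X^T X) theta = X^T Y; assumes X^T X is invertible.
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Case 1: square, invertible X (illustrative values).
X_sq = np.array([[2.0, 1.0, 0.0],
                 [1.0, 3.0, 1.0],
                 [0.0, 1.0, 4.0]])
Y_sq = np.array([1.0, 2.0, 3.0])
theta_ne = normal_equation(X_sq, Y_sq)
theta_inv = np.linalg.solve(X_sq, Y_sq)  # mathematically X^{-1} Y
square_agree = np.allclose(theta_ne, theta_inv)

# Case 2: tall X (10 rows, 3 columns), so X^{-1} does not exist.
rng = np.random.default_rng(0)
X_tall = rng.normal(size=(10, 3))
Y_tall = rng.normal(size=10)
try:
    np.linalg.inv(X_tall)
    inv_failed = False
except np.linalg.LinAlgError:
    # NumPy refuses to invert a non-square matrix.
    inv_failed = True

# The normal equation still yields a theta with zero gradient.
theta_tall = normal_equation(X_tall, Y_tall)
gradient = X_tall.T @ (X_tall @ theta_tall - Y_tall)

print(square_agree, inv_failed, np.allclose(gradient, 0.0, atol=1e-8))
```

In the square case the two routes give the same $\theta$ up to rounding; in the tall case only the normal equation is defined.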

0
On

Please count the dimensions. $X$ is, by the nature of regression, a matrix that has many more rows than columns, and a general rectangular matrix has no inverse.

You can use a QR decomposition $X=QR$ with $Q$ orthogonal. Since multiplying by $Q^T$ does not change the norm, $\|X\theta-Y\|=\|QR\theta-Y\|=\|R\theta-Q^TY\|$, and the last form can be trivially minimized by solving the triangular system at the top and disregarding all the zero rows of $R$.
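The QR route can be sketched as follows, again assuming NumPy. This sketch uses the reduced factorization from `np.linalg.qr`, which keeps only the top triangular block of $R$, so the zero rows are never formed. The data and the helper names `back_substitute` and `lstsq_qr` are hypothetical, and the code assumes $X$ has full column rank so that the diagonal of $R$ is nonzero.

```python
import numpy as np

def back_substitute(R, c):
    # Solve the upper triangular system R theta = c from the last row up.
    n = R.shape[0]
    theta = np.zeros(n)
    for i in range(n - 1, -1, -1):
        theta[i] = (c[i] - R[i, i + 1:] @ theta[i + 1:]) / R[i, i]
    return theta

def lstsq_qr(X, Y):
    # Reduced QR: Q is m x n with orthonormal columns, R is n x n upper triangular.
    Q, R = np.linalg.qr(X, mode="reduced")
    return back_substitute(R, Q.T @ Y)

# Illustrative data lying exactly on the line y = 1 + 2 t.
t = np.arange(5.0)
X = np.column_stack([np.ones_like(t), t])  # 5 rows, 2 columns
Y = 1.0 + 2.0 * t

theta_qr = lstsq_qr(X, Y)
theta_ref = np.linalg.lstsq(X, Y, rcond=None)[0]  # NumPy's own least-squares solver
print(theta_qr, theta_ref)
```

Both results should match the intercept 1 and slope 2 up to rounding. Working with $R$ directly avoids forming $X^TX$, whose condition number is the square of that of $X$.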