The usual normal equation for linear regression is $\theta=(X^TX)^{-1} X^T Y$, the value of $\theta$ at which the gradient of $J(\theta)$ is zero. Why does $X^{-1} Y$ not work? What are the numerical reasons for this?
2026-03-27 16:20:12.1774628412
Naive question re. normal equation for linear regression
47 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
There are 2 best solutions below
Please count the dimensions. In regression, $X$ typically has many more rows than columns, and a general rectangular matrix has no inverse.
You can use a QR decomposition of $X$, then $\|X\theta-Y\|=\|QR\theta-Y\|=\|R\theta-Q^TY\|$ (the orthogonal factor $Q$ preserves the norm), and the last form can be minimized directly by solving the triangular system formed by the top rows of $R$ and disregarding all the zero rows of $R$.
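The QR route can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `lstsq_qr` and the synthetic, noise-free data are made up for the example, and $X$ is assumed to have full column rank. The reduced factorization returned by `np.linalg.qr` keeps only the top square block of $R$, which is the same as dropping the zero rows.

```python
import numpy as np

def lstsq_qr(X, Y):
    """Least-squares solution of X @ theta ~ Y via reduced QR.

    Hypothetical helper for illustration; assumes X has full column rank.
    """
    # Q has shape (m, n) with orthonormal columns, R is (n, n) upper triangular.
    Q, R = np.linalg.qr(X, mode="reduced")
    # Solve R @ theta = Q^T Y. A general solver is used here for simplicity;
    # a dedicated triangular solver would exploit the structure of R.
    return np.linalg.solve(R, Q.T @ Y)

# Synthetic tall problem (assumed data): 50 rows, 3 columns, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
Y = X @ theta_true

theta_qr = lstsq_qr(X, Y)
# Normal equation, solved as a linear system rather than with an explicit inverse.
theta_ne = np.linalg.solve(X.T @ X, X.T @ Y)
```

On a well-conditioned problem like this one the two results should agree closely. The QR route is often preferred numerically because forming $X^TX$ squares the condition number of $X$.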
$X$ might not be invertible. It might not even be square, for that matter. The normal equation works even when $X$ is not invertible, provided $X^TX$ is (that is, $X$ has full column rank). And if $X$ is invertible, it reduces to $X^{-1}Y$: $(X^TX)^{-1}X^TY = X^{-1}X^{-T}X^TY = X^{-1}IY = X^{-1}Y$
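Both claims can be checked numerically. The following is a minimal sketch with made-up random matrices (the names `A`, `b`, `X_tall` and the seed are assumptions for the example): for a square invertible matrix the normal equation and the direct solve agree, while asking NumPy for the inverse of a rectangular matrix raises an error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Square, invertible case: adding a multiple of the identity keeps this
# particular random matrix comfortably away from singular.
A = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)
b = rng.normal(size=4)
theta_normal = np.linalg.solve(A.T @ A, A.T @ b)  # (A^T A)^{-1} A^T b
theta_direct = np.linalg.solve(A, b)              # A^{-1} b
square_agree = np.allclose(theta_normal, theta_direct)

# Rectangular case: more rows than columns, so no inverse exists.
X_tall = rng.normal(size=(10, 4))
try:
    np.linalg.inv(X_tall)
    inverse_exists = True
except np.linalg.LinAlgError:
    # NumPy refuses to invert a non-square matrix.
    inverse_exists = False
```

Here `square_agree` should come out true and `inverse_exists` false, matching the algebra above.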