What is the essence of Least Squares Regression?

I have attempted to explain in several ways why Least Squares Regression is "legitimate". For example, it is the outcome of maximum likelihood estimation (MLE) when the errors are normal random variables.
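(Concretely: if $y_i = x_i^T\beta + \varepsilon_i$ with i.i.d. $\varepsilon_i \sim N(0, \sigma^2)$, the log-likelihood is $$ \ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - x_i^T\beta)^2, $$ so for any fixed $\sigma^2$, maximizing over $\beta$ is exactly minimizing the sum of squared errors.)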

I have also found that MLE is closely related to orthogonal projections. (The square is related to the modulus, i.e. the Euclidean norm!) However, I have forgotten how to explain Least Squares Regression in terms of orthogonal projections. Can anyone explain that?

Also, are there other ways to explain why Least Squares is appropriate?

The regression problem is basically the problem of finding the "best" solution of an over-determined system: you have $n$ equations in $p+1$ unknown coefficients, where the sample size $n$ is much larger than the number of explanatory variables. Each subset of $p+1$ equations would, in general, yield a different set of $\beta$s. So, instead of solving exactly for some particular $p+1$ observations, you look for the single coefficient vector that comes "closest" to satisfying all $n$ equations at once. In other words, since $y$ generally cannot be written exactly as a linear combination of the columns of $X$, you search for the vector closest to $y$ in the column space $C(X)$, i.e. the span of the columns of $X$; that closest vector is the vector of OLS fitted values. And in the Euclidean norm, the closest vector to $y$ in $C(X)$ is precisely the orthogonal projection of $y$ onto $C(X)$. This is where the "square" comes from: minimizing the Euclidean distance $\|y - X\beta\|$ is equivalent to minimizing the sum of squared residuals $\|y - X\beta\|^2$.
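As a quick numerical illustration (a minimal NumPy sketch with made-up data, not part of the original answer): the defining property of the orthogonal projection is that the residual $y - \hat{y}$ is orthogonal to every column of $X$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-determined system: n = 50 observations, p + 1 = 3 coefficients
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Solve the least squares problem min_beta ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat            # fitted values: the projection of y onto C(X)
residual = y - y_hat

# Orthogonality: every column of X is perpendicular to the residual
print(X.T @ residual)           # all entries ~ 0, up to floating-point error
```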

Recall that the OLS (estimated) coefficients are $$ \hat{\beta} = (X^TX)^{-1}X^Ty. $$ Thus the fitted values, denoted by $\hat{y}$, are given by $$ \hat{y} = X\hat{\beta} = X(X^TX)^{-1}X^Ty = Hy, $$ where $H$, called the "hat matrix", is the orthogonal projection onto $C(X)$. Note that $C(X)$ is a subspace of $\mathbb{R}^n$ spanned by the columns of $X$; assuming $X$ has full column rank (so that $X^TX$ is invertible), the OLS coefficients are the coordinates of the projected vector $\hat{y}$ with respect to the columns of $X$.
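To make the projection interpretation concrete, here is a small sketch (again NumPy with arbitrary data) that forms $H$ explicitly and checks the two properties characterizing an orthogonal projection matrix, symmetry and idempotence, along with $\hat{y} = Hy = X\hat{\beta}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

# Hat matrix H = X (X^T X)^{-1} X^T  (explicit inverse is fine for a demo,
# but prefer solve/lstsq in real code)
H = X @ np.linalg.inv(X.T @ X) @ X.T

# An orthogonal projection matrix is symmetric and idempotent
print(np.allclose(H, H.T))                # True: H = H^T
print(np.allclose(H @ H, H))              # True: H^2 = H

# Fitted values computed two ways agree: H y = X beta_hat
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(H @ y, X @ beta_hat))   # True
```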