Why in Statistics do we use R-squared when Comparing Linear Models instead of Least Squares?


In machine learning we use a cost function such as the sum of squared errors to evaluate how good a model is, and if one model scores better than another, assuming it does not overfit, we choose that model.

But in statistics, R-squared seems to be favored for model selection rather than least squares.

What's the point of R-squared/adjusted R-squared when we already have least squares to measure performance?

What am I missing? Or am I just confused?


There are 2 best solutions below


$R^2$ is generally used for linear regression, while $MSE$ can be for arbitrary functions in high dimensions... the general domain of neural nets, for instance.
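As a minimal sketch of that point: MSE needs nothing more than predictions and targets, so it works for any predictor. Here a made-up trigonometric model stands in for something like a neural net; the data are purely illustrative.

```python
import numpy as np

# Illustrative target with a small non-sinusoidal component
x = np.linspace(0, 3, 50)
y = np.sin(x) + 0.1 * np.cos(5 * x)

# An arbitrary (nonlinear) model's predictions -- MSE does not care
# whether the model is linear, a polynomial, or a neural net
y_hat = np.sin(x)

# Mean squared error: average of squared residuals
mse = np.mean((y - y_hat) ** 2)
```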


What you call "least squares" and "R-squared" are equivalent approaches, at least before you start trying to do special things to them:

  • a "least squares" approach is shorthand for minimising the sum of the squares of the residuals, i.e. the sum of the squares of the differences between the actual values of the dependent variable and a model's predicted values: $\sum_i \left(y_i- \hat{y}_i\right)^2$

  • an "R-squared" approach is shorthand for maximising the proportion of the variance in the dependent variable that is predicted from the model; in linear regression this is $\frac{\sum_i \left(\hat{y}_i- \bar{y}\right)^2}{\sum_i \left(y_i- \bar{y}\right)^2}$ and is equal to $1- \frac{\sum_i \left(y_i- \hat{y}_i\right)^2}{\sum_i \left(y_i- \bar{y}\right)^2}$, with this latter expression being used more generally
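A small numerical check of the two R-squared expressions above, using made-up data and an ordinary least-squares line with an intercept (fitted here with `numpy.polyfit`); for such a fit the variance-ratio form and the $1 - \mathrm{SSE}/\mathrm{SST}$ form agree:

```python
import numpy as np

# Illustrative data (invented for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Ordinary least-squares line with intercept
coeffs = np.polyfit(x, y, 1)
y_hat = np.polyval(coeffs, x)

y_bar = y.mean()
sst = np.sum((y - y_bar) ** 2)        # total sum of squares
sse = np.sum((y - y_hat) ** 2)        # residual sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)    # explained sum of squares

r2_variance_ratio = ssr / sst         # first expression
r2_one_minus_sse = 1 - sse / sst      # second expression
# For OLS with an intercept, the two coincide (up to floating point)
```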

Since $1$ and $\sum_i \left(y_i- \bar{y}\right)^2$ do not vary with the model, any least-squares solution maximises R-squared, and any solution that maximises R-squared is a least-squares solution.
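The equivalence can be sketched numerically: for a fixed dataset, ranking two candidate models by smaller SSE gives the same ordering as ranking them by larger R-squared, because SST is a constant of the data. The data and both sets of predictions below are invented for illustration.

```python
import numpy as np

# Fixed dataset (invented)
y = np.array([2.0, 4.1, 6.0, 8.2, 9.9])

# Predictions from two hypothetical competing models
pred_a = np.array([2.1, 4.0, 6.1, 8.0, 10.0])  # closer fit
pred_b = np.array([2.5, 3.5, 6.5, 7.5, 10.5])  # looser fit

def sse(y, y_hat):
    """Residual sum of squares -- the least-squares criterion."""
    return np.sum((y - y_hat) ** 2)

def r_squared(y, y_hat):
    """R-squared via 1 - SSE/SST; SST depends only on the data."""
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse(y, y_hat) / sst

# Lower SSE and higher R-squared pick out the same model
```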