On the definition of the adjusted R-squared


Something I don't understand in regression models:

Following the definition of $R^2$, $$R^2=1-\frac{SS_{error}}{SS_{total}}$$ the adjusted R-squared is defined as: $$\bar{R}^2=1-\frac{SS_{error}/df_{error}}{SS_{total}/df_{total}}$$

Now, since we have the equivalent definition of $R^2$, $$R^2=\frac{SS_{regression}}{SS_{total}}$$ why don't we use an adjusted R-squared defined as: $$\bar{R}^2=\frac{SS_{regression}/df_{regression}}{SS_{total}/df_{total}}$$

A subsidiary question: are the two expressions of $R^2$ really equivalent, or should one be preferred to the other? (I couldn't find any paper showing who first introduced $R^2$, and in which form.)


There is 1 best solution below


Your formula wouldn’t work, because (using your terminology) you’d have:

$\overline{R}^2 = \frac{SS_{regression}/df_{regression}}{SS_{total}/df_{total}} = \left(1- \frac{SS_{error}}{SS_{total}} \right) \frac{df_{total}}{df_{regression}}$

For a simple linear regression, $df_{regression} = 1$ and $df_{total} = n - 1$ for $n$ data points, so this equals $(n-1)R^2$. With many data points, that can be much larger than 1.
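To make this concrete, here is a minimal sketch in plain Python that fits a simple linear regression to made-up data and compares the usual adjusted R-squared with the ratio proposed in the question. The data, the helper name `simple_ols`, and the choice of 20 points are all assumptions for illustration only.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by least squares and return (SS_total, SS_error, SS_regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    fitted = [a + b * xi for xi in x]
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
    return ss_total, ss_error, ss_regression


# Made-up data: a linear trend plus a small alternating disturbance.
x = list(range(1, 21))
y = [2.0 * xi + (1.5 if xi % 2 else -1.5) for xi in x]
n = len(x)

ss_total, ss_error, ss_regression = simple_ols(x, y)

# Degrees of freedom for one predictor plus an intercept.
df_total, df_error, df_regression = n - 1, n - 2, 1

r2 = 1 - ss_error / ss_total
adj_r2 = 1 - (ss_error / df_error) / (ss_total / df_total)
proposed = (ss_regression / df_regression) / (ss_total / df_total)

print(f"R^2            = {r2:.4f}")
print(f"adjusted R^2   = {adj_r2:.4f}")
print(f"proposed ratio = {proposed:.4f}")
print(f"(n - 1) * R^2  = {(n - 1) * r2:.4f}")
```

On this data the proposed ratio coincides with $(n-1)R^2$ and lands well above 1, while the usual adjusted value stays slightly below $R^2$.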

For your subsidiary question, I’m assuming that $SS_{regression}$ refers to what I’ve heard called the explained sum of squares (ESS), and $SS_{error}$ refers to the residual sum of squares (RSS). In that case the two formulas are equivalent because TSS = ESS + RSS, where TSS is the total sum of squares ($SS_{total}$). This decomposition holds for a least-squares fit that includes an intercept, simple linear regression included.
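The decomposition can be checked numerically. The sketch below, again in plain Python with made-up data and a hypothetical helper `sums_of_squares`, computes the gap TSS − (ESS + RSS) for a least-squares line with an intercept and, for contrast, for a line forced through the origin. All three sums are taken around the mean of $y$, which is an assumption about how the terms are defined.

```python
def sums_of_squares(y, fitted):
    """Return (TSS, ESS, RSS), with TSS and ESS measured around the mean of y."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    ess = sum((fi - mean_y) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return tss, ess, rss


# Made-up data whose true intercept is clearly non-zero.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.1, 5.9, 7.2, 7.8, 9.1, 9.9]
n = len(x)

# Least-squares line with an intercept: y = a + b*x.
mean_x, mean_y = sum(x) / n, sum(y) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
tss, ess, rss = sums_of_squares(y, [a + b * xi for xi in x])
gap_with_intercept = tss - (ess + rss)

# Least-squares line through the origin: y = b0*x.
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
tss0, ess0, rss0 = sums_of_squares(y, [b0 * xi for xi in x])
gap_without_intercept = tss0 - (ess0 + rss0)

print(f"gap with intercept:    {gap_with_intercept:.2e}")
print(f"gap without intercept: {gap_without_intercept:.2e}")
```

With the intercept the gap is zero up to rounding, so both expressions for $R^2$ agree. Without it the gap is not zero, and the two expressions can give different numbers.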

There are other formulas for adjusted R-squared, for example the one discussed here:

https://stats.stackexchange.com/questions/48703/what-is-the-adjusted-r-squared-formula-in-lm-in-r-and-how-should-it-be-interpret