$R^2$ score for non-linear models


I read this tutorial today about why we should not use the $R^2$ score to evaluate non-linear models.

I think the key reason for that conclusion is the identity "SS Regression + SS Error = SS Total". However, I don't see why this property holds for general linear models. I may have misunderstood the equation, but here is my disproof:

$$ SS_{total} = \sum\limits_i(y_i - \bar y)^2 $$

$$ SS_{reg} = \sum\limits_i(f_i - \bar y)^2 $$

$$ SS_{error} = \sum\limits_i(y_i - f_i)^2 $$

If $SS_{reg} + SS_{error} = SS_{total}$ were true, we would have $$\sum\limits_i(f_i - \bar y)^2 + \sum\limits_i(y_i - f_i)^2 = \sum\limits_i(y_i - \bar y)^2 \iff $$

$$ \sum\limits_i(f_i^2 + \bar y^2 - 2f_i \bar y) + \sum\limits_i(y_i^2 + f_i^2 - 2f_i y_i) = \sum\limits_i(y_i^2 + \bar y^2 - 2y_i \bar y) \iff $$

$$ (\sum\limits_i f_i^2 + \sum\limits_i \bar y^2 - 2 \sum\limits_i f_i \bar y) + (\sum\limits_i y_i^2 + \sum\limits_i f_i^2 - 2\sum\limits_i f_i y_i) = (\sum\limits_i y_i^2 + \sum\limits_i \bar y^2 - 2\sum\limits_i y_i \bar y) \iff $$

$$ \sum\limits_i f_i^2 - \sum\limits_i f_i \bar y - \sum\limits_i f_i y_i = - \sum\limits_i y_i \bar y \iff $$

$$ \sum\limits_i f_i (f_i - \bar y) = \sum\limits_i y_i (f_i- \bar y) $$

which does not hold in general.
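The same condition can be reached in one step by adding and subtracting $f_i$ inside $SS_{total}$, using only the definitions above:

$$ SS_{total} = \sum\limits_i\big((y_i - f_i) + (f_i - \bar y)\big)^2 = SS_{error} + SS_{reg} + 2\sum\limits_i (y_i - f_i)(f_i - \bar y) $$

So $SS_{reg} + SS_{error} = SS_{total}$ holds exactly when the cross term $\sum_i (y_i - f_i)(f_i - \bar y)$ is zero, which is the last equation of the derivation rearranged.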


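So my derivation above is correct: "SS Regression + SS Error = SS Total" does not hold in general. It holds when the model is a linear regression that includes an intercept term and is fitted by ordinary least squares (simple linear regression being the most common case). In that setting the residuals $y_i - f_i$ sum to zero and are orthogonal to the fitted values $f_i$, so $\sum_i (y_i - f_i)(f_i - \bar y) = 0$, which is exactly the condition derived in the question.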

A detailed proof can be found on Wikipedia.
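A quick numerical check of the claim above, as a sketch in Python with NumPy. The toy data (a noisy quadratic trend) and the hand-picked curve `2.5 * x**2` are hypothetical choices for illustration only. It compares three sets of predictions on the same data: OLS with an intercept, OLS through the origin, and a non-linear curve whose coefficient was picked by hand rather than fitted by least squares. The gap $SS_{total} - (SS_{reg} + SS_{error})$ should be essentially zero only in the first case, and in every case it should equal twice the cross term $\sum_i (y_i - f_i)(f_i - \bar y)$.

```python
import numpy as np

def sums_of_squares(y, f):
    """Return (SS_total, SS_reg, SS_error) using the definitions in the question."""
    y_bar = y.mean()
    ss_total = np.sum((y - y_bar) ** 2)
    ss_reg = np.sum((f - y_bar) ** 2)
    ss_error = np.sum((y - f) ** 2)
    return ss_total, ss_reg, ss_error

# Hypothetical toy data: a noisy quadratic trend.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 50)
y = 1.0 + 2.0 * x**2 + rng.normal(scale=0.5, size=x.size)

# 1) OLS with an intercept column (design matrix [1, x]).
X_int = np.column_stack([np.ones_like(x), x])
beta_int, *_ = np.linalg.lstsq(X_int, y, rcond=None)
f_ols_int = X_int @ beta_int

# 2) OLS through the origin (no intercept column).
X_no = x[:, None]
beta_no, *_ = np.linalg.lstsq(X_no, y, rcond=None)
f_ols_no = X_no @ beta_no

# 3) Hypothetical non-linear curve with a hand-picked coefficient (not fitted by OLS).
f_nonlin = 2.5 * x**2

results = {}
for name, f in [("OLS + intercept", f_ols_int),
                ("OLS, no intercept", f_ols_no),
                ("non-linear", f_nonlin)]:
    ss_total, ss_reg, ss_error = sums_of_squares(y, f)
    gap = ss_total - (ss_reg + ss_error)
    cross = np.sum((y - f) * (f - y.mean()))
    results[name] = (gap, 2 * cross)
    print(f"{name}: gap = {gap:.4f}, 2 * cross term = {2 * cross:.4f}")
```

The linear fit with an intercept satisfies the decomposition even though a straight line is a poor fit to this quadratic data, which suggests the identity depends on how the predictions are obtained rather than on how good they are.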