Coefficient of determination is always 1 - high values


I have measurements whose abscissa values range from about 7500 to 10300 and whose ordinate values range from 10 to 90. I'm fitting a linear regression and also calculating the coefficient of determination; please see the picture.

Figure: linear regression

My question: The coefficient of determination always comes out as 1, which is quite strange, because you can clearly see some outliers. Is this because the values on my abscissa are very large compared to those on the ordinate? Does it make sense to scale the abscissa values down, and if so, how?

Sorry for my bad English! Thank you very much in advance! :)


There are 2 best solutions below

2
On BEST ANSWER

You kindly sent me the data $(x_i,y_i)$ you used for the regression.

In order to work with exact arithmetic, I defined $$X_i=10^{-6}\times{\text{Round}[10^6\, x_i]}\qquad Y_i=10^{-6}\times{\text{Round}[10^6\, y_i]}$$ which makes all the numbers rational.

With these values, the fitted model is $$y=-\frac{345121169983}{1961628980}+\frac{62038153 }{2452036225}x$$

Using the formulae in Michael Hardy's answer, I obtained $$R^2=\frac{61579718842422544}{62758163406024421}\approx 0.981222$$ which is exactly what the square of the linear correlation of the $(x_i,y_i)$ data gives.
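The exact-arithmetic approach described above can be sketched in Python with the standard `fractions` module. This is only an illustration: the data points are made up to lie in the ranges mentioned in the question (the original measurements are not reproduced here), and the helper names `to_rational` and `exact_fit` are hypothetical.

```python
from fractions import Fraction

def to_rational(v, digits=6):
    """Round v to `digits` decimals and return it as an exact Fraction."""
    scale = 10 ** digits
    return Fraction(round(v * scale), scale)

def exact_fit(xs, ys):
    """Least-squares line y = a + b*x and R^2, all in exact rational arithmetic."""
    X = [to_rational(x) for x in xs]
    Y = [to_rational(y) for y in ys]
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    sxx = sum((x - mean_x) ** 2 for x in X)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
    syy = sum((y - mean_y) ** 2 for y in Y)
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
    r2 = 1 - ss_res / syy              # coefficient of determination
    return a, b, r2

# Hypothetical data, NOT the measurements from the question
xs = [7500.0, 8000.0, 8600.0, 9100.0, 9700.0, 10300.0]
ys = [10.0, 24.0, 41.0, 52.0, 75.0, 90.0]

a, b, r2 = exact_fit(xs, ys)
print("intercept:", a)
print("slope:    ", b)
print("R^2:      ", r2, "~", float(r2))
```

Because every intermediate value is a `Fraction`, no rounding error can push the result to exactly 1; a value of exactly 1 would then indicate perfectly collinear data or a mistake in the formula.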

I suspect that there is something wrong in the formula you use.

0
On

The coefficient of determination is $$ R^2 \quad = \quad 1 - \frac{\sum_{i=1}^n (y_i - \widehat{y_i})^2}{\sum_{i=1}^n (y_i - \bar y)^2} \quad = \quad \frac{ \sum_{i=1}^n (\widehat{y_i} - \bar y)^2 }{\sum_{i=1}^n (y_i-\bar y)^2}, $$ where $\widehat{y_i}$ is the $i$th fitted value, i.e. the value that the least squares line predicts for the corresponding $x$ value, and $\bar y$ is the average $y$ value. The coefficient of determination tells you what proportion of the variability in $y$ is "explained" by the variability of $x$.
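As a rough check of the two expressions above, the following Python sketch fits a least-squares line with an intercept and evaluates $R^2$ both ways. The data are hypothetical (chosen to lie in the ranges mentioned in the question), and the function name `r_squared_both_forms` is just an illustrative choice.

```python
def r_squared_both_forms(xs, ys):
    """Return (1 - SS_res/SS_tot, SS_reg/SS_tot) for a least-squares line with intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]              # the fitted values
    ss_tot = sum((y - mean_y) ** 2 for y in ys)               # total sum of squares
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # residual sum of squares
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)           # explained sum of squares
    return 1 - ss_res / ss_tot, ss_reg / ss_tot

# Hypothetical data, NOT the measurements from the question
xs = [7500.0, 8000.0, 8600.0, 9100.0, 9700.0, 10300.0]
ys = [10.0, 24.0, 41.0, 52.0, 75.0, 90.0]

r2_a, r2_b = r_squared_both_forms(xs, ys)
print(r2_a, r2_b)  # the two forms should agree up to floating-point error
```

If an implementation returns exactly 1 for data that visibly scatter around the line, a likely culprit is that one of these sums is computed from the wrong quantities, for example using the fitted values in place of the observed $y_i$.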

For simple linear regression, the coefficient of determination is the square of the correlation between $x$ and $y$.
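A small Python sketch can illustrate this identity and also address the scaling concern in the question: dividing the abscissa values by a constant changes the slope but leaves both the correlation and $R^2$ unchanged. The data are hypothetical, and the names `pearson_r`, `r_squared`, and `xs_scaled` are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(xs, ys):
    """R^2 = 1 - SS_res/SS_tot for the least-squares line with intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, NOT the measurements from the question
xs = [7500.0, 8000.0, 8600.0, 9100.0, 9700.0, 10300.0]
ys = [10.0, 24.0, 41.0, 52.0, 75.0, 90.0]
xs_scaled = [x / 1000 for x in xs]  # rescaled abscissa

print(pearson_r(xs, ys) ** 2, r_squared(xs, ys))   # equal up to rounding
print(r_squared(xs_scaled, ys))                    # unchanged by rescaling x
```

So large abscissa values alone should not force $R^2$ to 1; rescaling is harmless but does not change the result.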

Look at the illustrations in this answer, and you'll see that the scatterplot you show should have a correlation close to $1$; hence a coefficient of determination close to $1$.