Distinction between correlation coefficient and coefficient of determination


In my stats class, I am learning about the correlation coefficient and the coefficient of determination.

I don't understand what the difference is between them. There are $r,$ $r^2,$ and $R^2$.

$r^2$ and $R^2$ are both called the coefficient of determination, but what is the difference between them?

Also, when do you use them? Any help would be much appreciated.


There is 1 best solution below


Correlation. In bivariate data $(x, y),$ the sample correlation $r$ estimates the population correlation $\rho$. The sample correlation measures the linear component of association: positive values of $r$ indicate the extent to which $y$ increases linearly as $x$ increases, and negative values the extent to which $y$ decreases linearly as $x$ increases. If all $x$'s are equal or all $y$'s are equal, the sample correlation $r$ is undefined (that variable has standard deviation $0$); otherwise $-1 \le r \le 1.$
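For reference, the usual textbook formula for the sample correlation can be written as follows. The notation here is an assumption not used elsewhere in this answer: $\bar x$ and $\bar y$ are the sample means and $n$ is the number of pairs.

$$r = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar y)^2}}$$

The denominator is $0$ whenever one of the variables is constant, which is why $r$ is undefined in that case.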

Consider the following data in R statistical software. (R is the name of the software, and has nothing to do with correlation.) Each of the variables $y_1, y_2,$ and $y_3$ is defined as a function of $x$, but only $y_1$ has a perfect linear association with $x,$ and so it is the only variable for which there is correlation $r = 1$ with $x$.

 x = 0:10;  y1 = x + 3;  y2 = x^2;  y3 = (x - 5)^2 
 cbind(x, y1, y2, y3)
 ##       x y1  y2 y3
 ##       0  3   0 25
 ##       1  4   1 16
 ##       2  5   4  9
 ##       3  6   9  4
 ##       4  7  16  1
 ##       5  8  25  0
 ##       6  9  36  1
 ##       7 10  49  4
 ##       8 11  64  9
 ##       9 12  81 16
 ##      10 13 100 25
 cor(x,y1)
 ## 1          # perfect linear association
 cor(x,y2)
 ## 0.9631427  # functional association, 'nearly' linear
 cor(x,y3)
 ## 0          # functional association, no linear component

As computed above, the correlations in the three plots below (left to right) are $r = 1,$ $0.96,$ and $0.$

[Figure: three scatterplots, left to right, of $y_1,$ $y_2,$ and $y_3$ against $x$.]

If we define $y_4 = -3x + 2,$ there is a perfect negative linear association. If plotted, the line has a negative slope, and the correlation between $x$ and $y_4$ is $r = -1$.
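A quick check of this claim in R, as a minimal sketch that reuses the same `x = 0:10` as in the earlier session (`y4` is simply the R name for $y_4$):

```r
x = 0:10
y4 = -3*x + 2      # exact linear function of x with negative slope
cor(x, y4)
## -1              # perfect negative linear association
```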

If you want more illustrations of correlations for various degrees of linear association and of nonlinear association, see the start of the Wikipedia article on 'correlation and dependence'.

Coefficient of determination. As the notation indicates, the coefficient of determination $r^2$ is the square of the correlation. Thus, by algebra, $0 \leq r^2 \le 1.$

The coefficient of determination is often used in regression. Roughly speaking, it is the proportion of the variation in $y$ that is 'explained' by the regression line of $y$ on $x$. Some software uses capital $R^2$ for the coefficient of determination. (Perhaps this is a hold-over from early computer days when computer terminals used only capital letters.) In simple linear regression, with a single predictor $x,$ $R^2$ and $r^2$ are the same number; in multiple regression, $R^2$ is the squared correlation between $y$ and the fitted values. In Minitab you will see the label R-Sq, which is just a way to print $r^2.$
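As a rough sketch of what 'explained' means, the R code below fits the regression line of `y2` on `x` from the earlier example and compares the proportion $1 - SSE/SST$ with the squared correlation. The names `fit`, `sse`, and `sst` are illustrative choices, not part of the original session.

```r
x = 0:10;  y2 = x^2
fit = lm(y2 ~ x)                 # least-squares line of y2 on x
sse = sum(resid(fit)^2)          # variation left over after fitting the line
sst = sum((y2 - mean(y2))^2)     # total variation of y2 about its mean
1 - sse/sst                      # proportion of variation 'explained'
cor(x, y2)^2                     # the square of the correlation r
summary(fit)$r.squared           # the value R itself labels R-squared
```

All three expressions should agree, at about $0.928$ for these data.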

Below is part of a printout of a simple linear regression problem I did earlier today. In this case the regression line is $\hat Y = 11.03 + 24.863x$, the correlation is $r = 0.954,$ and the coefficient of determination is $r^2 = 0.910$ (both rounded to three places). You can see how $r^2$ is shown as R-Sq at the end of the printout (rounded and as a percentage).

 MTB > print 'x'

 x
    4   2   5   7   1   3   4   5   2   6

 MTB > print 'y'

 y
   90    60   170   190    40    80   100   130    70   150

 MTB > regr 'y' 1 'x'

 Regression Analysis: y versus x 

 The regression equation is
 y = 11.0 + 24.9 x

 Predictor    Coef  SE Coef     T      P
 Constant    11.03    11.92  0.93  0.382
 x          24.863    2.772  8.97  0.000

 S = 15.8977   R-Sq = 91.0%
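For readers without Minitab, the same fit can be reproduced in R. This is only a sketch using the data printed above; `fit` is an arbitrary name, and the rounded values in the comments are what these data should give.

```r
x = c(4, 2, 5, 7, 1, 3, 4, 5, 2, 6)
y = c(90, 60, 170, 190, 40, 80, 100, 130, 70, 150)
fit = lm(y ~ x)                    # simple linear regression of y on x
round(coef(fit), 3)                # intercept about 11.033, slope about 24.863
round(cor(x, y), 3)                # correlation r, about 0.954
round(summary(fit)$r.squared, 3)   # r^2, about 0.91 (Minitab's R-Sq = 91.0%)
```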