Introduction to Statistical Learning, Chapter 3, Ex. 3.7.7: Proving that $ R^2 = Cor^2 $ in simple linear regression. Figuring out the algebra


I am a hobby mathematician without any formal training. Currently I am working through 'An Introduction to Statistical Learning', 1st Ed. (abbreviated: ISLR, https://www.statlearning.com/). I am stuck on an algebraic proof in the exercises of chapter 3 (linear regression), specifically exercise 3.7.7.

The difficulty is somewhat 'uphill' for me, so I have not made much progress on the proof.

The exercise states:

"It is claimed in the text that in the case of simple linear regression of $Y$ onto $X$, the $ R^2 $ statistic (3.17) is equal to the square of the correlation between $X$ and $Y$ (3.18). Prove that this is the case. For simplicity, you may assume that $ \bar{x}=\bar{y}=0 $".

This is how far I have come:

Exercise 3.7.7:

Prove that $ R^2 = Cor^2 $

DEFINITIONS:

$ \bar{x} \equiv \frac{1}{n} \sum_{i=1}^n { x_i }, $

$ \bar{y} \equiv \frac{1}{n} \sum_{i=1}^n { y_i }, $

$ \hat{\beta}_1 = \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } }, $

$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, $

$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i = \hat{\beta}_0 + \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } x_i = \bar{y} - \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } \bar{x} + \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } x_i, $

$ RSS = \sum_{i=1}^n { (y_i - \hat{y}_i)^2 }, $

$ TSS = \sum_{i=1}^n { (y_i - \bar{y} )^2 }, $

$ R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac { \sum_{i=1}^n { (y_i - \hat{y}_i)^2 } } { \sum_{i=1}^n { (y_i - \bar{y} )^2 } }, $

$ Cor = \frac { \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } } { \sqrt{ \sum_{i=1}^n { (x_i-\bar{x})^2 } } \sqrt{ \sum_{i=1}^n { (y_i-\bar{y})^2 } } } = \frac { \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } } { \sqrt{ \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } } } $

$ Cor^2 = \left( \frac { \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } } { \sqrt{ \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } } } \right)^2 = \frac { \left( \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } \right)^2 } { \left( \sqrt{ \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } } \right)^2 } = \frac { \left( \sum_{i=1}^n { (x_i-\bar{x}) (y_i-\bar{y}) } \right)^2 } { \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } } $
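Before doing the algebra, the definitions above can be checked numerically. The following is only a sketch in plain Python: the function name `r_squared_and_cor_squared` and the sample data are made up for illustration and do not come from ISLR. It computes $ R^2 $ and $ Cor^2 $ exactly as defined above, without assuming $ \bar{x} = \bar{y} = 0 $.

```python
from math import sqrt


def r_squared_and_cor_squared(x, y):
    """Return (R^2, Cor^2) for a simple linear regression of y onto x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Building blocks that appear in both definitions.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)

    # Least squares coefficients and fitted values.
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    y_hat = [beta0 + beta1 * xi for xi in x]

    # R^2 = 1 - RSS / TSS, where TSS is the same sum as s_yy.
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    r2 = 1 - rss / s_yy

    # Sample correlation between x and y, then squared.
    cor = s_xy / (sqrt(s_xx) * sqrt(s_yy))
    return r2, cor ** 2


# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2, cor2 = r_squared_and_cor_squared(x, y)
print(r2, cor2)
```

The two printed numbers should agree up to floating point rounding, which is what the exercise asks to prove in general.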

TO PROVE:

$ R^2 = Cor^2 $, with $ \bar{x} = \bar{y} = 0 $. Initial expansion:

$ \hat{y}_i = \bar{y} - \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } \bar{x} + \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } x_i = 0 - \frac { \sum_{i=1}^n{ (x_i - 0) (y_i - 0) } } { \sum_{i=1}^n{ (x_i - 0)^2 } } 0 + \frac { \sum_{i=1}^n{ (x_i - 0) (y_i - 0) } } { \sum_{i=1}^n{ (x_i - 0)^2 } } x_i = \frac { \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } x_i $

$ 1 - \frac { \sum_{i=1}^n { (y_i - \hat{y}_i)^2 } } { \sum_{i=1}^n { (y_i - \bar{y} )^2 } } = \frac { \left( \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } \right)^2 } { \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } }, $

$ 1 - \frac { \sum_{i=1}^n { (y_i - \hat{y}_i)^2 } } { \sum_{i=1}^n { (y_i - 0 )^2 } } = \frac { \left( \sum_{i=1}^n { (x_i-0)(y_i-0) } \right)^2 } { \sum_{i=1}^n { (x_i-0)^2 } \sum_{i=1}^n { (y_i-0)^2 } }, $

$ 1 - \frac { \sum_{i=1}^n { (y_i - \hat{y}_i)^2 } } { \sum_{i=1}^n { y_i^2 } } = \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } }, $

$ \boxed{ 1 - \frac { \sum_{i=1}^n { \left( y_i - \frac { \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } = \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } } $

Ok, so now I want to prove the equality in the boxed formula, and that is where I start to stumble. I manage to make a couple of transformations, but I can only achieve a common denominator between the two sides. The numerator escapes me.

Here is what I have tried:

The RHS is:

$ \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } $

Transformations for LHS:

T0: $ 1 - \frac { \sum_{i=1}^n { \left( y_i - \frac { \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $

T1: $ \frac { \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( y_i - \frac { \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $

T2: $ \frac { \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( y_i - \frac { x_i \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $

T3: $ \frac { \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( \frac { \left( y_i \sum_{i=1}^n{ x_i^2 } \right) - \left( x_i \sum_{i=1}^n{ x_i y_i } \right) } { \sum_{i=1}^n{ x_i^2 } } \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $

T4: $ \frac { \sum_{i=1}^n { x_i^2 } \left( \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( \frac { \left( y_i \sum_{i=1}^n{ x_i^2 } \right) - \left( x_i \sum_{i=1}^n{ x_i y_i } \right) } { \sum_{i=1}^n{ x_i^2 } } \right)^2 } \right) } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } $

T5: $ \frac { \sum_{i=1}^n { x_i^2 } \left( \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( \left( \left( y_i \sum_{i=1}^n{ x_i^2 } \right) - \left( x_i \sum_{i=1}^n{ x_i y_i } \right) \right) \frac { 1 } { \sum_{i=1}^n{ x_i^2 } } \right)^2 } \right) } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } $

... and here I am running out of ideas. I can see the term $ \left( \sum_{i=1}^n { x_i y_i } \right)^2 $ in the numerator, but I have no idea how to isolate it and get rid of the rest.

Does anyone have any pointers to get me back on the right road?

-terminal


There are 2 best solutions below


Recall that $R^2 = \frac{SSreg}{SST} = \frac{\sum( \hat y_i - \bar y ) ^ 2}{\sum( y_i - \bar y ) ^ 2} $. You can easily (by replacing $\hat y_i$ with $\hat{\beta_0} + \hat \beta_1 x_i$) show that $\sum( \hat y_i - \bar y ) ^ 2 = \hat \beta_1 ^2 \sum( x_i - \bar x ) ^ 2$. Now, use the fact that the OLS estimator of $\beta_1$ is $$ \hat \beta_1 = \frac{\sum( y_i - \bar y )( x_i - \bar x ) }{ \sum( x_i - \bar x )^2}. $$ By plugging it into $\hat \beta_1 ^2 \sum( x_i - \bar x ) ^ 2 /\sum( y_i - \bar y ) ^ 2 $ you'll get $$ \frac{\left(\sum( y_i - \bar y )( x_i - \bar x )\right)^2}{\sum( x_i - \bar x )^2 \sum( y_i - \bar y )^2 } $$ which is the square of the Pearson correlation coefficient $r_{xy}$.
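The step described as easy can be written out as follows. This is a sketch that uses only the definitions $\hat \beta_0 = \bar y - \hat \beta_1 \bar x$ and $\hat y_i = \hat \beta_0 + \hat \beta_1 x_i$ from the question: $$ \hat y_i - \bar y = \hat \beta_0 + \hat \beta_1 x_i - \bar y = \bar y - \hat \beta_1 \bar x + \hat \beta_1 x_i - \bar y = \hat \beta_1 ( x_i - \bar x ), $$ so squaring and summing over $i$ gives $$ \sum( \hat y_i - \bar y ) ^ 2 = \hat \beta_1 ^2 \sum( x_i - \bar x ) ^ 2 . $$ Note that writing $R^2$ as $SSreg/SST$ instead of the book's $1 - RSS/TSS$ relies on the decomposition $TSS = SSreg + RSS$, which holds for a least squares fit that includes an intercept.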


So, after tons of head-scratching, I think I have arrived at a solution. Big thanks to V. Vancak for this, as he pushed my thinking in the right direction =)

My solution looks like this (feel free to correct me):

RHS: $ \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } $

LHS with transformations:

T0: $ 1 - \frac { \sum_{i=1}^n { \left( y_i - \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $

T1: $ \frac { \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( y_i - \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $

T2: $ \frac { \sum_{i=1}^n { y_i^2 - \left( y_i - \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $

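T3 (expanding the square, including the cross term): $ \frac { \sum_{i=1}^n { \left( y_i^2 - y_i^2 + 2 \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } x_i y_i - \left( \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } \right)^2 x_i^2 \right) } } { \sum_{i=1}^n { y_i^2 } } $

T4: $ \frac { \sum_{i=1}^n { \left( 2 \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } x_i y_i - \left( \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } \right)^2 x_i^2 \right) } } { \sum_{i=1}^n { y_i^2 } } $

T5 (pulling the constant fractions out of the outer sum): $ \frac { 2 \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } \sum_{i=1}^n { x_i y_i } - \left( \frac { \sum_{i=1}^n { x_i y_i } } { \sum_{i=1}^n { x_i^2 } } \right)^2 \sum_{i=1}^n { x_i^2 } } { \sum_{i=1}^n { y_i^2 } } $

T6: $ \frac { 2 \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } } - \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \left( \sum_{i=1}^n { x_i^2 } \right)^2 } \sum_{i=1}^n { x_i^2 } } { \sum_{i=1}^n { y_i^2 } } $

T7: $ \frac { 2 \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } } - \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } } } { \sum_{i=1}^n { y_i^2 } } $

T8: $ \frac { \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } } } { \sum_{i=1}^n { y_i^2 } } $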

T9: $ \frac { \left( \sum_{i=1}^n{ x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } $
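The heart of this derivation is that, for centered data, the numerator $ \sum y_i^2 - \sum ( y_i - b x_i )^2 $ with $ b = \sum x_i y_i / \sum x_i^2 $ expands to $ 2 b \sum x_i y_i - b^2 \sum x_i^2 $ and collapses to $ ( \sum x_i y_i )^2 / \sum x_i^2 $. As a sanity check, here is a small plain-Python sketch that evaluates all three expressions. The function name `centered_identity_terms` and the data are hypothetical, and the inputs are assumed to already have mean zero.

```python
def centered_identity_terms(x, y):
    """Evaluate the numerator of R^2 three ways for centered x and y.

    Assumes sum(x) == 0 and sum(y) == 0, as in the exercise.
    """
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_xx = sum(xi * xi for xi in x)
    s_yy = sum(yi * yi for yi in y)
    b = s_xy / s_xx  # slope estimate when both means are zero

    # Numerator as in T1: sum(y^2) - sum((y - b x)^2).
    numerator = s_yy - sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    # After expanding the square: 2 b sum(x y) - b^2 sum(x^2).
    expanded = 2 * b * s_xy - b * b * s_xx
    # Simplified form: (sum(x y))^2 / sum(x^2).
    target = s_xy ** 2 / s_xx
    return numerator, expanded, target, s_yy


# Hypothetical centered data: both lists sum to zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.0, -1.0, 1.0, 0.0, 3.0]
numerator, expanded, target, s_yy = centered_identity_terms(x, y)
print(numerator, expanded, target)
print(numerator / s_yy, target / s_yy)  # the LHS and RHS of the boxed formula
```

All three numerator values should agree up to rounding, and dividing by $ \sum y_i^2 $ gives the two sides of the boxed formula.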