I am a hobby mathematician without any formal training. I am currently working through 'An Introduction to Statistical Learning', 1st ed. (abbreviated ISLR, https://www.statlearning.com/), and I am stuck on an algebraic proof in the linear regression exercises of Chapter 3, specifically Exercise 3.7.7.
As this is somewhat 'uphill' difficulty for me, I have not made much progress on the desired proof.
The exercise states:
"It is claimed in the text that in the case of simple linear regression of $Y$ onto $X$, the $ R^2 $ statistic (3.17) is equal to the square of the correlation between $X$ and $Y $(3.18). Prove that this is the case. For simplicity, you may assume that $ \bar{x}=\bar{y}=0 $".
Here is how far I have come so far:
Exercise 3.7.7:
Prove that $ R^2 = Cor^2 $
DEFINITIONS:
$ \bar{x} \equiv \frac{1}{n} \sum_{i=1}^n { x_i }, $
$ \bar{y} \equiv \frac{1}{n} \sum_{i=1}^n { y_i }, $
$ \hat{\beta}_1 = \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } }, $
$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, $
$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i = \hat{\beta}_0 + \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } x_i = \bar{y} - \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } \bar{x} + \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } x_i, $
$ RSS = \sum_{i=1}^n { (y_i - \hat{y}_i)^2 }, $
$ TSS = \sum_{i=1}^n { (y_i - \bar{y} )^2 }, $
$ R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac { \sum_{i=1}^n { (y_i - \hat{y}_i)^2 } } { \sum_{i=1}^n { (y_i - \bar{y} )^2 } }, $
$ Cor = \frac { \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } } { \sqrt{ \sum_{i=1}^n { (x_i-\bar{x})^2 } } \sqrt{ \sum_{i=1}^n { (y_i-\bar{y})^2 } } } = \frac { \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } } { \sqrt{ \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } } } $
$ Cor^2 = \left( \frac { \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } } { \sqrt{ \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } } } \right)^2 = \frac { \left( \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } \right)^2 } { \left( \sqrt{ \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } } \right)^2 } = \frac { \left( \sum_{i=1}^n { (x_i-\bar{x}) (y_i-\bar{y}) } \right)^2 } { \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } } $
TO PROVE:
$ R^2 = Cor^2 $, with $ \bar{x} = \bar{y} = 0 $. Initial expansion:
$ \hat{y}_i = \bar{y} - \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } \bar{x} + \frac { \sum_{i=1}^n{ (x_i - \bar{x}) (y_i - \bar{y}) } } { \sum_{i=1}^n{ (x_i - \bar{x})^2 } } x_i = 0 - \frac { \sum_{i=1}^n{ (x_i - 0) (y_i - 0) } } { \sum_{i=1}^n{ (x_i - 0)^2 } } 0 + \frac { \sum_{i=1}^n{ (x_i - 0) (y_i - 0) } } { \sum_{i=1}^n{ (x_i - 0)^2 } } x_i = \frac { \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } x_i $
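To convince myself that this simplification of $\hat{y}_i$ is right, I wrote a quick numerical sanity check (my own NumPy sketch, not from ISLR): with centered data, the fitted values from a full OLS fit with intercept should coincide with $\frac{\sum x_i y_i}{\sum x_i^2} x_i$, and the intercept estimate should vanish.

```python
import numpy as np

# Sanity check: with centered data, the fitted values reduce to
# (sum(x*y) / sum(x**2)) * x, and the fitted intercept is ~0.
rng = np.random.default_rng(0)
x = rng.normal(size=50); x -= x.mean()   # force x-bar = 0
y = rng.normal(size=50); y -= y.mean()   # force y-bar = 0

beta1 = (x * y).sum() / (x ** 2).sum()
y_hat = beta1 * x

# Full OLS fit with intercept column for comparison
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(coef[0], 0.0)         # beta0-hat vanishes for centered data
assert np.allclose(A @ coef, y_hat)      # same fitted values as the shortcut
```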
$ 1 - \frac { \sum_{i=1}^n { (y_i - \hat{y}_i)^2 } } { \sum_{i=1}^n { (y_i - \bar{y} )^2 } } = \frac { \left( \sum_{i=1}^n { (x_i-\bar{x})(y_i-\bar{y}) } \right)^2 } { \sum_{i=1}^n { (x_i-\bar{x})^2 } \sum_{i=1}^n { (y_i-\bar{y})^2 } }, $
$ 1 - \frac { \sum_{i=1}^n { (y_i - \hat{y}_i)^2 } } { \sum_{i=1}^n { (y_i - 0 )^2 } } = \frac { \left( \sum_{i=1}^n { (x_i-0)(y_i-0) } \right)^2 } { \sum_{i=1}^n { (x_i-0)^2 } \sum_{i=1}^n { (y_i-0)^2 } }, $
$ 1 - \frac { \sum_{i=1}^n { (y_i - \hat{y}_i)^2 } } { \sum_{i=1}^n { y_i^2 } } = \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } }, $
$ \boxed{ 1 - \frac { \sum_{i=1}^n { \left( y_i - \frac { \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } = \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } } $
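Before attacking the algebra, I checked the boxed identity numerically (again my own sketch, not ISLR code): generate centered data, compute the left-hand side $1 - RSS/TSS$ and the right-hand side $\left(\sum x_i y_i\right)^2 / \left(\sum x_i^2 \sum y_i^2\right)$, and see that they agree.

```python
import numpy as np

# Numerical spot-check of the boxed identity with centered data.
rng = np.random.default_rng(1)
x = rng.normal(size=100); x -= x.mean()
y = 2.0 * x + rng.normal(size=100); y -= y.mean()

y_hat = (x * y).sum() / (x ** 2).sum() * x
lhs = 1 - ((y - y_hat) ** 2).sum() / (y ** 2).sum()   # 1 - RSS/TSS
rhs = (x * y).sum() ** 2 / ((x ** 2).sum() * (y ** 2).sum())
assert np.isclose(lhs, rhs)   # the two sides agree to machine precision
```

So the identity holds numerically; the question is how to show it algebraically.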
Ok, so now I want to prove the equality in the boxed formula, and that is where I start to stumble. I can make a couple of transformations, but all I manage is a common denominator between the two sides; the numerator escapes me.
Here is what I have tried:
RHS equals to:
$ \frac { \left( \sum_{i=1}^n { x_i y_i } \right)^2 } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } $
Transformations for LHS:
T0: $ 1 - \frac { \sum_{i=1}^n { \left( y_i - \frac { \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $
T1: $ \frac { \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( y_i - \frac { \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } x_i \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $
T2: $ \frac { \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( y_i - \frac { x_i \sum_{i=1}^n{ x_i y_i } } { \sum_{i=1}^n{ x_i^2 } } \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $
T3: $ \frac { \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( \frac { \left( y_i \sum_{i=1}^n{ x_i^2 } \right) - \left( x_i \sum_{i=1}^n{ x_i y_i } \right) } { \sum_{i=1}^n{ x_i^2 } } \right)^2 } } { \sum_{i=1}^n { y_i^2 } } $
T4: $ \frac { \sum_{i=1}^n { x_i^2 } \left( \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( \frac { \left( y_i \sum_{i=1}^n{ x_i^2 } \right) - \left( x_i \sum_{i=1}^n{ x_i y_i } \right) } { \sum_{i=1}^n{ x_i^2 } } \right)^2 } \right) } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } $
T5: $ \frac { \sum_{i=1}^n { x_i^2 } \left( \left( \sum_{i=1}^n { y_i^2 } \right) - \sum_{i=1}^n { \left( \left( \left( y_i \sum_{i=1}^n{ x_i^2 } \right) - \left( x_i \sum_{i=1}^n{ x_i y_i } \right) \right) \frac { 1 } { \sum_{i=1}^n{ x_i^2 } } \right)^2 } \right) } { \sum_{i=1}^n { x_i^2 } \sum_{i=1}^n { y_i^2 } } $
... and here I run massively out of ideas. I can see the term $ \left( \sum_{i=1}^n { x_i y_i } \right)^2 $ in the numerator, but I have no idea how to isolate it and eliminate the rest.
Does anyone have any pointers to get me back on the right road?
-terminal
Recall that $R^2 = \frac{SS_{reg}}{SS_{tot}} = \frac{\sum( \hat y_i - \bar y ) ^ 2}{\sum( y_i - \bar y ) ^ 2} $. You can easily show (by replacing $\hat y_i$ with $\hat{\beta}_0 + \hat \beta_1 x_i$) that $\sum( \hat y_i - \bar y ) ^ 2 = \hat \beta_1 ^2 \sum( x_i - \bar x ) ^ 2$. Now, use the fact that the OLS estimate of $\beta_1$ is $$ \hat \beta_1 = \frac{\sum( y_i - \bar y )( x_i - \bar x ) }{ \sum( x_i - \bar x )^2}. $$ Plugging it into $\hat \beta_1 ^2 \sum( x_i - \bar x ) ^ 2 /\sum( y_i - \bar y ) ^ 2 $ you get $$ \frac{\left(\sum( y_i - \bar y )( x_i - \bar x )\right)^2}{\sum( x_i - \bar x )^2 \sum( y_i - \bar y )^2 }, $$ which is the square of the Pearson correlation coefficient $r_{xy}$.
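If you want to convince yourself of this route numerically before working the algebra, here is a quick check (my own sketch, not from ISLR): fit the regression, compute $SS_{reg}/SS_{tot}$, and compare it to the squared sample correlation. Note that no centering is needed for this version.

```python
import numpy as np

# Verify SS_reg / SS_tot == r_xy**2 for a simple linear regression.
rng = np.random.default_rng(2)
x = rng.normal(size=80)
y = 1.5 * x + rng.normal(size=80)

xc, yc = x - x.mean(), y - y.mean()
beta1 = (xc * yc).sum() / (xc ** 2).sum()
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

ss_reg = ((y_hat - y.mean()) ** 2).sum()   # explained sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()       # total sum of squares
r = np.corrcoef(x, y)[0, 1]                # Pearson correlation
assert np.isclose(ss_reg / ss_tot, r ** 2)
```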