How to show for a simple regression with an intercept and one independent variable $R^2 = r ^2$ , where $r$ is the ordinary correlation coefficient.


Here is where I'm at.

I started from $R^2 = \textrm{SSR}/\textrm{SST}$ and then substituted for $\hat{Y}$. My question is: what is the difference in meaning between $\hat{Y}$, $\bar{Y}$, and $Y$ in the expressions for SSR and SST?

Best answer:

Consider:

The Pearson Product Moment Correlation Coefficient $r$ is an estimate of $\rho$, the population correlation coefficient, which measures the strength of a linear relationship between the two variables $x$ and $y$ ($x$ independent and $y$ dependent):

$r$ $=$ $\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \cdot \sum_{i=1}^{n}(y_i-\bar{y})^2 }}$

where the bars in $\bar{x}$ and $\bar{y}$ denote the sample means of $x$ and $y$. $R^2$ is the proportion of variance accounted for by the regression model and is defined as indicated above and, more fully, as:

$R^2$ $=$ $\dfrac{\text{Sum of Squares Regression}}{\text{Sum of Squares Total}}$ $=$ $\dfrac{\sum_{i=1}^{n}(\hat{y_i}-\bar{y})^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$

where the hat in $\hat{y_i}$ represents the $i$th estimated $y$ value obtained from the fitted regression line. Note that $\hat{Y}$ represents the value of $Y$ estimated from the fitted regression line, $\bar{Y}$ represents the mean of the dependent variable $Y$, and $Y$ represents the observed dependent variable itself.

Post Script: $\hat{y_i}$ $=$ $b_0 + b_1 \cdot x_i$ where $b_0$ is the estimate of the intercept and $b_1$ is the estimate of the slope.
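A quick numerical check of the identity (my own sketch, not part of the answer; the data values are invented for illustration): compute $r$, the least squares estimates $b_0$ and $b_1$, and $R^2$ directly from the formulas above, and confirm that $R^2 = r^2$.

```python
# Verify numerically that R^2 = r^2 for simple linear regression,
# using only the formulas given above. The data are made up.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Pearson correlation coefficient r
r = sxy / math.sqrt(sxx * syy)

# Least squares estimates: b1 (slope) and b0 (intercept)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted values and R^2 = SSR / SST
y_hat = [b0 + b1 * xi for xi in x]
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
r_squared = ssr / syy

print(abs(r_squared - r ** 2) < 1e-12)  # the two quantities agree
```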

Second answer:

Here's a proof that in simple linear regression, the coefficient of determination $R^2$ equals the square of the correlation coefficient for $(x_1,y_1), (x_2,y_2),\ldots, (x_n,y_n)$.

For the simple linear regression model $$y_i=\beta_0+\beta_1 x_i + \epsilon_i\tag1$$ the least squares estimator for the intercept $\beta_0$ is $$\hat\beta_0:=\bar y -\hat\beta_1\bar x,\tag2$$ while the least squares estimator for the slope $\beta_1$ is $$\hat\beta_1:=\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}.\tag 3$$ The $i$th predicted response is $$\hat y_i:=\hat\beta_0 +\hat\beta_1x_i\stackrel{(2)}=\bar y + \hat\beta_1(x_i-\bar x)\tag4$$ which rearranges to $$\hat y_i-\bar y=\hat\beta_1(x_i-\bar x).\tag 5$$ The coefficient of determination is the fraction of variation in $y$ that is explained by the regression model: $$R^2:=\frac{\sum(\hat y_i-\bar y)^2}{\sum(y_i-\bar y)^2}\stackrel{(5)}= \frac{(\hat\beta_1)^2\sum(x_i-\bar x)^2}{\sum(y_i-\bar y)^2} \stackrel{(3)}= \left(\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}\right)^2 \frac{\sum(x_i-\bar x)^2}{\sum(y_i-\bar y)^2}.\tag6 $$ The RHS of (6) simplifies to $$\frac{\left(\sum(x_i-\bar x)(y_i-\bar y)\right)^2}{\sum(x_i-\bar x)^2\sum(y_i-\bar y)^2}=\left(\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum(x_i-\bar x)^2}\sqrt{\sum(y_i-\bar y)^2}}\right)^2 $$ which is the square of the correlation coefficient $r$.
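The derivation above can be sanity-checked step by step on data. This is my own sketch, not part of the answer; the sample is randomly generated for illustration. It verifies the rearrangement in equation (5) for every observation, then confirms that the $R^2$ from equation (6) equals $r^2$.

```python
# Numerical check of equations (2)-(6) on randomly generated data.
import math
import random

random.seed(0)
x = [random.uniform(0, 10) for _ in range(20)]
y = [1.5 + 0.8 * xi + random.gauss(0, 1) for xi in x]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b1 = sxy / sxx                        # slope, equation (3)
b0 = y_bar - b1 * x_bar               # intercept, equation (2)
y_hat = [b0 + b1 * xi for xi in x]    # predicted responses, equation (4)

# Equation (5): y_hat_i - y_bar = b1 * (x_i - x_bar) for every i
assert all(abs((yh - y_bar) - b1 * (xi - x_bar)) < 1e-9
           for yh, xi in zip(y_hat, x))

# Equation (6) and its simplification: R^2 equals r^2
R2 = sum((yh - y_bar) ** 2 for yh in y_hat) / syy
r = sxy / math.sqrt(sxx * syy)
print(abs(R2 - r ** 2) < 1e-12)
```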