The regression line, which passes through the point of averages with a slope of r times (SD of y)/(SD of x), is said to give a good estimate of the average value of y for each value of x.
I can see why this is the case when r = 1, 0, or -1. When r = 1, all the points lie on a line, so each SD of increase in x goes with one SD of increase in y; likewise for r = -1, except the relationship is inverse. For r = 0, there is no correlation, so on average an increase in x has no effect on y.
But what about the values in between? I am using Freedman's Statistics textbook, and it mentions that while r is the correct factor to use for values between -1 and 1, a "complicated mathematical argument is needed". What is this argument?
Pearson's correlation coefficient between $X$ and $Y$ is given by $$ \rho = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}. $$ In the simple linear regression model $y=\beta_0 + \beta_1x + \epsilon$, the slope $\beta_1$ has the form $$ \beta_1 = \frac{\operatorname{cov}(X,Y)}{\sigma^2_X}. $$ Multiplying numerator and denominator by $\sigma_Y$ gives $$ \beta_1 = \frac{\operatorname{cov}(X,Y)\,\sigma_Y}{\sigma_X \sigma_X \sigma_Y}=\rho\,\frac{\sigma_Y}{\sigma_X}. $$ So the rate of change of $y$ as a function of $x$ is the correlation scaled by the ratio of the standard deviations: $$ \frac{\partial y}{\partial x}=\beta_1 = \rho\,\frac{\sigma_Y}{\sigma_X}. $$
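The identity $\beta_1 = \rho\,\sigma_Y/\sigma_X$ is easy to check numerically. Below is a minimal sketch using simulated data (the data-generating slope of 0.5 and the sample size are arbitrary choices for illustration): the least-squares slope agrees with $r$ times the ratio of sample standard deviations, up to floating-point error.

```python
import numpy as np

# Hypothetical data: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

# Sample correlation coefficient r.
r = np.corrcoef(x, y)[0, 1]

# Least-squares slope from a degree-1 fit (equals cov(x, y) / var(x)).
slope = np.polyfit(x, y, 1)[0]

# The identity beta_1 = r * (sd_y / sd_x); it holds for any consistent
# choice of ddof, since the normalization cancels in the ratio.
assert np.isclose(slope, r * y.std() / x.std())
```

Note that the identity holds exactly for the sample quantities, not just in expectation, because the least-squares slope is by construction the sample covariance divided by the sample variance of x.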