Different Regression Lines?


Hi, quick question about regression. If a simple regression line has coefficients $\beta_0$ and $\beta_1$, why are the regression lines of $y$ on $x$ and of $x$ on $y$ different whenever $r^2 < 1$? I have tried all the manipulation and graphical analysis I can, but I can't see why this happens. Any help is appreciated.



In one you are trying to write a formula for $y$ in terms of $x$. The error in the fit is the vertical distance from the data to the regression line.

In the second case the error is the horizontal distance from the data to the regression line.

Minimizing the sum of squared vertical errors is not the same as minimizing the sum of squared horizontal errors. In fact the two slopes differ by a factor of $r^2$, so the discrepancy grows as $r^2$ falls below $1$.
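A quick numerical check of this claim (a sketch using NumPy; the simulated data and coefficients here are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.8, size=200)  # noisy linear relation

cov_xy = np.cov(x, y, bias=True)[0, 1]

# Slope of y on x: minimizes vertical squared errors.
b_yx = cov_xy / np.var(x)
# Slope of x on y: minimizes horizontal squared errors.
b_xy = cov_xy / np.var(y)

# In (x, y) coordinates the x-on-y line has slope 1/b_xy, so the
# ratio of the two slopes, b_yx / (1/b_xy) = b_yx * b_xy, equals r^2.
r2 = np.corrcoef(x, y)[0, 1] ** 2
print(b_yx * b_xy, r2)  # the two values agree
```

With noiseless data ($r^2 = 1$) the two fitted lines coincide; any noise drives $r^2$ below $1$ and pulls the lines apart.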


You have received good explanations from user 44197, but let me add a few points.

When you perform a regression (linear or nonlinear) of $Y = F(X; a_0,a_1,a_2,\dots)$, the assumption is that there is no error in the $X$'s and that the errors in the $Y$'s are normally distributed. So you minimize the sum of the squared errors in the $Y$'s. I suppose you understand that the reverse process will lead to a different regression.

The problem is more complex when there are errors in both the $X$'s and the $Y$'s. In that case, orthogonal distance regression is typically used; http://en.wikipedia.org/wiki/Total_least_squares gives a good explanation. With this approach the regression line is unique.
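For a centered point cloud, the orthogonal (total least squares) line runs along the first principal component of the data. A minimal sketch, with simulated data for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.7 * x + rng.normal(scale=0.5, size=300)

# Center the data, then take the first right singular vector:
# the direction that minimizes perpendicular (orthogonal) distances.
data = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(data, full_matrices=False)
direction = vt[0]
slope_tls = direction[1] / direction[0]

# For positively correlated data, the TLS slope lies between the
# two ordinary least-squares slopes (y on x, and x on y inverted).
cov_xy = np.cov(x, y, bias=True)[0, 1]
b_yx = cov_xy / np.var(x)
b_xy_inv = np.var(y) / cov_xy
print(b_yx, slope_tls, b_xy_inv)
```

So the orthogonal fit gives a single line, sitting between the two asymmetric regressions.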


To save effort, let's assume the means of the original data $x$s and $y$s are both $0$, that their standard deviations $\sigma_x$ and $\sigma_y$ are positive, and that the covariance $\sigma_{xy}$ is positive; the correlation coefficient $r = \dfrac{\sigma_{xy}}{\sigma_x\sigma_y}$ will therefore also be positive, though not more than $1$.

Then one possible line for the graph through the points is $\dfrac{y}{\sigma_y}= \dfrac{x}{\sigma_x}$, i.e. $y = \dfrac{\sigma_y}{\sigma_x} x$. You could easily end up with something close to this by drawing a line by hand. If the correlation coefficient were $r=1$, then all the points would lie on this line.

But regressing $y$ on $x$ will not give you this result for $r\lt 1$, as it is trying to minimise the sum of squares of the vertical residuals, and it will do this by giving $y = \dfrac{\sigma_{xy}}{\sigma_x^2} x$, i.e. $y = r \dfrac{\sigma_y}{\sigma_x}x$, reducing the slope.

Meanwhile regressing $x$ on $y$ will give you a different result for $r\lt 1$, as it is trying to minimise the sum of squares of the horizontal residuals, and it will do this by giving $x = \dfrac{\sigma_{xy}}{\sigma_y^2} y$, i.e. $y = \dfrac1r \dfrac{\sigma_y}{\sigma_x}x$, increasing the slope.

One lesson to draw from this is that a lower correlation will increase the difference between regressing $y$ on $x$ and regressing $x$ on $y$.
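The two slope formulas above can be checked numerically (a sketch with simulated, centered data; the coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.4 * x + rng.normal(scale=1.0, size=500)
x -= x.mean()
y -= y.mean()  # center, matching the zero-mean assumption above

sx, sy = x.std(), y.std()
r = np.corrcoef(x, y)[0, 1]

# y-on-x least-squares slope: should equal r * sy/sx (shrunk slope).
b_yx = np.polyfit(x, y, 1)[0]
# x-on-y fit, re-expressed in the (x, y) plane: slope (1/r) * sy/sx.
b_xy = np.polyfit(y, x, 1)[0]

print(np.isclose(b_yx, r * sy / sx))
print(np.isclose(1 / b_xy, (1 / r) * sy / sx))
```

As $r$ shrinks, the first slope falls and the second rises, so the angle between the two regression lines widens, which is exactly the lesson above.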