Why does the order of elements affect the line of best fit/linear regression?


Consider the following y-values, paired with $x = 0, 1, 2, \dots$ to give the points $(0,y_0),(1,y_1),\dots$:

$$580,382,854,193,128,901,283,294,854,490$$

Fitting a linear regression to these points gives the following line:

$$y = 4.5x + 475.8$$

However, switching around the order of the y-values (here, swapping the 193 and the second 854), like so:

$$580,382,854,854,128,901,283,294,193,490$$

gives the following line of best fit:

$$y = -35.6x + 656.1$$

Why does the order matter when it comes to a line of best fit? The elements are the same and the algorithm I am using has no interaction between x- and y-variables:

sx = 0; sy = 0; stt = 0; sts = 0;
yArray = {580, 382, 854, 193, 128, 901, 283, 294, 854, 490}; // the ten y-values from above
for (i = 0; i < 10; ++i) {
    sx = sx + i;
    sy = sy + yArray[i];
}
for (i = 0; i < 10; ++i) {
    t = i - (sx / 10);
    stt = stt + (t * t);
    sts = sts + (t * yArray[i]);
}
slope = sts / stt;
intercept = (sy - (sx * slope)) / 10;

There's nothing like sx + sy or sx * sy, etc. I just don't see where the order matters here.

Best answer

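The algorithm you are using must somehow assess the association between x and y; otherwise, it couldn't give you a regression line. In the code you posted, that happens in the line sts = sts + (t * yArray[i]), which multiplies each centered x-value t by the y-value stored at the same index. (I believe @CarlHeckman has put his finger on it in his second comment.)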
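As a rough check, here is a minimal Python sketch that transcribes the loop from the question and runs it on both orderings. The function name `fit_line` and the dictionary layout are assumptions made for illustration; the variable names sx, sy, stt, and sts follow the question.

```python
def fit_line(y_values):
    """Least-squares line through the points (0, y0), (1, y1), ..."""
    n = len(y_values)
    sx = sum(range(n))   # sum of the x-values 0..n-1
    sy = sum(y_values)   # sum of the y-values (order does not matter)
    x_mean = sx / n
    # Sum of squared centered x-values (depends on x only).
    stt = sum((i - x_mean) ** 2 for i in range(n))
    # Each centered x is multiplied by the y at the SAME index:
    # this is the one place where the pairing, and so the order, matters.
    sts = sum((i - x_mean) * y for i, y in enumerate(y_values))
    slope = sts / stt
    intercept = (sy - sx * slope) / n
    return {"sx": sx, "sy": sy, "stt": stt, "sts": sts,
            "slope": slope, "intercept": intercept}


original = [580, 382, 854, 193, 128, 901, 283, 294, 854, 490]
swapped = [580, 382, 854, 854, 128, 901, 283, 294, 193, 490]

fit_a = fit_line(original)
fit_b = fit_line(swapped)

for name in ("sx", "sy", "stt", "sts", "slope", "intercept"):
    print(f"{name:>9}: {fit_a[name]:10.2f} {fit_b[name]:10.2f}")
```

The sums sx, sy, and stt come out the same for both orderings (45, 4959, and 82.5), but sts changes from 368.5 to -2936.5, and that alone moves the slope from about 4.5 to about -35.6.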

By changing the order of the y's without making a corresponding change in the order of the x's, you are destroying the bivariate nature of the data. Consider the following fake data:

 Subject:  1  2  3  4  5
 x:        2  4  6  8 10
 y1:       0  1  2  3  4

Here the x's and y's give points on an ascending line. Their correlation is 1.

But if I mix the y-values around, I destroy the 'pairing' that traces back to the subjects. Now the plotted (x, y) points no longer lie in a straight line. Their correlation is $r = 0.6$.

 Subject:  1  2  3  4  5
 x:        2  4  6  8 10
 y2:       0  3  2  1  4
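The value $r = 0.6$ for the shuffled data can be checked by hand from the usual definition of the sample correlation. The means are $\bar{x} = 6$ and $\bar{y} = 2$ (the bar notation is introduced here only for this check):

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} = \frac{(-4)(-2) + (-2)(1) + (0)(0) + (2)(-1) + (4)(2)}{\sqrt{40 \cdot 10}} = \frac{12}{20} = 0.6$$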

And if I put the y's in reverse order, then I get a line that goes the other direction, and the correlation is -1.

 Subject:  1  2  3  4  5
 x:        2  4  6  8 10
 y3:       4  3  2  1  0

Changing the correlation changes the slope of the regression line. Here are plots of the original and changed data.

[Image: plots of the original and changed data]
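The link between correlation and slope can also be checked numerically. The following is a small Python sketch, not a definitive implementation; the helper name `correlation_and_slope` is hypothetical, and the data are the three fake data sets above.

```python
from math import sqrt


def correlation_and_slope(x, y):
    """Return (r, slope) for the least-squares line of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products: the only term that depends on the pairing.
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope


x = [2, 4, 6, 8, 10]
orderings = {
    "y1": [0, 1, 2, 3, 4],
    "y2": [0, 3, 2, 1, 4],
    "y3": [4, 3, 2, 1, 0],
}

results = {name: correlation_and_slope(x, y) for name, y in orderings.items()}
for name, (r, slope) in results.items():
    print(f"{name}: r = {r:+.1f}, slope = {slope:+.2f}")
```

This gives r = 1, 0.6, and -1 with slopes 0.5, 0.3, and -0.5. Because reordering the y's leaves their spread unchanged, the slope here is always r times the same factor, $\sqrt{10/40} = 0.5$, so it moves in step with the correlation.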