Simple linear regression seems off


I have the following data points: $$p_1(52,730)$$ $$p_2(53,409)$$ $$p_3(52,250)$$ $$p_4(52,90)$$ I want to find the best-fitting line through these points. Simple linear regression gives $$y = 52.33x - 2364.67$$ However, I would expect a much steeper slope, since the points lie nearly on a vertical line. When I plot the line together with the points, I can also see that it is not optimal: I could draw a line by hand that lies closer to the points.
For example, following the fitted line from one edge of my graph to the other gives $$p_{y0}: (45,0)$$ $$p_{y816}: (60,816)$$ This seems way off. I would expect a line on which $x$ stays close to 52.
What am I missing?


There are 2 best solutions below

0
On BEST ANSWER

A possible solution is to regress $x$ on $y$ instead of $y$ on $x$, which is what was done in the question.

Line of regression of $Y$ on $X$: $$y-\bar{y} = \tfrac{\operatorname{cov}(X,Y)}{\sigma_{x}^{2}}(x-\bar{x})$$
Line of regression of $X$ on $Y$: $$x-\bar{x} = \tfrac{\operatorname{cov}(X,Y)}{\sigma_{y}^{2}}(y-\bar{y})$$
Using the points from the question, we get the following.
Arithmetic means: $$\bar{x}=\tfrac{209}{4}=52.25$$ $$\bar{y}=\tfrac{1479}{4}=369.75$$
Covariance: $$\operatorname{cov}(X,Y)=\tfrac{77317}{4}-52.25\cdot 369.75=9.8125$$
Variances: $$\sigma_{x}^{2}=\tfrac{10921}{4}-52.25^2=0.1875$$ $$\sigma_{y}^{2}=\tfrac{770781}{4}-369.75^2=55980.1875$$
Linear regression of $y$ on $x$: $$y-369.75 = \tfrac{9.8125}{0.1875}(x-52.25)$$ $$y=52.\bar{3}x-2364.\bar{6}$$
Linear regression of $x$ on $y$: $$x-52.25 = \tfrac{9.8125}{55980.1875}(y-369.75)$$ which, solved for $y$, gives $$y\approx 5704.99x-297715.83$$
The linear regression of $x$ on $y$ gave the results I was looking for (note that the results are rounded).
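The calculation above can be checked numerically. The following is a minimal sketch in plain Python, assuming the population (divide by $n$) definitions of covariance and variance used in this answer; the function name `regression_lines` is only illustrative.

```python
# Data points from the question
points = [(52, 730), (53, 409), (52, 250), (52, 90)]

def regression_lines(pts):
    """Return ((slope, intercept) of y on x, (slope, intercept) of x on y).

    Both lines are expressed in the form y = slope * x + intercept.
    """
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    # Population covariance and variances (divide by n, as in the answer)
    cov_xy = sum(x * y for x, y in pts) / n - mean_x * mean_y
    var_x = sum(x * x for x, _ in pts) / n - mean_x ** 2
    var_y = sum(y * y for _, y in pts) / n - mean_y ** 2

    # Regression of y on x: y - mean_y = (cov / var_x) * (x - mean_x)
    slope_yx = cov_xy / var_x
    intercept_yx = mean_y - slope_yx * mean_x

    # Regression of x on y: x - mean_x = (cov / var_y) * (y - mean_y),
    # solved for y so that both lines can be drawn on the same axes
    slope_xy = var_y / cov_xy
    intercept_xy = mean_y - slope_xy * mean_x

    return (slope_yx, intercept_yx), (slope_xy, intercept_xy)

y_on_x, x_on_y = regression_lines(points)
print("y on x: y = %.2f x + %.2f" % y_on_x)
print("x on y: y = %.2f x + %.2f" % x_on_y)
```

Both lines pass through the mean point $(\bar{x},\bar{y})$; they differ only in which variable's squared errors are minimized.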
More information on this issue can be found here: "What is the difference between linear regression on y with x and x with y"

0
On

As there are four points: $$n=4$$
Computing sums: $$S_x=\sum x_i=209$$ $$S_y=\sum y_i=1479$$ $$S_{xx}=\sum x_i^2=10921$$ $$S_{xy}=\sum x_iy_i=77317$$ $$S_{yy}=\sum y_i^2=770781$$
Computing $\hat\alpha$ and $\hat\beta$: $$\hat \beta=\frac{nS_{xy}-S_xS_y}{nS_{xx}-S_x^2}=\frac{157}{3}=52.\bar3$$ $$\hat \alpha=\frac1nS_y-\hat\beta\frac1nS_x=-2364.\bar6$$
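As a check of the sums and coefficients, here is a short sketch that evaluates the same formulas with exact rational arithmetic. It uses only Python's standard `fractions` module; the function name `ols_from_sums` is an illustrative choice, not something from the original answer.

```python
from fractions import Fraction

# Data points from the question
points = [(52, 730), (53, 409), (52, 250), (52, 90)]

def ols_from_sums(pts):
    """Compute the least-squares slope and intercept from raw sums."""
    n = len(pts)
    S_x = sum(x for x, _ in pts)
    S_y = sum(y for _, y in pts)
    S_xx = sum(x * x for x, _ in pts)
    S_xy = sum(x * y for x, y in pts)
    # Slope: (n*S_xy - S_x*S_y) / (n*S_xx - S_x^2), kept as an exact fraction
    beta = Fraction(n * S_xy - S_x * S_y, n * S_xx - S_x ** 2)
    # Intercept: mean(y) - beta * mean(x)
    alpha = Fraction(S_y, n) - beta * Fraction(S_x, n)
    return alpha, beta

alpha, beta = ols_from_sums(points)
print("beta  =", beta, "~", float(beta))
print("alpha =", alpha, "~", float(alpha))
```

Working with fractions avoids rounding in the denominator $nS_{xx}-S_x^2$, which is small here because the $x$ values are almost identical.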


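To your question: yes, the line does have a steep slope, namely $52.\bar3$, which corresponds to an angle of $88.905^\circ\approx 90^\circ$ with the $x$-axis. This is the best simple regression line possible, in the sense that it minimizes the sum of squared vertical distances to the points. A different line is better only if you choose to minimize a different criterion, such as the sum of cubed distances.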
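The angle, and the sense in which each line is "best", can be illustrated with a small sketch. It assumes the two candidate lines are the regression of $y$ on $x$ and the regression of $x$ on $y$ (the latter rewritten as $y = mx + b$); the helper names are hypothetical.

```python
import math

# Data points from the question
points = [(52, 730), (53, 409), (52, 250), (52, 90)]

def vertical_sse(pts, m, b):
    """Sum of squared vertical distances to the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in pts)

def horizontal_sse(pts, m, b):
    """Sum of squared horizontal distances to the line y = m*x + b."""
    return sum((x - (y - b) / m) ** 2 for x, y in pts)

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
cov_xy = sum(x * y for x, y in points) / n - mean_x * mean_y
var_x = sum(x * x for x, _ in points) / n - mean_x ** 2
var_y = sum(y * y for _, y in points) / n - mean_y ** 2

# Regression of y on x (minimizes vertical squared distances)
m_yx = cov_xy / var_x
b_yx = mean_y - m_yx * mean_x

# Regression of x on y, written as y = m*x + b (minimizes horizontal ones)
m_xy = var_y / cov_xy
b_xy = mean_y - m_xy * mean_x

angle_deg = math.degrees(math.atan(m_yx))
print("angle of y-on-x line: %.3f degrees" % angle_deg)
print("vertical SSE:   y-on-x %.1f, x-on-y %.1f"
      % (vertical_sse(points, m_yx, b_yx), vertical_sse(points, m_xy, b_xy)))
print("horizontal SSE: y-on-x %.3f, x-on-y %.3f"
      % (horizontal_sse(points, m_yx, b_yx), horizontal_sse(points, m_xy, b_xy)))
```

Each line wins under its own criterion, which is why the line that looks closest to the points on a plot need not be the one ordinary least squares returns.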