I have the following data points:
$$p_1=(52,730)$$
$$p_2=(53,409)$$
$$p_3=(52,250)$$
$$p_4=(52,90)$$
Now I want to find the best-fitting line through these points.
When I use simple linear regression, I get
$$y = 52.33 x - 2364.67$$
However, I would expect a much higher slope, since the points lie nearly on a vertical line. When I plot the line and the points, I can also see that the fitted line is not optimal: I could draw a line that lies closer to the points.
For example, if I draw the line from one end of my graph to the other, it goes from
$$p_{y0}: (45,0)$$
$$p_{y816}: (60,816)$$
This seems way off: I would expect a line that stays close to $x=52$.
What am I missing?
Simple linear regression seems off (asked by Bumbble Comm, https://math.techqa.club/user/bumbble-comm/detail, on 2026-03-28 09:54:26)
As there are four points, $$n=4.$$
Computing the sums:
$$S_x=\sum x_i=209$$
$$S_y=\sum y_i=1479$$
$$S_{xx}=\sum x_i^2=10921$$
$$S_{xy}=\sum x_iy_i=77317$$
$$S_{yy}=\sum y_i^2=770781$$
Computing $\hat\alpha$ and $\hat\beta$:
$$\hat \beta=\frac{nS_{xy}-S_xS_y}{nS_{xx}-S_x^2}=\frac{4\cdot 77317-209\cdot 1479}{4\cdot 10921-209^2}=\frac{157}{3}=52.\bar3$$
$$\hat \alpha=\frac1nS_y-\hat\beta\frac1nS_x=-2364.\bar6$$
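To your question: yes, the line does have a high slope, $52.\bar3$, which corresponds to an angle of $88.905^\circ\approx 90^\circ$ with the $x$-axis. This is the best line under ordinary least squares, which minimizes the sum of squared vertical distances between the points and the line. Another line can only be "better" if you minimize a different criterion, such as the sum of cubed distances.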
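The sums, the closed-form estimates and the angle above can be checked with a short sketch. It uses Python's `fractions.Fraction` so that $\hat\beta$ and $\hat\alpha$ come out as exact fractions rather than rounded decimals.

```python
import math
from fractions import Fraction

points = [(52, 730), (53, 409), (52, 250), (52, 90)]
n = len(points)

# Raw sums used by the closed-form least-squares formulas
S_x = sum(x for x, _ in points)
S_y = sum(y for _, y in points)
S_xx = sum(x * x for x, _ in points)
S_xy = sum(x * y for x, y in points)
S_yy = sum(y * y for _, y in points)

# Exact rational arithmetic, so nothing is rounded
beta = Fraction(n * S_xy - S_x * S_y, n * S_xx - S_x**2)
alpha = Fraction(S_y, n) - beta * Fraction(S_x, n)

# Angle between the fitted line and the x-axis, in degrees
angle = math.degrees(math.atan(float(beta)))

print(S_x, S_y, S_xx, S_xy, S_yy)  # 209 1479 10921 77317 770781
print(beta, alpha)                 # 157/3 -7094/3
print(f"{angle:.3f}")              # 88.905
```

The denominator $nS_{xx}-S_x^2$ is only 3 because the $x$-values barely vary, which is why the slope is so large.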
A possible solution was to perform a linear regression of $x$ on $y$, instead of regressing $y$ on $x$ as in the question.
Line of regression of Y on X:
$$y-\bar{y} = \tfrac{\operatorname{cov}(X,Y)}{\sigma_{x}^{2}}(x-\bar{x})$$
Line of regression of X on Y:
$$x-\bar{x} = \tfrac{\operatorname{cov}(X,Y)}{\sigma_{y}^{2}}(y-\bar{y})$$
Using the points shown above, we get:
Finding the arithmetic means:
$$\bar{x}=\tfrac{209}{4}=52.25$$
$$\bar{y}=\tfrac{1479}{4}=369.75$$
Calculating the covariance:
$$\operatorname{cov}(X,Y)=\tfrac{77317}{4}-52.25\cdot 369.75=9.8125$$
Calculating the variances:
$$\sigma_{x}^{2}=\tfrac{10921}{4}-52.25^2=0.1875$$
$$\sigma_{y}^{2}=\tfrac{770781}{4}-369.75^2=55980.1875$$
Linear regression of y on x:
$$y-369.75 = \tfrac{9.8125}{0.1875}(x-52.25)$$
$$y=52.\bar{3}x-2364.\bar{6}$$
Linear regression of x on y:
$$x-52.25 = \tfrac{9.8125}{55980.1875}(y-369.75)$$
Solving for $y$:
$$y-369.75 = \tfrac{55980.1875}{9.8125}(x-52.25)$$
$$y\approx 5705x-297716$$
The linear regression of $x$ on $y$ gave the results I was looking for (note that the results are rounded; the unrounded slope is about $5704.99$).
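The numbers in this derivation can be reproduced with the sketch below. It follows the same recipe: population covariance and variances (dividing by $n$), then both regression lines, with the x-on-y line solved for $y$ so that the two slopes can be compared directly.

```python
points = [(52, 730), (53, 409), (52, 250), (52, 90)]
n = len(points)

# Arithmetic means
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Population covariance and variances (divide by n), as in the answer
cov_xy = sum(x * y for x, y in points) / n - x_bar * y_bar
var_x = sum(x * x for x, _ in points) / n - x_bar**2
var_y = sum(y * y for _, y in points) / n - y_bar**2

# y on x:  y - y_bar = cov / var_x * (x - x_bar)
slope_yx = cov_xy / var_x
intercept_yx = y_bar - slope_yx * x_bar

# x on y:  x - x_bar = cov / var_y * (y - y_bar), then solved for y
slope_xy = var_y / cov_xy
intercept_xy = y_bar - slope_xy * x_bar

print(f"cov = {cov_xy}, var_x = {var_x}, var_y = {var_y}")
# prints: cov = 9.8125, var_x = 0.1875, var_y = 55980.1875
print(f"y on x: y = {slope_yx:.4f} x {intercept_yx:+.4f}")
print(f"x on y: y = {slope_xy:.4f} x {intercept_xy:+.4f}")
```

Both lines pass through the mean point $(\bar{x},\bar{y})=(52.25, 369.75)$; they differ only in slope.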
More information on this issue can be found in the question "What is the difference between linear regression on y with x and x with y?"
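The practical difference for this data set can be seen in a small sketch in which each least-squares line wins under the criterion it minimizes. Regressing y on x minimizes the squared vertical distances, while regressing x on y minimizes the squared horizontal distances. The latter is why the second line looks much closer to these almost vertically aligned points. The helper names `vertical_sse` and `horizontal_sse` are hypothetical, chosen for illustration.

```python
points = [(52, 730), (53, 409), (52, 250), (52, 90)]
n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Centered sums of squares and cross-products
s_xx = sum((x - x_bar) ** 2 for x, _ in points)
s_yy = sum((y - y_bar) ** 2 for _, y in points)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)

# Both lines written as y = m*x + c; each passes through (x_bar, y_bar)
m_yx = s_xy / s_xx  # regression of y on x
m_xy = s_yy / s_xy  # regression of x on y, solved for y
lines = {
    "y on x": (m_yx, y_bar - m_yx * x_bar),
    "x on y": (m_xy, y_bar - m_xy * x_bar),
}


def vertical_sse(m, c):
    """Sum of squared vertical distances (minimized by y-on-x regression)."""
    return sum((y - (m * x + c)) ** 2 for x, y in points)


def horizontal_sse(m, c):
    """Sum of squared horizontal distances (minimized by x-on-y regression)."""
    return sum((x - (y - c) / m) ** 2 for x, y in points)


for name, (m, c) in lines.items():
    print(f"{name}: vertical SSE = {vertical_sse(m, c):.2f}, "
          f"horizontal SSE = {horizontal_sse(m, c):.4f}")
```

Neither line is "wrong"; they answer different questions, so the choice depends on which variable you treat as the one carrying the error.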