Calculate parameters of regression line

I'm learning statistics and trying to calculate the parameters of a regression line based on the data:

$(x,y)=(0,-1),(1,2),(2,9/2)$

Could you please tell me how I would do so? I'm not sure how to calculate the parameters based just on a table of values.

Thanks

There are 3 best solutions below

4
On BEST ANSWER

I'll meet you halfway. Here is a complete run-through of doing linear regression by hand, with hints of how the method can be generalized. The compromise, though, is that I'll use different numbers than the ones you provided. The machinery will work the same.

Starting with a simple example, consider the following (six) data points occurring as ordered pairs: $$x_{i} : 0.6 , 1.8 , 2.8 , 3.6 , 4.2 , 5.6$$ $$y_{i} : 1.6 , 1.6 , 2.6 , 2.0 , 4.0 , 3.6$$

A scatter plot of $\left\{x_i,y_i\right\}$ would show a roughly linear arrangement of points, perhaps approximated by the equation $$y = \frac{1}{2}x + 1 \:,$$ which could be discerned using a ruler. Of course, we want to do better than guessing, so begin with the most general form of a (non-vertical) line in the $xy$-plane, namely $$f\left(x\right) = mx + b \:.$$ Note that some choice of $m$ and $b$ corresponds to a line that passes closer to the group of points $\left\{x_i,y_i\right\}$ than any other line.

The task now is to determine $m$ and $b$ based on the data given. To do this, suppose the line $f\left(x\right)=mx+b$ is sketched somewhere in the plane. Next, write the square of the vertical displacement from any given $y_i$ (up or down) to $f\left(x_i\right)$ for a single point: $$z_i^2 = \left(f\left(x_i\right) - y_i\right)^2 \:,$$ which can be summed over all $N$ data points: $$F = \sum_{i=1}^N z_i^2 = \sum_{i=1}^N \left(mx_i + b - y_i\right)^2$$ In this form, we can guarantee a best-fitting line by finding the $m$ and $b$ that minimize $F$. Taking the partial derivatives with respect to $m$ and $b$ respectively, we have $$\frac{\partial F}{\partial m} = 2\sum_{i=1}^N x_i \cdot \left(m x_i + b - y_i\right) \hspace{2.54cm} \frac{\partial F}{\partial b} = 2\sum_{i=1}^N \left(mx_i + b - y_i\right) \:,$$ where setting $\partial F/\partial m = \partial F / \partial b = 0$, dividing by $2$, and distributing the summation signs, we get $$0 = m \sum_{i=1}^N x_i^2 + b \sum_{i=1}^N x_i - \sum_{i=1}^N x_i \: y_i \hspace{2.54cm} 0 = m \sum_{i=1}^N x_i + b \sum_{i=1}^N \left(1\right) - \sum_{i=1}^N y_i \:.$$ Letting $$X^2 = \sum_{i=1}^N x_i^2 \hspace{2.00cm} X = \sum_{i=1}^N x_i \hspace{2.00cm} Y = \sum_{i=1}^N y_i \hspace{2.00cm} XY = \sum_{i=1}^N x_i \: y_i \:,$$ the above is easily recognized as a system of two equations in the two unknowns $m$ and $b$ (the rest are all numbers): $$0 = mX^2 + bX - XY \hspace{2.54cm} 0 = mX + bN - Y$$ Note that the symbols $X^2$, $X$, $Y$, $XY$ are just labels for the sums above and don't follow ordinary algebra, that is $X \cdot X \neq X^2$, and so on. Solving for $m$ and $b$, we find $$m = \frac{N \cdot XY - X \cdot Y}{N \cdot X^2 - X \cdot X} \hspace{2.54cm} b = \frac{Y \cdot X^2 - X \cdot XY}{N \cdot X^2 - X \cdot X} \:.$$ Evidently, $m$ and $b$ are calculated in a single pass with many sub-steps.


For the example on hand, we have $N=6$, along with $$X^2 = 73.4 \hspace{2.00cm} X = 18.6 \hspace{2.00cm} Y = 15.4 \hspace{2.54cm} XY = 55.28 \:,$$ finally giving $$m = \frac{6 \cdot 55.28 - 18.6 \cdot 15.4}{6 \cdot 73.4 - 18.6^2} \approx 0.479 \hspace{2.54cm} b = \frac{15.4 \cdot 73.4 - 18.6 \cdot 55.28}{6 \cdot 73.4 - 18.6^2} \approx 1.082 \:,$$ corresponding to the best-fit line $$y = 0.479 \: x + 1.082 \:.$$
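As a quick check on the arithmetic, here is a minimal Python sketch of the sum formulas above. The function name `fit_line` and the variable names are my own choices for illustration, not part of any library, and the guard against a zero denominator is an added assumption for the case where all $x_i$ coincide.

```python
# The six sample points used in this answer.
xs = [0.6, 1.8, 2.8, 3.6, 4.2, 5.6]
ys = [1.6, 1.6, 2.6, 2.0, 4.0, 3.6]


def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b via the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)                               # X
    sum_y = sum(ys)                               # Y
    sum_xx = sum(x * x for x in xs)               # X^2 (sum of squares, not X*X)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # XY
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        # Hypothetical guard: happens when every x value is the same.
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return m, b


m, b = fit_line(xs, ys)
print(f"m = {m:.3f}, b = {b:.3f}")  # m = 0.479, b = 1.082
```

Passing your own three points to the same function should work unchanged, since nothing in it depends on $N=6$.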

1
On

I'll meet you the "other" half way:

Your data and fit should look like this:

[Image: plot of the data points with the fitted regression line]

2
On

Let's say the regression function is $y=a+bx$. Then you have to minimize

$$S=\sum_{i=1}^3 (y_i-\alpha-\beta x_i)^2,$$

where $\alpha$ and $\beta$ are the estimators for the parameters $a$ and $b$. To obtain the minimum you differentiate $S$ w.r.t. $\alpha$ and $\beta$ respectively and set the derivatives equal to $0$. $$\frac{\partial S}{\partial \alpha}=-2\cdot \sum_{i=1}^3 (y_i-\alpha-\beta x_i)=0$$

$$\frac{\partial S}{\partial \beta}=-2\cdot \sum_{i=1}^3 (y_i-\alpha-\beta x_i)\cdot x_i=0$$

The solutions of these two equations are

$$\beta=\frac{ \sum\limits_{i=1}^3 (x_i - \bar{x})(y_i - \bar{y}) }{ \sum\limits_{i=1}^3 (x_i - \bar{x})^2 } \ \ (I), \qquad\qquad \alpha=\overline y-\beta\overline x \ \ (II)$$

So in your case $$\beta=\frac{ (0-1)\cdot (-1-5.5/3)+(1-1)\cdot (2-5.5/3)+(2-1)\cdot (4.5-5.5/3) }{ (0-1)^2+(1-1)^2+(2-1)^2 }=2.75$$

With the help of $(II)$ it is straightforward to calculate the value of $\alpha$.
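If you want to check the result, the following is a small Python sketch of formulas $(I)$ and $(II)$ applied to your three points. It uses exact fractions so no rounding enters; the function name `fit_centered` is just an illustrative choice of mine.

```python
from fractions import Fraction

# The three data points from the question, kept as exact fractions.
xs = [Fraction(0), Fraction(1), Fraction(2)]
ys = [Fraction(-1), Fraction(2), Fraction(9, 2)]


def fit_centered(xs, ys):
    """Return (alpha, beta) for y = alpha + beta*x using formulas (I) and (II)."""
    n = len(xs)
    x_bar = sum(xs) / n   # mean of x
    y_bar = sum(ys) / n   # mean of y
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx               # formula (I)
    alpha = y_bar - beta * x_bar     # formula (II)
    return alpha, beta


alpha, beta = fit_centered(xs, ys)
print(alpha, beta)  # -11/12 11/4
```

So the slope is $11/4 = 2.75$, as computed above, and the intercept comes out as $-11/12 \approx -0.917$.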