Help regarding least squares regression method formula


I have the following two formulas for linear regression.

Formula 1:

$$m = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

$$c = \bar{y} - m\bar{x}$$

Formula 2:

$$b = \frac{n\sum(xy) - \sum x \sum y}{n \sum{(x^2)} - (\sum{x})^2}$$

So I want to know which formula is the correct one for finding a linear regression line using the least squares method. I checked online and saw that some used formula 1 and others used formula 2. And I have no idea if either of them is right or both are the same. Please help me understand this. My professor's notes also don't state which is the least squares method.


There are 2 best solutions below

On BEST ANSWER

(I'm going to leave out the bounds on most of the sums to keep the notation clean, so assume all sums go from $k = 1$ to $n.$)

In your second formula, $b$ represents the slope of the line of best fit, so it plays the same role as $m$ in the first formula. The two expressions are equivalent:

$$m = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sum{(x-\bar{x})^2}} = \frac{\sum(xy - \bar{x}y - x\bar{y} + \bar{x}\bar{y})}{\sum(x^2 - 2x\bar{x} + \bar{x}^2)}$$

$$=\frac{\sum(xy) - \sum(\bar{x}y) - \sum(x\bar{y}) + \sum(\bar{x}\bar{y})}{\sum(x^2) - \sum(2x\bar{x}) + \sum(\bar{x}^2)}$$

Now, because $\bar{x}$ and $\bar{y}$ are constant, we can pull them out of the sums like this:

$$m = \frac{\sum(xy) - \bar{x}\sum(y) - \bar{y}\sum(x) + \bar{x}\bar{y}\sum(1)}{\sum(x^2) - 2\bar{x}\sum(x) + \bar{x}^2\sum(1)}$$

Now because $\sum_{k=1}^n 1 = n$, $\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$, we can rewrite this as:

$$m = \frac{\sum(xy) - \frac{1}{n}\sum x\sum y - \frac1{n}\sum y\sum x + \frac{1}{n}\sum{x}\sum{y}}{\sum(x^2) - \frac2{n}\sum{x}\sum x + \frac{1}{n}(\sum{x})^2}$$

$$= \frac{\sum(xy) - \frac{1}{n}\sum x\sum y}{\sum(x^2) - \frac{1}{n}(\sum{x})^2}$$

Now simply multiplying the numerator and denominator by $n$ gives us our second formula:

$$m = \frac{n\sum(xy) - \sum x\sum y}{n\sum(x^2) - (\sum{x})^2}$$
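The equivalence can also be checked numerically. Below is a minimal Python sketch that computes the slope with both formulas and the intercept $c = \bar{y} - m\bar{x}$. The function names and the four data points are made up for illustration and are not from the question.

```python
def slope_centered(xs, ys):
    """Formula 1: sum of products of deviations over sum of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def slope_raw_sums(xs, ys):
    """Formula 2: the same slope written in terms of raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


def intercept(xs, ys, m):
    """Intercept c = y_bar - m * x_bar for a given slope m."""
    n = len(xs)
    return sum(ys) / n - m * sum(xs) / n


# Hypothetical sample data, chosen only to illustrate the check.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]

m1 = slope_centered(xs, ys)
m2 = slope_raw_sums(xs, ys)
c = intercept(xs, ys, m1)
print(m1, m2, c)
```

For this data both functions should return a slope of about $0.8$, with an intercept of about $1.5$, up to floating-point rounding.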

Hope this helps!

On

Both formulas are right and they are the same. Let's inspect the numerator:

$$\sum\limits_{i=1}^{n} (x_i-\overline x)(y_i-\overline y)$$

Multiplying out the brackets gives:

$$\sum\limits_{i=1}^{n} (x_i\cdot y_i-\overline yx_i-\overline xy_i+\overline x\overline y)$$

When we split the sum term by term, $\overline y$ and $\overline x$ can be moved in front of the sigma sign, since they do not depend on the index $i$:

$$\sum\limits_{i=1}^{n} x_i\cdot y_i-\overline y\sum\limits_{i=1}^{n}x_i-\overline x\sum\limits_{i=1}^{n}y_i+\overline x\overline y\sum\limits_{i=1}^{n} 1$$

We use that $\frac1n\sum\limits_{i=1}^{n}x_i=\overline x$, and similarly for the $y$-values, together with $\sum\limits_{i=1}^{n} 1=n$:

$$\sum\limits_{i=1}^{n} x_i\cdot y_i-\frac1n\sum\limits_{i=1}^{n}y_i\sum\limits_{i=1}^{n}x_i-\frac1n\sum\limits_{i=1}^{n}x_i\sum\limits_{i=1}^{n}y_i+n\cdot\frac1n\sum\limits_{i=1}^{n}x_i \frac1n\sum\limits_{i=1}^{n}y_i$$

$$\sum\limits_{i=1}^{n} x_i\cdot y_i-\frac1n\sum\limits_{i=1}^{n}y_i\sum\limits_{i=1}^{n}x_i\underbrace{-\frac1n\sum\limits_{i=1}^{n}x_i\sum\limits_{i=1}^{n}y_i+\frac1n\sum\limits_{i=1}^{n}x_i \sum\limits_{i=1}^{n}y_i}_{0}$$

$$\sum\limits_{i=1}^{n} x_i\cdot y_i-\frac1n\sum\limits_{i=1}^{n}y_i\sum\limits_{i=1}^{n}x_i$$

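In the last step we multiply the numerator by $n$ to obtain the numerator of the second formula. The denominator simplifies in the same way to $\sum\limits_{i=1}^{n} x_i^2-\frac1n\left(\sum\limits_{i=1}^{n}x_i\right)^2$, and it has to be multiplied by $n$ as well. In other words, we multiply both the numerator and the denominator of the fraction by $n$, which does not change its value.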
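As a quick sanity check, here is a small worked example. The three data points $(1,2)$, $(2,3)$, $(3,5)$ are made up for illustration and are not from the question. For them $n=3$, $\overline x = 2$ and $\overline y = \frac{10}{3}$.

Using the first formula:

$$\sum\limits_{i=1}^{3} (x_i-\overline x)(y_i-\overline y) = (-1)\cdot\left(-\tfrac43\right) + 0\cdot\left(-\tfrac13\right) + 1\cdot\tfrac53 = 3, \qquad \sum\limits_{i=1}^{3} (x_i-\overline x)^2 = 1 + 0 + 1 = 2$$

so $m = \frac32$. Using the second formula, with $\sum x_i = 6$, $\sum y_i = 10$, $\sum x_iy_i = 23$ and $\sum x_i^2 = 14$:

$$b = \frac{3\cdot 23 - 6\cdot 10}{3\cdot 14 - 6^2} = \frac{9}{6} = \frac32$$

The numerator and denominator of the second formula ($9$ and $6$) are exactly $n = 3$ times those of the first ($3$ and $2$), as expected. The intercept is $c = \overline y - m\overline x = \frac{10}{3} - 3 = \frac13$.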