I have the following two formulas for linear regression.
Formula 1: $$m = \frac{\sum_{k=1}^n (x_k - \bar{x})(y_k - \bar{y})}{\sum_{k=1}^n (x_k - \bar{x})^2}$$ $$c = \bar{y} - m\bar{x}$$
Formula 2:
$$b = \frac{n\sum(xy) - \sum x \sum y}{n \sum{(x^2)} - (\sum{x})^2}$$
I want to know which formula is the correct one for finding a linear regression line using the least squares method. I checked online and saw that some sources use formula 1 and others use formula 2, and I can't tell whether only one of them is right or whether they are the same. Please help me understand this. My professor's notes also don't state which one is the least squares method.
(I'm going to leave out the bounds on most of the sums to keep the notation clean, so assume all sums go from $k = 1$ to $n.$)
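In your second formula, $b$ represents the slope of the line of best fit, so it plays the same role as $m$ in your first formula. The two are equivalent, and both give the least squares slope. To see this, expand the products in the first formula: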
$$m = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sum{(x-\bar{x})^2}} = \frac{\sum(xy - \bar{x}y - x\bar{y} + \bar{x}\bar{y})}{\sum(x^2 - 2x\bar{x} + \bar{x}^2)}$$
$$=\frac{\sum(xy) - \sum(\bar{x}y) - \sum(x\bar{y}) + \sum(\bar{x}\bar{y})}{\sum(x^2) - \sum(2x\bar{x}) + \sum(\bar{x}^2)}$$
Now, because $\bar{x}$ and $\bar{y}$ are constants (they don't depend on the summation index), we can pull them out of the sums like this:
$$m = \frac{\sum(xy) - \bar{x}\sum(y) - \bar{y}\sum(x) + \bar{x}\bar{y}\sum(1)}{\sum(x^2) - 2\bar{x}\sum(x) + \bar{x}^2\sum(1)}$$
Now because $\sum_{k=1}^n 1 = n$, $\bar{x} = \frac{\sum x}{n}$, and $\bar{y} = \frac{\sum y}{n}$, we can rewrite this as:
$$m = \frac{\sum(xy) - \frac{1}{n}\sum x\sum y - \frac1{n}\sum y\sum x + \frac{1}{n}\sum{x}\sum{y}}{\sum(x^2) - \frac2{n}\sum{x}\sum x + \frac{1}{n}(\sum{x})^2}$$
$$= \frac{\sum(xy) - \frac{1}{n}\sum x\sum y}{\sum(x^2) - \frac{1}{n}(\sum{x})^2}$$
Now simply multiplying the numerator and denominator by $n$ gives us our second formula:
$$m = \frac{n\sum(xy) - \sum x\sum y}{n\sum(x^2) - (\sum{x})^2}$$
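If you'd like to check the algebra numerically, here is a minimal Python sketch that evaluates both formulas on the same data. The function names and the five sample points are made up for illustration; they don't come from your links or your professor's notes.

```python
import math


def slope_centered(xs, ys):
    """Formula 1: sum of centered products over sum of centered squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def slope_raw_sums(xs, ys):
    """Formula 2: the same slope written with raw sums only."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


def intercept(xs, ys, m):
    """c = y_bar - m * x_bar, as in formula 1."""
    n = len(xs)
    return sum(ys) / n - m * sum(xs) / n


# Hypothetical sample data, chosen only to illustrate the check.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

m1 = slope_centered(xs, ys)
m2 = slope_raw_sums(xs, ys)
c = intercept(xs, ys, m1)

print("formula 1 slope:", m1)
print("formula 2 slope:", m2)
print("intercept:", c)
print("slopes agree:", math.isclose(m1, m2))
```

For this data both slopes come out to about $0.6$ and the intercept to about $2.2$. The two functions can differ in the last few floating-point digits, which is why the comparison uses `math.isclose` rather than `==`.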
Hope this helps!