I am trying to understand linear regression. I have limited knowledge of math (Algebra I), but I still want to learn and understand what this is. I don't need to know all the math surrounding linear regression, but a basic working understanding would be great. Can someone give me a simple formula, an example, and an explanation of what all the symbols and variables mean for basic linear regression?
Thanks
Okay, so one of the simplest methods is called least-squares fitting. I won’t go through how it is derived because that requires calculus. But let’s say you want to find a line $$y=ax+b$$ (where $a$ is the slope and $b$ is the $y$-intercept) that best fits the data points $$(x_1,y_1),\ (x_2,y_2),\ \ldots,\ (x_n,y_n).$$ Least squares tells you to minimize $D$, the sum of the squares of the deviations between the observed values $y_i$ and the values $ax_i+b$ predicted by the line. That is, you want to minimize $$D(a,b)=\sum_{i=1}^{n}(y_i-[ax_i+b])^2.$$ That’s where the calculus comes in, but after you apply it you find that the best $a$ and $b$ satisfy the system of linear equations $$\left(\sum_{i=1}^{n}x_i^2\right)a+\left(\sum_{i=1}^{n}x_i\right)b=\sum_{i=1}^{n}x_iy_i,$$ $$\left(\sum_{i=1}^{n}x_i\right)a+nb=\sum_{i=1}^{n}y_i.$$ The sums look intimidating, but they are just constants that you can calculate from your data points, which leaves two equations in the two unknowns $a$ and $b$.
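To make the system concrete, here is a small worked example. The three data points $(1,2),\ (2,3),\ (3,5)$ are made up purely for illustration. With $n=3$, the sums are $$\sum x_i=6,\quad \sum x_i^2=1+4+9=14,\quad \sum y_i=10,\quad \sum x_iy_i=2+6+15=23,$$ so the system becomes $$14a+6b=23,$$ $$6a+3b=10.$$ Doubling the second equation gives $12a+6b=20$, and subtracting that from the first leaves $2a=3$, so $a=\tfrac{3}{2}$. Putting this back into $6a+3b=10$ gives $3b=1$, so $b=\tfrac{1}{3}$. The best-fit line for these points is therefore $$y=\tfrac{3}{2}x+\tfrac{1}{3}.$$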
Also, by dividing each equation by $n$, this can be expressed as the system $$\bar{s}a+\bar{x}b=\bar{p}$$ $$\bar{x}a+b=\bar{y}$$ where $\bar{x}$ and $\bar{y}$ are the averages of the $x_i$ and $y_i$ respectively, $\bar{s}$ is the average of the squares of the $x_i$, and $\bar{p}$ is the average of the products $x_iy_i$.
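If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the averaged form. It assumes the second equation reads $\bar{x}a+b=\bar{y}$, solves it for $b=\bar{y}-a\bar{x}$, and substitutes that into the first equation to get $a=(\bar{p}-\bar{x}\bar{y})/(\bar{s}-\bar{x}^2)$. The function names `fit_line` and `sum_sq_dev` and the sample points are my own choices for illustration, not part of any library.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two points, with one y for each x")

    x_bar = sum(xs) / n                               # average of the x values
    y_bar = sum(ys) / n                               # average of the y values
    s_bar = sum(x * x for x in xs) / n                # average of the squares of x
    p_bar = sum(x * y for x, y in zip(xs, ys)) / n    # average of the products x*y

    denom = s_bar - x_bar ** 2
    if denom == 0:
        # Every x is the same, so the points sit on a vertical line.
        raise ValueError("all x values are equal; no unique best-fit line")

    a = (p_bar - x_bar * y_bar) / denom               # slope
    b = y_bar - a * x_bar                             # intercept
    return a, b


def sum_sq_dev(xs, ys, a, b):
    """Return D(a, b), the sum of squared deviations from the line."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data, chosen only to show how the functions are used.
xs = [1, 2, 3]
ys = [2, 3, 5]
a, b = fit_line(xs, ys)
print("slope a =", a)
print("intercept b =", b)
print("D(a, b) =", sum_sq_dev(xs, ys, a, b))
```

For these three points the slope comes out at about $1.5$ and the intercept at about $0.333$, up to floating-point rounding. Any other choice of $a$ and $b$ passed to `sum_sq_dev` should give a larger value of $D$.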
Bonus: I won’t go into much detail on it here, but it turns out that this method can also, quite easily, be applied to finding the best-fit quadratic equation as well as best-fit exponential functions. That is another reason why this method is so widely used.