linear regression for dummies


I am trying to understand linear regression. I have limited knowledge of math (Algebra I), but I still want to be able to learn and understand what this is. I don't need to know all the math surrounding linear regression, but a basic working understanding would be great. Can someone give me a simple formula, an example, and an explanation of what all the symbols and variables mean, for basic linear regression?

Thanks

3 Answers

Best Answer

Okay, so one of the simplest methods is called least-squares fitting. I won't go through how it is derived, because that requires calculus. But let's say you want to find a line $$y=ax+b$$ that best fits the data points $$(x_1,y_1),\ (x_2,y_2),\ \ldots,\ (x_n,y_n).$$ Then the least-squares method tells you to minimize the sum of the squares of the deviations between the observed values and the values predicted by the best-fit line. That is, you want to minimize $$D(a,b)=\sum_{i=1}^{n}(y_i-[ax_i+b])^2.$$ That's where the calculus comes in, but after you apply it you find that the best $a$ and $b$ satisfy the system of linear equations $$\left(\sum_{i=1}^{n}x_i^2\right)a+\left(\sum_{i=1}^{n}x_i\right)b=\sum_{i=1}^{n}x_iy_i,$$ $$\left(\sum_{i=1}^{n}x_i\right)a+nb=\sum_{i=1}^{n}y_i.$$ Although the sums look intimidating, they are just constants that can be calculated from your data points.
By dividing each equation by $n$, the system can also be expressed as $$\bar{s}a+\bar{x}b=\bar{p},$$ $$\bar{x}a+b=\bar{y},$$ where $\bar{x}$ and $\bar{y}$ are the averages of the $x_i$ and $y_i$ respectively, $\bar{s}$ is the average of the squares of the $x_i$, and $\bar{p}$ is the average of the products $x_iy_i$.
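Solving this small system by substitution gives a closed form: the second equation gives $b=\bar{y}-a\bar{x}$, and substituting into the first gives $a=\dfrac{\bar{p}-\bar{x}\bar{y}}{\bar{s}-\bar{x}^{2}}$. Here is a minimal Python sketch of those formulas; the function name and sample data are invented for illustration:

```python
def least_squares_line(xs, ys):
    """Fit y = a*x + b by least squares, via the averaged normal equations."""
    n = len(xs)
    x_bar = sum(xs) / n                              # average of the x_i
    y_bar = sum(ys) / n                              # average of the y_i
    s_bar = sum(x * x for x in xs) / n               # average of the x_i^2
    p_bar = sum(x * y for x, y in zip(xs, ys)) / n   # average of the x_i*y_i

    a = (p_bar - x_bar * y_bar) / (s_bar - x_bar ** 2)  # slope
    b = y_bar - a * x_bar                               # intercept
    return a, b

# Points lying near the line y = 2x + 1:
xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]
print(least_squares_line(xs, ys))  # (1.94, 1.09)
```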

Bonus: I won't go into much detail here, but it turns out that this method can, quite easily, also be applied to finding the best-fit quadratic, as well as the best-fit exponential function; that is another reason this method is so widely used.
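As an illustration of both extensions, here is a short sketch using NumPy (the sample data is made up): `np.polyfit` performs a polynomial least-squares fit, and the exponential case $y=Ce^{kx}$ is reduced to a linear fit by taking logarithms, since $\ln y=\ln C+kx$.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Best-fit quadratic y = c2*x^2 + c1*x + c0 (degree-2 least squares).
y_quad = np.array([1.2, 2.1, 5.3, 9.8, 17.1])
c2, c1, c0 = np.polyfit(x, y_quad, 2)

# Best-fit exponential y = C*exp(k*x): fit a line to ln(y).
y_exp = np.array([1.0, 2.7, 7.5, 20.0, 55.0])
k, ln_C = np.polyfit(x, np.log(y_exp), 1)
C = np.exp(ln_C)
```

Note that the logarithm trick minimizes the squared error of $\ln y$ rather than of $y$ itself, which is the usual quick version of the exponential fit.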


A figure helps. The blue dots show a set of points $(x_1,y_1), (x_2,y_2), \ldots$ and the red line the least-squares fit:

[Figure: blue data points with the red least-squares fit line.]

Given the data, you want to find the best fit linear function (line) that minimizes the sum of the squares of the vertical distances from each point to the line.

If your data is three-dimensional, then the linear least squares solution can be visualized as a plane.

And so on, into higher dimensions.
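To make the plane case concrete, here is a minimal sketch (with invented data) that fits $z=ax+by+c$ using NumPy's general least-squares solver, which minimizes the sum of squared vertical distances just as in the line case:

```python
import numpy as np

# Invented 3-D data points (x_i, y_i, z_i), lying near z = 2x - y + 3.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
z = np.array([2.1, 4.9, 5.2, 8.1, 8.0])

# Each row of A is (x_i, y_i, 1); lstsq minimizes ||A @ coeffs - z||^2.
A = np.column_stack([x, y, np.ones_like(x)])
(a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
print(a, b, c)  # approximately 2, -1, 3
```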


Look at the four red dots in the picture below. Imagine you draw a blue line somewhere across them; you want to draw the line that best goes through the dots. The simplest technique is the least-squares fit. For each dot, go up or down until you reach the blue line: this dashed segment lets you build a little square (pale blue). So you get four squares, whose total area is a certain quantity. This quantity depends on the position of the blue line, and can easily be computed from the points $(x_i,y_i)$, the slope $a$, and the intercept $b$. It can be minimized for a certain slope $a_{\textrm{min}}$ and intercept $b_{\textrm{min}}$.

This is exactly (least-squares) linear regression.

[Figure: four red data points, a blue candidate line, and pale blue squares built on the vertical deviations. Caption: "Sum of squares".]
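To see the "total area" idea numerically, here is a tiny sketch (the points and function name are invented for illustration) that computes the sum of the squares for two candidate lines; a line close to the least-squares solution gives a much smaller total:

```python
def sum_of_squares(a, b, points):
    """Total area of the little squares: sum of (y_i - (a*x_i + b))^2."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)

# Four invented red dots.
points = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.2)]

print(sum_of_squares(2.0, 1.0, points))  # good line:  total area 0.06
print(sum_of_squares(1.0, 2.0, points))  # worse line: total area 6.66
```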