What does the variable $\epsilon_i$ represent, mathematically and logically, in the linear regression formula $y = b_1 x + b_0 + \epsilon$?


What is the purpose of this variable, and why do we use it to derive the slope coefficient $b_1 = \frac{\sum(y-\overline{y})(x-\overline{x})}{\sum{(x-\overline{x})^2}}$ (by taking the derivative of $\sum{\epsilon_i^2}$), when most tutorials skip it entirely (i.e., they write $y = b_1 x + b_0$) yet still use the derived formula for the slope? And why is $\epsilon_i$ a random variable drawn from a normal probability distribution? Shed some light, please.
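For concreteness, the closed-form slope in the question can be checked numerically against a least-squares fit that minimizes $\sum\epsilon_i^2$. This is only a sketch with made-up data, using `np.polyfit` as the least-squares reference:

```python
import numpy as np

# Made-up sample data, assumed purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form slope from the formula in the question:
# b1 = sum((y - ybar)(x - xbar)) / sum((x - xbar)^2)
b1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check: np.polyfit(x, y, 1) minimizes sum(eps_i^2) over (b1, b0).
b1_fit, b0_fit = np.polyfit(x, y, 1)
print(np.allclose([b1, b0], [b1_fit, b0_fit]))  # True
```

Both routes give the same coefficients, because the closed-form formula is exactly the solution of setting the derivative of $\sum\epsilon_i^2$ to zero.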


There are 2 best solutions below


You can take two views of this (which are two sides of the same coin):

  • The model isn't accurate, and $\epsilon_i$ represents the unknowns in the model (model imperfections).
  • Noise is added to the measurement.

The nice thing is that, given an imperfection in the model, you can find a noise term that behaves the same way, so the two solutions coincide.
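A small sketch of the "two views coincide" point, with an assumed slightly nonlinear process: fitting a straight line to deterministic curved data produces residuals, and adding those residuals back as "noise" reproduces the data exactly.

```python
import numpy as np

# The "true" process is slightly nonlinear -- a model imperfection
# from the viewpoint of a straight-line fit (assumed example).
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + 0.3 * x**2   # no randomness at all

# Fit the (imperfect) linear model y = b1*x + b0 by least squares.
b1, b0 = np.polyfit(x, y, 1)

# The residuals play exactly the role of eps_i: the curved data equals
# the fitted line plus an "equivalent noise" term.
eps = y - (b1 * x + b0)
print(np.allclose(y, b1 * x + b0 + eps))  # True
```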


Error or noise. Namely, you can view it as if your data came from some "model", i.e., $$ y_i = \beta_0 + \beta_1 x_i, $$ in which case you would have $n$ points on the straight line $y = \beta_0 + \beta_1 x$, and hence would need only $2$ data points to find $\beta_0$ and $\beta_1$. Assume now that every time you observe a realization (point) $(x_i, y_i = \beta_0 + \beta_1 x_i)$, the "model" gets some "shock" and gives $(x_i, y_i + \epsilon_i = \beta_0 + \beta_1 x_i + \epsilon_i)$. In this case your data won't lie on a straight line, but will be scattered around some trend line. To account for this dispersion, it is convenient to view this "shock" as a random variable with $\mathbb{E}[\epsilon_i \mid X] = 0$ and finite (conditional) variance. These assumptions (plus i.i.d.) allow you to derive the best linear unbiased estimators (BLUE) of the coefficients. The normal distribution is an unnecessary assumption: it gives you very interesting and useful results beyond the BLUE property, but in practical settings it may be too restrictive.
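The "shock" picture can be simulated directly. This is a sketch with assumed coefficients and, deliberately, a non-normal mean-zero shock, to illustrate that the closed-form OLS estimates recover the truth without any normality assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true coefficients, assumed for illustration.
beta0, beta1 = 1.5, 2.0
n = 10_000

x = rng.uniform(0.0, 10.0, size=n)
# Mean-zero, finite-variance shock; uniform, NOT normal --
# normality is not needed for unbiased estimation.
eps = rng.uniform(-1.0, 1.0, size=n)
y = beta0 + beta1 * x + eps            # points scattered around the trend line

# OLS estimates via the closed-form formulas.
b1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

print(b1_hat, b0_hat)  # close to (2.0, 1.5) for large n
```

With $n$ this large, the estimates land very near the true $(\beta_1, \beta_0)$ even though the shocks are uniform, illustrating that $\mathbb{E}[\epsilon_i \mid X] = 0$ and finite variance are what the estimator actually relies on.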