What does the variable $\epsilon_i$ represent, mathematically and logically, in the linear regression formula $y = b_1 x + b_0 + \epsilon$?

What is the purpose of this variable, and why do we use it to derive the slope coefficient $a = \frac{\sum(y-\overline{y})(x-\overline{x})}{\sum(x-\overline{x})^2}$ (by taking the derivative of $\sum\epsilon_i^2$), while most tutorials skip it entirely (i.e. they write $y = b_1 x + b_0$) yet still use the derived formula for $a$? And why is $\epsilon_i$ a random variable drawn from a normal probability distribution? Shed some light, please.
57 Views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 best solutions below.
Error or noise. Namely, you can view it as if your data come from some "model", i.e., $$ y_i = \beta_0 + \beta_1 x_i, $$ in which case your $n$ points lie on the straight line $y = \beta_0 + \beta_1 x$, so you would need only $2$ data points to find $\beta_0$ and $\beta_1$. Assume now that every time you observe a realization (point) $(x_i, y_i = \beta_0 + \beta_1 x_i)$, the "model" gets some "shock" and gives $(x_i, y_i + \epsilon_i = \beta_0 + \beta_1 x_i + \epsilon_i)$. In this case your data won't lie on a straight line, but rather will be scattered around some trend line. To account for this dispersion, it is convenient to view this "shock" as a random variable with $\mathbb{E}[\epsilon_i \mid X] = 0$ and finite (conditional) variance. These assumptions (plus i.i.d.) allow you to derive the best linear unbiased estimators (BLUE) of the coefficients. Normality is an unnecessary assumption: it gives you very interesting and useful results beyond the "BLUE" property, but in practical settings it may be too restrictive.
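To make this concrete, here is a minimal sketch (all names and the chosen true coefficients are my own illustration): we simulate data from $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with normally distributed shocks, then recover the coefficients with exactly the least-squares formula from the question.

```python
import random

random.seed(0)

# Hypothetical "true" model y = b0 + b1*x + eps (coefficients chosen for illustration)
b0_true, b1_true = 2.0, 3.0
n = 1000
xs = [random.uniform(0, 10) for _ in range(n)]
# Each observation gets a "shock" eps ~ Normal(0, 1), so points scatter around the line
ys = [b0_true + b1_true * x + random.gauss(0, 1.0) for x in xs]

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope: a = sum((y - y_bar)(x - x_bar)) / sum((x - x_bar)^2)
b1_hat = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) \
         / sum((x - x_bar) ** 2 for x in xs)
# Intercept follows from the fact that the fitted line passes through (x_bar, y_bar)
b0_hat = y_bar - b1_hat * x_bar

print(b1_hat, b0_hat)  # both estimates should land close to 3.0 and 2.0
```

Because $\mathbb{E}[\epsilon_i \mid X] = 0$, the estimates are unbiased, and with $n = 1000$ points they land close to the true values; rerunning with a different seed changes them only slightly.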
You can have two views of this (which are two sides of the same coin):

The nice thing is that, given the imperfection in the model, you can find a noise term that behaves in the same way, so that the two solutions coincide.