What does the variable $\epsilon_i$ represent, mathematically and logically, in the linear regression formula $y = b_1 x + b_0 + \epsilon$?


What is the purpose of this variable, and why do we use it to derive the slope coefficient $b_1 = \frac{\sum(y-\overline{y})(x-\overline{x})}{\sum{(x-\overline{x})^2}}$ (by taking the derivative of $\sum{\epsilon_i^2}$), when most tutorials skip it entirely (i.e., they write $y = b_1 x + b_0$) yet still use the derived formula for the slope? And why is $\epsilon_i$ a random variable drawn from a normal probability distribution? Shed some light, please.
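For concreteness, the closed-form slope in the question can be checked numerically against a least-squares fit that minimizes $\sum\epsilon_i^2$. This is only a sketch with made-up data, using `np.polyfit` as the least-squares reference:

```python
import numpy as np

# Made-up sample data, assumed purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form slope from the formula in the question:
# b1 = sum((y - ybar)(x - xbar)) / sum((x - xbar)^2)
b1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check: np.polyfit(x, y, 1) minimizes sum(eps_i^2) over (b1, b0).
b1_fit, b0_fit = np.polyfit(x, y, 1)
print(np.allclose([b1, b0], [b1_fit, b0_fit]))  # True
```

Both routes give the same coefficients, because the closed-form formula is exactly the solution of setting the derivative of $\sum\epsilon_i^2$ to zero.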


There are 2 best solutions below


You can take two views of this (which are two sides of the same coin):

  • The model isn't accurate, and $\epsilon_i$ represents the unknowns in the model (model imperfections).
  • Noise is added to the measurement.

The nice thing is that, given an imperfection in the model, you can find a noise term that behaves the same way, so the two solutions coincide.
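A small sketch of the "two views coincide" point, with an assumed slightly nonlinear process: fitting a straight line to deterministic curved data produces residuals, and adding those residuals back as "noise" reproduces the data exactly.

```python
import numpy as np

# The "true" process is slightly nonlinear -- a model imperfection
# from the viewpoint of a straight-line fit (assumed example).
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + 0.3 * x**2   # no randomness at all

# Fit the (imperfect) linear model y = b1*x + b0 by least squares.
b1, b0 = np.polyfit(x, y, 1)

# The residuals play exactly the role of eps_i: the curved data equals
# the fitted line plus an "equivalent noise" term.
eps = y - (b1 * x + b0)
print(np.allclose(y, b1 * x + b0 + eps))  # True
```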


Error or noise. Namely, you can view it as if your data came from some "model", i.e., $$ y_i = \beta_0 + \beta_1 x_i, $$ in which case you would have $n$ points on the straight line $y = \beta_0 + \beta_1 x$, and hence would need only $2$ data points to find $\beta_0$ and $\beta_1$. Assume now that every time you observe a realization (point) $(x_i, y_i = \beta_0 + \beta_1 x_i)$, the "model" gets some "shock" and gives $(x_i, y_i + \epsilon_i = \beta_0 + \beta_1 x_i + \epsilon_i)$. In this case your data won't lie on a straight line, but will be scattered around some trend line. To account for this dispersion, it is convenient to view this "shock" as a random variable with $\mathbb{E}[\epsilon_i \mid X] = 0$ and finite (conditional) variance. These assumptions (plus i.i.d.) allow you to derive the best linear unbiased estimators (BLUE) of the coefficients. The normal distribution is an unnecessary assumption: it gives you very interesting and useful results beyond the BLUE property, but in practical settings it may be too restrictive.
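The "shock" picture can be simulated directly. This is a sketch with assumed coefficients and, deliberately, a non-normal mean-zero shock, to illustrate that the closed-form OLS estimates recover the truth without any normality assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true coefficients, assumed for illustration.
beta0, beta1 = 1.5, 2.0
n = 10_000

x = rng.uniform(0.0, 10.0, size=n)
# Mean-zero, finite-variance shock; uniform, NOT normal --
# normality is not needed for unbiased estimation.
eps = rng.uniform(-1.0, 1.0, size=n)
y = beta0 + beta1 * x + eps            # points scattered around the trend line

# OLS estimates via the closed-form formulas.
b1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

print(b1_hat, b0_hat)  # close to (2.0, 1.5) for large n
```

With $n$ this large, the estimates land very near the true $(\beta_1, \beta_0)$ even though the shocks are uniform, illustrating that $\mathbb{E}[\epsilon_i \mid X] = 0$ and finite variance are what the estimator actually relies on.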