In data science, when we study linear regression from a mathematical point of view, we usually start from the following hypothesis:
We have points $y_i = x_i\beta + \epsilon_i$ with $\epsilon_i$ being a random variable. I don't understand why we talk about a random variable, or about probability at all, when everything is deterministic.
If we observe the data $\{(y_i, x_i)\}$, then by taking $\beta = \arg\min_\beta \frac{1}{n}\sum_i (y_i - x_i\beta)^2$, we can write each data point as $y_i = x_i\beta + \epsilon_i$, and thus $\epsilon_i$ is completely deterministic; it is not a random variable. $\epsilon_i$ is simply equal to $y_i - x_i\beta$.
The same goes for the various hypotheses required to apply linear regression; I don't understand why we need them (zero-mean errors, ...). As long as $\beta = \arg\min_\beta \frac{1}{n}\sum_i (y_i - x_i\beta)^2$ is well defined, we can apply the model to our data without problems.
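To make the question concrete, here is a minimal sketch of what I mean, on a made-up data set (the numbers and the no-intercept setup are just for illustration): given fixed data, the least-squares $\beta$ and the resulting $\epsilon_i$ are plain numbers that never change.

```python
import numpy as np

# Hypothetical data set, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + rng.normal(0, 1, size=50)

# Least-squares estimate: beta = argmin_b (1/n) * sum_i (y_i - x_i*b)^2.
# For a single regressor with no intercept this has the closed form below.
beta = (x @ y) / (x @ x)

# The "errors" are then fixed numbers, fully determined by the data:
eps = y - x * beta

# Re-running the computation on the same data reproduces them exactly.
beta_again = (x @ y) / (x @ x)
assert np.allclose(eps, y - x * beta_again)
```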
The way I think of it is that once we have a data set, those values are crystallized and constant. We then fit a linear regression model to that data set. Since the data points are constant, any residual from the fitted model is fixed and deterministic.
However, we frequently use models to predict things that we haven't seen. If we got a new data set, its data points would very likely be different from the ones in the first data set. They would have different residuals, or distances from the fitted line. These residuals are of course the $\epsilon_i$.
Since we are not God, or someone who can look into the future to figure out which particular $X_i$ and $Y_i$ are coming our way, we model the uncertainty that our fitted line does not account for as $\epsilon_i$: an unknown, or random, value.
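The point above can be sketched in a short simulation (the data-generating process and numbers here are invented for illustration): fit once on one "crystallized" sample, then score fresh draws from the same process. The residuals from the same fitted line come out different on every new sample, which is exactly why $\epsilon_i$ is treated as random before the data are observed.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_BETA = 2.0  # assumed data-generating slope, for the simulation only

def sample(n=100):
    # Each draw from the data-generating process produces new errors eps_i.
    x = rng.uniform(0, 10, size=n)
    eps = rng.normal(0, 1, size=n)
    return x, TRUE_BETA * x + eps

# Fit on one observed (now fixed) data set.
x1, y1 = sample()
beta_hat = (x1 @ y1) / (x1 @ x1)

# Apply the *same* fitted line to a fresh data set.
x2, y2 = sample()
res1 = y1 - x1 * beta_hat
res2 = y2 - x2 * beta_hat

# res1 and res2 are different realizations of the random errors (plus a
# little estimation error in beta_hat): the residuals were only
# deterministic conditional on the first sample.
```

Before either sample is drawn, nothing about `res1` or `res2` is knowable, which is the sense in which $\epsilon_i$ is a random variable; the fixed numbers you compute afterwards are its realized values.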