In data science, when we study linear regression from a mathematical point of view, we usually start from the following hypothesis:
We have points $y_i = x_i\beta + \epsilon_i$ with $\epsilon_i$ being a random variable. I don't understand why we talk about a random variable, or about probability at all, when everything is deterministic.
If we observe the data $\{(y_i, x_i)\}$, then by taking $\beta = \arg\min_\beta \frac{1}{n}\sum_i (y_i - x_i\beta)^2$, we can write each data point as $y_i = x_i\beta + \epsilon_i$, and thus $\epsilon_i$ is completely deterministic; it is not a random variable. $\epsilon_i$ is simply equal to $y_i - x_i\beta$.
The same goes for the various hypotheses required to apply linear regression; I don't understand why we need them (zero-mean errors, ...). As long as $\beta = \arg\min_\beta \frac{1}{n}\sum_i (y_i - x_i\beta)^2$ is well defined, we can apply the model to our data without problems.
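To make the question concrete, here is a minimal sketch of what I mean, on a made-up data set (the numbers and the no-intercept setup are just for illustration): given fixed data, the least-squares $\beta$ and the resulting $\epsilon_i$ are plain numbers that never change.

```python
import numpy as np

# Hypothetical data set, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + rng.normal(0, 1, size=50)

# Least-squares estimate: beta = argmin_b (1/n) * sum_i (y_i - x_i*b)^2.
# For a single regressor with no intercept this has the closed form below.
beta = (x @ y) / (x @ x)

# The "errors" are then fixed numbers, fully determined by the data:
eps = y - x * beta

# Re-running the computation on the same data reproduces them exactly.
beta_again = (x @ y) / (x @ x)
assert np.allclose(eps, y - x * beta_again)
```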
The way I think of it is that once we have a data set, those values are crystallized and constant. We then fit a linear regression model to that data set. Since the data points are constant, any residual from the fitted model is fixed and deterministic.
However, we frequently use models to predict things that we haven't seen. If we got a new data set, its data points would very likely be different from the ones in the first data set. They would have different residuals, or distances from the fitted line. These residuals are of course the $\epsilon_i$.
Since we are not God, or someone who can look into the future to figure out which particular $X_i$ and $Y_i$ are coming our way, we model the uncertainty that our fitted line does not account for as $\epsilon_i$: an unknown, or random, value.
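The point above can be sketched in a short simulation (the data-generating process and numbers here are invented for illustration): fit once on one "crystallized" sample, then score fresh draws from the same process. The residuals from the same fitted line come out different on every new sample, which is exactly why $\epsilon_i$ is treated as random before the data are observed.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_BETA = 2.0  # assumed data-generating slope, for the simulation only

def sample(n=100):
    # Each draw from the data-generating process produces new errors eps_i.
    x = rng.uniform(0, 10, size=n)
    eps = rng.normal(0, 1, size=n)
    return x, TRUE_BETA * x + eps

# Fit on one observed (now fixed) data set.
x1, y1 = sample()
beta_hat = (x1 @ y1) / (x1 @ x1)

# Apply the *same* fitted line to a fresh data set.
x2, y2 = sample()
res1 = y1 - x1 * beta_hat
res2 = y2 - x2 * beta_hat

# res1 and res2 are different realizations of the random errors (plus a
# little estimation error in beta_hat): the residuals were only
# deterministic conditional on the first sample.
```

Before either sample is drawn, nothing about `res1` or `res2` is knowable, which is the sense in which $\epsilon_i$ is a random variable; the fixed numbers you compute afterwards are its realized values.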