Where does the Normal Distribution of Error terms come from, and why?


In a Simple or Multiple Linear Regression Model, we are given the assumption that the error terms, $\epsilon$, are distributed like so:

$\epsilon \sim N(0, \sigma^2 I_n).$

Can anybody explain where the mathematical logic comes from to derive this assumption, or what the intuition is behind it?


Some of the reasons for this choice, I would say, include

  1. Intuitively reasonable: a zero-mean Normal distribution is symmetric (so there is no preferred direction for the error), concentrated around zero, and has the familiar bell shape. All these properties make it a very "plausible" assumption for how errors would be distributed.

  2. Mathematically convenient: normality is essentially the backbone of much of classical statistical inference, including hypothesis tests, confidence intervals, etc. For example, it allows you to deduce the exact distributions of test statistics (the $t$ and $F$ statistics in regression), of sums of errors, and so on.

Of course, it is not the only possibility; after all, it is only an assumption. For example, generalisations such as GLMs allow other distributions for the response variable.
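As a rough illustration of the second point (my own sketch, not part of the answer above): under normal errors the usual slope $t$-statistic follows an exact $t$ distribution, so a test at the 5% level rejects a true null about 5% of the time. A small Monte Carlo experiment can check this.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma = 30, 20000, 2.0
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)  # (X'X)^{-1}, reused every replication

rejections = 0
for _ in range(reps):
    y = 1.0 + 0.0 * x + rng.normal(0, sigma, n)   # true slope is 0
    beta = XtX_inv @ X.T @ y                      # OLS estimate
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)                  # unbiased variance estimate
    se_slope = np.sqrt(s2 * XtX_inv[1, 1])
    t_stat = beta[1] / se_slope
    rejections += abs(t_stat) > 2.048             # t_{0.975, 28} ≈ 2.048
frac = rejections / reps
print(frac)  # close to the nominal 0.05
```

With non-normal errors and small $n$, this empirical rejection rate can drift away from 5%, which is exactly why the normality assumption matters for exact finite-sample inference.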


It comes mainly from the requirements of classical statistical theory. In econometrics, for example, one usually assumes only uncorrelated errors with constant variance and zero expectation; these assumptions suffice to derive the best linear unbiased estimator (the Gauss–Markov theorem). However, they are not enough for likelihood-based statistical inference, which relies on a fully specified probability distribution. The most straightforward distribution satisfying those conditions is the multivariate normal. In that case the uncorrelated error terms become independent, and estimation and inference become very simple. Before the computer era, mathematical simplicity was crucial, since non-analytical derivations were very hard or even impossible, so closed-form solutions were highly desirable.
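To make the likelihood point concrete (an illustrative sketch of mine, not from the answer): under i.i.d. normal errors, maximising the likelihood over the coefficients is exactly least squares, so the two estimates coincide. This can be checked numerically.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(0, 1.5, n)

# Closed-form OLS estimate: solve (X'X) beta = X'y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Negative log-likelihood under i.i.d. N(0, sigma^2) errors; for fixed
# sigma it is proportional to the residual sum of squares, so the argmax
# over beta is the least-squares solution.
def nll(beta):
    resid = y - X @ beta
    return 0.5 * resid @ resid

beta_mle = minimize(nll, x0=np.zeros(2), method="BFGS").x
print(np.allclose(beta_ols, beta_mle, atol=1e-4))  # True
```

This equivalence is what makes OLS not just "best linear unbiased" but also the maximum-likelihood estimator once normality is added.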

In practice, normality is assumed merely as an approximation, if it is assumed at all, and much of the inference relies on large-sample theory, i.e., the asymptotic distributions of the estimators. Note, however, that at least one assumption really is essential: finite variance. Without it, your estimators (whether OLS or ML) will not converge to finite values. The zero-expectation assumption should not bother you, since it is merely technical: if the errors have a non-zero constant expectation, you can absorb it into the intercept term, and if their expectation depends on the explanatory variables, you should change your model (the conditional expectation of $Y$ given $X$). In both cases a proper modification restores zero expectation. The uncorrelated-errors assumption can be dropped by using GLS or GLMs.
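The large-sample point can be illustrated with a quick simulation (again a sketch, not from the answer): even with decidedly non-normal uniform errors, the sampling distribution of the OLS slope is approximately normal once standardised by its theoretical mean and standard deviation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 10000
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

# Uniform(-1, 1) errors: zero mean, variance 1/3, clearly non-normal.
slopes = np.empty(reps)
for i in range(reps):
    y = 1.0 + 2.0 * x + rng.uniform(-1, 1, n)
    slopes[i] = (XtX_inv @ X.T @ y)[1]

# Standardise: Var(slope_hat) = sigma^2 * [(X'X)^{-1}]_{11}, sigma^2 = 1/3.
z = (slopes - 2.0) / np.sqrt((1 / 3) * XtX_inv[1, 1])
print(round(z.mean(), 2), round(z.std(), 2))  # close to 0 and 1
```

The standardised slopes have mean near 0 and standard deviation near 1, as the central limit theorem predicts, which is why large-sample inference survives without exact normality.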

To sum up, the normal distribution is a straightforward model to work with and is usually plausible where linear regression models are applicable. However, you should always bear in mind that the more assumptions you make, the more mistakes you are prone to. Hence there are various tests of model assumptions and model misspecification that check the validity of your assumptions and quantify the cost of an error.
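As one example of such a check (an illustrative sketch; `scipy.stats.probplot` is my choice, not something the answer names): a normal probability plot of the fitted residuals, whose correlation coefficient should be near 1 when the errors are roughly normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 5, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.8 * x + rng.normal(0, 0.5, n)

# Fit by OLS and extract the residuals.
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# probplot matches sorted residuals against normal quantiles;
# the correlation r close to 1 is consistent with normal errors.
(_, _), (_, _, r) = stats.probplot(resid, dist="norm")
print(round(r, 3))  # near 1 for normal residuals
```

A markedly lower $r$ (or visible curvature in the plot itself) would suggest heavy tails or skewness, i.e., that the normality assumption is being violated.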