Assumptions on the random error in a linear regression model


I came across the following claims about the random error term $e$ in the simple linear
regression model ( $Y = A + B X + e$ ),
and I am not sure which of them are true:

  1. The expected value of $e$ is zero
  2. The variance of $e$ is the same for all values of the independent variable $X$
  3. The error term is normally distributed
  4. The values of the error term are independent of one another
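For concreteness, all four conditions together amount to saying that the $e_i$ are i.i.d. $N(0, \sigma^2)$. A minimal simulation sketch of such a model (all numeric values are illustrative, not from any source):

```python
import numpy as np

# Illustrative sketch: simulate data in which all four conditions on the
# error term hold at once, i.e. the e_i are i.i.d. N(0, sigma^2).
rng = np.random.default_rng(0)

A, B, sigma = 2.0, 0.5, 1.0          # true intercept, slope, error s.d.
x = np.linspace(0, 10, 200)          # fixed design points
e = rng.normal(loc=0.0, scale=sigma, size=x.size)  # satisfies (1)-(4)
y = A + B * x + e

# An ordinary least-squares fit recovers A and B up to sampling noise.
B_hat, A_hat = np.polyfit(x, y, deg=1)
print(A_hat, B_hat)
```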

My Thoughts
In some posts I have seen that the error term is supposed to be normally distributed
with expected value zero, while other definitions don't mention the normal distribution
at all, so I am a bit confused there. I would also like to hear opinions on the
other points.


You'll see hypotheses come and go across different definitions because authors bring them up depending on context and on what exactly they are trying to show you. For example, if you're studying ordinary least squares, you don't need to assume (2.) or normality of the error term to prove unbiasedness of the estimator; it is sufficient to assume (1.) and (4.). If, on the other hand, you also want to prove efficiency (the Gauss-Markov theorem), you need to assume (2.) as well.

Note that the normality assumption is required neither for unbiasedness nor for efficiency, which is why it is so often omitted. When do people require normality, then? When they want to do inference (for example, hypothesis testing): it is very hard to make strong inferences without knowing the distribution of the error term. Since normality has some neat properties and is easy to work with, people often go for it.
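The unbiasedness point can be sketched with a small simulation: even with decidedly non-normal errors, the average slope estimate over many repetitions lands on the true slope. Only zero mean (1.) and independence (4.) are used here; all numbers are illustrative.

```python
import numpy as np

# OLS is unbiased without assuming normality of the error term.
# The errors below are exponential, shifted to have mean zero, and
# are therefore strongly skewed (non-normal).
rng = np.random.default_rng(1)

A, B = 1.0, 3.0
x = np.linspace(0, 5, 50)
slopes = []
for _ in range(2000):                  # repeat the experiment many times
    e = rng.exponential(scale=1.0, size=x.size) - 1.0  # E[e] = 0, skewed
    y = A + B * x + e
    b_hat, _ = np.polyfit(x, y, deg=1)
    slopes.append(b_hat)

# The average of the slope estimates is close to the true B = 3,
# illustrating unbiasedness despite non-normal errors.
print(np.mean(slopes))
```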


Basically, any of these assumptions can be relaxed or corrected.

  1. If $E[e] = a$ with $a \neq 0$, you can absorb $a$ into the intercept to get $E[e] = 0$.
  2. Non-constant variance can be corrected by using weighted least squares instead of ordinary least squares.
  3. The normal distribution is essential only for statistical inference in small samples that is based on normality assumptions.
  4. Independent error terms: for OLS it suffices that the errors are merely uncorrelated, and if the normality assumption holds, independence follows from zero correlation. Otherwise, correlated errors can be handled with generalized least squares.
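Point 2 above can be sketched as follows: when $\mathrm{Var}(e_i)$ grows with $x_i$, weighting each observation by the reciprocal of its error standard deviation (assumed known here, which is the idealized case) gives a slope estimator with smaller spread than plain OLS. All numeric values are illustrative.

```python
import numpy as np

# Weighted vs. ordinary least squares under non-constant error variance.
# Error s.d. is proportional to x, and WLS uses weights ~ 1/sd.
rng = np.random.default_rng(2)

A, B = 0.0, 2.0
x = np.linspace(1, 10, 100)
sd = x                               # error s.d. grows with x (known here)

ols_slopes, wls_slopes = [], []
for _ in range(2000):
    y = A + B * x + rng.normal(0.0, sd)
    ols_slopes.append(np.polyfit(x, y, 1)[0])
    # np.polyfit's w argument weights each residual by w[i],
    # so w = 1/sd implements weighted least squares.
    wls_slopes.append(np.polyfit(x, y, 1, w=1.0 / sd)[0])

# Both estimators are approximately unbiased for B = 2,
# but the WLS slope estimates have a visibly smaller spread.
print(np.std(ols_slopes), np.std(wls_slopes))
```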

To sum up, the exact set of assumptions is context-dependent. In introductory statistics it is usually assumed that $e_1, \ldots, e_n$ are i.i.d. $N(0, \sigma^2)$, namely that all four points are true, while introductory econometrics textbooks usually require only that the $e_i$ are uncorrelated, with zero mean and constant finite variance, namely points $1$, $2$, and (sort of) $4$.