I came across the following points about the random error term $e$ in the simple linear
regression model ( $Y = A + B*X + e$ ).
I am not sure which of them are true:
1. The expected value of $e$ is zero
2. The variance of $e$ is the same for all values of the independent variable $X$
3. The error term is normally distributed
4. The values of the error term are independent of one another
My Thoughts
In some posts, I have seen that the error term is supposed to be normally distributed
with an expected value of zero, while other definitions don't mention the normal
distribution at all. So I am a bit confused there. I would also like opinions on the
other points.
You'll see assumptions come and go in different definitions because authors bring them up depending on the context and on what exactly they are trying to show. For example, if you're studying ordinary least squares, you need neither constant variance (2.) nor normality of the error term (3.) to prove unbiasedness of the estimator. It is sufficient to assume zero mean (1.) and independence (4.). If, on the other hand, you also want to prove efficiency, meaning that OLS has the smallest variance among linear unbiased estimators (the Gauss-Markov theorem), you need to assume (2.) as well. Note that the normality assumption is required for neither unbiasedness nor efficiency, which is why it is so often omitted.

When do people require normality, then? When they want to do inference (for example, hypothesis testing). It is very hard to make strong inferential statements without knowing the distribution of the error term. Since the normal distribution has some neat properties and is easy to work with, people often go for it.
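
To make the unbiasedness point concrete, here is a rough simulation sketch rather than a proof. It assumes a fixed design for $X$ and draws errors from a centered exponential distribution, which has zero mean and constant variance and is independent across observations, but is clearly not normal. The function names and the values of `a`, `b`, `n`, `reps` and `seed` are illustrative choices of mine, not anything from the question.

```python
import numpy as np


def ols_fit(x, y):
    """Return (intercept, slope) from the closed-form simple OLS formulas."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope


def simulate_slopes(a=2.0, b=0.5, n=50, reps=5000, seed=0):
    """Refit OLS on many simulated samples and collect the slope estimates."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, n)  # fixed design (an assumption of this sketch)
    slopes = np.empty(reps)
    for r in range(reps):
        # Centered exponential errors: mean 0, variance 1, skewed (not normal),
        # drawn independently for every observation.
        e = rng.exponential(scale=1.0, size=n) - 1.0
        y = a + b * x + e
        slopes[r] = ols_fit(x, y)[1]
    return slopes


slopes = simulate_slopes()
print("true slope:             ", 0.5)
print("mean of slope estimates:", slopes.mean())
print("std of slope estimates: ", slopes.std())
```

If the reasoning above holds, the average of the slope estimates should land very close to the true slope even though the errors are skewed. What normality would add is an exact sampling distribution for the estimator, which is what $t$-tests and confidence intervals rely on in small samples.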