Linear regression models have to satisfy two key assumptions: (1) the error terms are i.i.d., each following a normal distribution with zero mean and variance $\sigma^2$; (2) the matrix $X$ has to be non-random and of full column rank. However, I am confused about why we can assume that the error terms are normally distributed. Also, does the second assumption imply that all the explanatory variables are independent of each other? Thanks.
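For reference, the two assumptions in the question can be written in matrix form. The symbols $y$, $\beta$, $\varepsilon$, $n$ and $p$ (number of observations and columns) are notation assumed here rather than taken from the post.

$$
y = X\beta + \varepsilon, \qquad X \in \mathbb{R}^{n \times p} \text{ fixed}, \qquad \varepsilon_1, \dots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \sigma^2)
$$

$$
\operatorname{rank}(X) = p \iff X^\top X \text{ is invertible} \implies \hat{\beta} = (X^\top X)^{-1} X^\top y \text{ is the unique least-squares estimate}
$$

Full column rank only rules out exact linear dependence among the columns of $X$ (no column is a linear combination of the others); it does not by itself require the explanatory variables to be statistically independent, or even uncorrelated.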
2026-04-06 01:37:51
Linear regression model assumptions
1.8k views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
There are 2 best solutions below
- Linear regression (fitting by least squares) does not assume any particular distribution for the errors; they can be drawn from any distribution, i.i.d. or not.
- $X$ has to be deterministic and of full rank: yes.
- "Also, does the second assumption imply that all the explanatory variables are independent of each other?": please elaborate, as I don't understand the question.
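To illustrate the first answer's point, here is a minimal NumPy sketch, assuming a simulated fixed design and a centred exponential error distribution chosen purely for illustration. It checks full column rank with `np.linalg.matrix_rank` and then computes the least-squares estimate, a step that uses no distributional assumption on the errors; normality typically becomes relevant only for exact small-sample inference (t and F tests, confidence intervals).

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5_000
beta_true = np.array([1.0, 2.0, -0.5])

# Fixed (non-random) design: intercept plus two deterministic regressors.
x1 = np.linspace(0.0, 10.0, n)
x2 = np.sin(x1)
X = np.column_stack([np.ones(n), x1, x2])

# Full column rank: rank(X) equals the number of columns, so X^T X is invertible.
assert np.linalg.matrix_rank(X) == X.shape[1]

# Deliberately non-normal errors: centred exponential (skewed, mean 0, variance 1).
eps = rng.exponential(scale=1.0, size=n) - 1.0
y = X @ beta_true + eps

# Ordinary least squares via NumPy's least-squares solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to [1.0, 2.0, -0.5]
```

The skewed errors do not stop the least-squares estimate from landing near `beta_true`; what they would affect is the exactness of normal-theory p-values in small samples.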
I write the assumptions out using the acronym LINE (Linearity, Independence, Normality, Equal variance) so that they are simple and easy to remember.
While it makes sense for $X$ to be of full rank, it does not strictly need to be. Full rank has numerous benefits; in particular, it makes every individual coefficient identifiable, which gives maximum interpretability. However, one can still conduct inference on estimable functions of the parameters when $X$ is not of full rank. There are also methods (e.g. PCA) designed to take non-independent IVs and project them onto new variables that are uncorrelated in your analysis.
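Both points can be sketched in NumPy, assuming a hypothetical simulated design in which `x3` is exactly `x1 + x2`. The first half shows that a rank-deficient $X$ still gives unique fitted values and unique estimable combinations (here `b[1] + b[3]` and `b[2] + b[3]`) even though the individual coefficients are not unique. The second half applies PCA to the predictors via an eigendecomposition of their sample covariance; the resulting component scores are uncorrelated (not, in general, statistically independent), and dropping the zero-variance component yields a full-rank design.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical rank-deficient design: x3 is an exact linear combination of x1 and x2.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2
X = np.column_stack([np.ones(n), x1, x2, x3])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

print(np.linalg.matrix_rank(X), X.shape[1])  # rank 3 < 4 columns

# lstsq still returns a solution (the minimum-norm one), but it is not unique:
# adding any multiple of the null-space vector [0, 1, 1, -1] leaves X @ b unchanged.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
null_vec = np.array([0.0, 1.0, 1.0, -1.0])
b_alt = b + 5.0 * null_vec
print(np.allclose(X @ b, X @ b_alt))  # identical fitted values

# Estimable combinations agree across solutions (true values 2.0 and -1.0).
print(b[1] + b[3], b_alt[1] + b_alt[3])
print(b[2] + b[3], b_alt[2] + b_alt[3])

# PCA on the predictors: rotate onto eigenvectors of their sample covariance and
# drop the (numerically) zero-variance direction created by the exact collinearity.
P = np.column_stack([x1, x2, x3])
Pc = P - P.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Pc, rowvar=False))
keep = eigvals > 1e-10 * eigvals.max()
scores = Pc @ eigvecs[:, keep]

# Component scores are mutually uncorrelated, and the new design has full column rank.
print(np.round(np.cov(scores, rowvar=False), 8))
X_pca = np.column_stack([np.ones(n), scores])
print(np.linalg.matrix_rank(X_pca), X_pca.shape[1])
```

The trade-off is interpretability: coefficients on principal components describe directions in predictor space rather than the original variables.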
To your question above: we can assume that the error terms are i.i.d. $\text{Normal}(0,\sigma^2)$, but only if this assumption makes sense. If you know from the subject matter or from your data that the assumptions of independence, normality, or equal variances are violated, then a linear regression model may not be appropriate. (Although it is not part of your question, the linearity assumption is also very important.) In that case, I would suggest looking into ways to transform your data so that the conditions are met, or researching other types of models (e.g. generalized linear models) that are designed for data that do not satisfy the four LINE assumptions mentioned above.
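As a rough sketch of the "transform your data" suggestion, the NumPy example below assumes a hypothetical data-generating process with an exponential trend and multiplicative log-normal noise. A straight-line fit to the raw `y` leaves curved residuals whose spread grows with `x` (violating linearity and equal variance), while the same fit to `log(y)` recovers a model that satisfies all four LINE assumptions. The helpers `ols_residuals` and `spread_ratio` are hypothetical names for this illustration, and the half-split spread ratio is a crude check, not a formal test for equal variances.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])

# Hypothetical process: y = exp(0.5 + 0.3 x) * exp(e), with e ~ Normal(0, 0.2^2).
y = np.exp(0.5 + 0.3 * x + rng.normal(scale=0.2, size=n))


def ols_residuals(X, y):
    """Fit OLS and return (coefficients, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def spread_ratio(resid, x):
    """Residual SD for the upper half of x divided by the SD for the lower half."""
    cut = np.median(x)
    return resid[x >= cut].std() / resid[x < cut].std()


_, r_raw = ols_residuals(X, y)
beta_log, r_log = ols_residuals(X, np.log(y))

print(spread_ratio(r_raw, x))  # well above 1: residual spread grows with x
print(spread_ratio(r_log, x))  # near 1 after the log transform
print(beta_log)                # should be close to [0.5, 0.3]
```

When no transform makes the conditions hold, that is the situation in which a different model family, such as a generalized linear model, is worth considering.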