Residuals of regression model


Let's suppose I do a regression between earnings and age (and suppose I do not know the distribution of earnings). Would it be possible for the residuals to be normally distributed?

I am thinking it would not be possible: earnings take only positive values, while the support of the normal distribution runs from $-\infty$ to $\infty$, so the residuals could not be normal. However, since residuals are errors, they can be both positive and negative, so I am starting to question my hypothesis here.

Any help would be great on whether or not it is possible for residuals to be normal for the scenario I described.


There are 2 best solutions below

2
On BEST ANSWER

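If earnings are always positive then no, the residuals cannot be exactly normally distributed, even though many may be negative. Each residual is the observed earnings minus the predicted earnings, so the magnitude of any negative residual is bounded by the highest predicted earnings on the regression line, whereas a normal distribution is unbounded in both directions.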
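To make the bound concrete, here is a minimal sketch on simulated data. The ages, the log-normal earnings and every parameter value are assumptions made up for illustration, not real earnings data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the simulated data are reproducible

# Hypothetical data: ages 20-65 and strictly positive, right-skewed earnings
age = rng.uniform(20, 65, size=500)
earnings = np.exp(rng.normal(loc=10.0 + 0.01 * age, scale=0.5))

# Ordinary least squares fit of earnings on age
slope, intercept = np.polyfit(age, earnings, deg=1)
fitted = intercept + slope * age
residuals = earnings - fitted

# Because earnings > 0, every residual is greater than minus its fitted value
lower_bound_holds = bool(np.all(residuals > -fitted))

print("most negative residual:", residuals.min())
print("largest fitted value:  ", fitted.max())
print("residual > -fitted for every point:", lower_bound_holds)
```

The positive residuals have no such limit, which is one reason the residual distribution ends up lopsided.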

That may not be the major issue: more important might be issues such as the skewness of the earnings distribution at any given age, or a non-linear relationship between earnings and age.

0
On

The normality assumption may be good (or not), even though you should expect some skewness due to the non-negativity of wages. The only way to find out whether your assumption of normally distributed error terms is reasonable is to check it against the data. To do that, I have some suggestions (as far as my very weak statistics knowledge reaches).

As a first stage, plot your data in several complementary ways, such as box plots and histograms. If your assumption of normally distributed data is bad, it will probably show up at this stage. To complement these diagrams you can also do a normal Q-Q plot. (See for example Wikipedia: http://en.wikipedia.org/wiki/Normal_probability_plot).
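As a rough sketch of the Q-Q step, `scipy.stats.probplot` computes the points of a normal probability plot without drawing anything, so you can inspect them or pass them to whatever plotting tool you like. The data are simulated and all parameter values are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # simulated data, not real earnings

# Hypothetical positive, right-skewed earnings regressed on age
age = rng.uniform(20, 65, size=500)
earnings = np.exp(rng.normal(loc=10.0 + 0.01 * age, scale=0.5))
slope, intercept = np.polyfit(age, earnings, deg=1)
residuals = earnings - (intercept + slope * age)

# probplot returns the theoretical normal quantiles, the sorted residuals,
# and a least-squares line through the Q-Q points (slope, intercept, r)
(theoretical_q, ordered_resid), (qq_slope, qq_intercept, qq_r) = stats.probplot(
    residuals, dist="norm"
)

# Sample skewness: roughly 0 for symmetric residuals, positive for a long right tail
resid_skew = stats.skew(residuals)

print("correlation of Q-Q points with a straight line:", qq_r)
print("skewness of residuals:", resid_skew)
```

Points that bend away from the straight line in the upper tail, together with a clearly positive skewness, are the typical signature of earnings-like data.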

I do not know whether you are familiar with hypothesis testing. If so, as a second stage you can run a formal test of normality on the residuals, for example the Shapiro-Wilk test.
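A minimal sketch of such a test, using `scipy.stats.shapiro` on the residuals of a fit to simulated data (the data-generating process and the 5% threshold are assumptions chosen for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # simulated data, not real earnings

# Hypothetical positive, right-skewed earnings regressed on age
age = rng.uniform(20, 65, size=500)
earnings = np.exp(rng.normal(loc=10.0 + 0.01 * age, scale=0.5))
slope, intercept = np.polyfit(age, earnings, deg=1)
residuals = earnings - (intercept + slope * age)

# Shapiro-Wilk: the null hypothesis is that the residuals come from a normal distribution
w_stat, p_value = stats.shapiro(residuals)

alpha = 0.05  # conventional significance level, an arbitrary choice here
reject_normality = bool(p_value < alpha)

print("W statistic:", w_stat)
print("p-value:", p_value)
print("reject normality at 5%:", reject_normality)
```

Keep in mind that with a large sample the test will flag even small departures from normality, so it is worth reading the result together with the plots.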

Thirdly, if your hypothesis of normally distributed error terms seems bad, try to identify if there is an obvious source causing this, for example outliers. (See http://en.wikipedia.org/wiki/Outlier#Identifying_outliers).
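One simple way to flag candidate outliers is the interquartile-range rule (Tukey's fences), which is among the methods described on the linked page. The sketch below applies it to residuals from simulated data; the factor 1.5 is the conventional default and the data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)  # simulated data, not real earnings

# Hypothetical positive, right-skewed earnings regressed on age
age = rng.uniform(20, 65, size=500)
earnings = np.exp(rng.normal(loc=10.0 + 0.01 * age, scale=0.5))
slope, intercept = np.polyfit(age, earnings, deg=1)
residuals = earnings - (intercept + slope * age)

# Tukey's fences: flag residuals more than 1.5 IQR outside the quartiles
q1, q3 = np.percentile(residuals, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_mask = (residuals < lower) | (residuals > upper)

print("fences:", lower, upper)
print("number of flagged residuals:", int(outlier_mask.sum()))
print("ages of flagged observations:", np.round(age[outlier_mask], 1))
```

A flagged point is only a candidate: with skewed data many flags in the upper tail may simply reflect the shape of the distribution rather than errors in the data.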

Lastly, there is a lot more that can be done to test your normality assumption (surprise). There are certainly other aspects of the model above that you also need to consider. If you are really interested in these matters, I would recommend buying an introductory book on econometric analysis.

Note also: the normality assumption on your error terms mainly matters if you want to perform hypothesis tests on the parameters of your model. Look at the Wikipedia page on regression analysis to see which underlying assumptions are made when performing a regression.

I am in no way claiming that the above procedure is the best/only way to proceed.