Residual Normality Test

I fitted a linear regression with 3 predictors to 60 observations, so my equation is: $$Y = \alpha_0 + \alpha_1X_1 + \alpha_2X_2 + \alpha_3X_3.$$ One of the diagnostic tests is a residual normality test. Can I just use the central limit theorem to justify the normality of the residuals, since I have 60 observations, which is more than 30?

Thanks in advance.

Best answer

No. That's not what the central limit theorem is for.

The test for the normality of residuals is about the question of whether the assumptions underlying the linear regression model are valid for the data that was observed, or at least, that the data doesn't obviously violate the assumptions.

The basic idea is to perform the regression on your data to estimate the model parameters, and then calculate the corresponding residual $$e_i = y_i - \hat y_i$$ for each data point in your set. Then you do a test to see if the $e_i$ are normally distributed. The reason is that the model specification is $$Y_i = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \epsilon_i$$ where $\epsilon_i \sim \operatorname{Normal}(0,\sigma^2)$ is the error term. So $e_i$, which is the observed value of the $i^{\rm th}$ response minus the fitted value of the response based on the $i^{\rm th}$ observations of the predictors $X_1, X_2, X_3$, serves as an estimate of $\epsilon_i$ and should be approximately normally distributed if the model is correct.
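To make the procedure concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the data are simulated (the question's 60 observations are not available), the helper names `solve_linear_system`, `ols_residuals` and `jarque_bera` are made up, and the Jarque-Bera test is just one possible choice of normality test, with a p-value that relies on a large-sample approximation. In practice you would more likely use a statistics package, for example a Shapiro-Wilk test such as `scipy.stats.shapiro`.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols_residuals(rows, y):
    """Fit y on an intercept plus the predictors; return (coefficients, residuals)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Normal equations: (X'X) coef = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    resid = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - yhat_i
    return coef, resid

def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)

# Simulated stand-in for the 60 observations (all numbers here are made up).
rng = random.Random(0)
X = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(60)]
y = [2.0 + 0.5 * x1 - 1.0 * x2 + 0.3 * x3 + rng.gauss(0, 1) for x1, x2, x3 in X]

coef, resid = ols_residuals(X, y)
jb_stat, jb_p = jarque_bera(resid)
print("coefficients:", [round(c, 3) for c in coef])
print("JB statistic: %.3f, p-value: %.3f" % (jb_stat, jb_p))
```

A small p-value would be evidence against normally distributed errors. A large one only means the test did not detect a departure.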

If the test finds that the residuals are not normally distributed (e.g., they could have a very strong skew), then the model assumption that the errors are normally distributed is most likely incorrect, and inference based on the resulting parameter estimates (for example, their confidence intervals and p-values) may be invalid. This has nothing to do with the number of observations in your data: having more than 30 observations does not make the residuals normal. What a larger number of observations does give you is greater power to detect deviations from normality of the residuals.

That said, "tests for normality" really should be called "tests for non-normality" because such tests are structured to reject the null if the data are not normal, but cannot statistically verify that the data do come from a normal distribution. They assume the data are normally distributed, and calculate a test statistic under that assumption.
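A rough simulation can illustrate both points: failing to reject is not proof of normality, and power grows with the sample size. The sketch below is an assumption-laden illustration, not a definitive study. It draws clearly skewed data (a centred exponential distribution, chosen arbitrarily), applies a Jarque-Bera test with its asymptotic p-value (which is crude for very small samples), and counts how often normality is rejected at the 5% level. The function names `jarque_bera_pvalue`, `rejection_rate` and `skewed` are hypothetical.

```python
import math
import random

def jarque_bera_pvalue(values):
    """Asymptotic chi-square(2) p-value of the Jarque-Bera normality test."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return math.exp(-stat / 2.0)

def rejection_rate(n, draw, reps=500, alpha=0.05, seed=1):
    """Fraction of simulated samples of size n for which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if jarque_bera_pvalue(sample) < alpha:
            rejections += 1
    return rejections / reps

def skewed(rng):
    """One draw from a centred exponential distribution (clearly non-normal)."""
    return rng.expovariate(1.0) - 1.0

rate_small = rejection_rate(10, skewed)
rate_large = rejection_rate(200, skewed)
print("rejection rate with n = 10: ", rate_small)
print("rejection rate with n = 200:", rate_large)
```

With only a handful of observations the test frequently fails to reject even though the data are far from normal, which is exactly why a non-rejection should not be read as confirmation.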

The central limit theorem is essentially a statement about the sample mean of independent and identically distributed observations from a distribution. You're not doing that here.
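The distinction can be seen in a small simulation, sketched here under assumptions of my own choosing (an exponential distribution, samples of size 60, and a hypothetical helper `sample_skewness`). Individual observations from a skewed distribution stay skewed no matter how many of them you collect, while the means of repeated samples of size 60 are much closer to symmetric. For reference, the exponential distribution has skewness 2, and the mean of $n$ independent draws has skewness $2/\sqrt{n}$.

```python
import random
import statistics

def sample_skewness(values):
    """Moment-based sample skewness (zero for a symmetric distribution)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(2)

# Many individual observations: collecting more of them does not remove the skew.
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Means of many independent samples of size 60: this is what the CLT describes.
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(60)])
    for _ in range(5000)
]

skew_raw = sample_skewness(raw)
skew_means = sample_skewness(means)
print("skewness of individual observations:", round(skew_raw, 2))
print("skewness of sample means (n = 60):  ", round(skew_means, 2))
```

The residuals in a regression play the role of the individual observations here, not of the sample means, so the sample size of 60 says nothing about their distribution.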