Using R for lack-of-fit test


I learnt how to use R to perform an F-test for lack of fit of a regression model, where $H_0$: "the regression model has no lack of fit".

$$F_{LOF} = \frac{MSLF}{MSPE} = \frac{SSLF(\text{model}) / df_1}{SSPE/df_2}$$ where $df_1$ is the degrees of freedom for SSLF (lack-of-fit sum of squares) and $df_2$ is the degrees of freedom for SSPE (sum of squares due to pure error).

In R, the F-test (say for a model with 2 predictors) can be calculated with

anova(lm(y~x1+x2), lm(y~factor(x1)*factor(x2)))

Example output:

Model 1: y ~ x1 + x2
Model 2: y ~ factor(x1) * factor(x2)
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     19 18.122                           
2     11 12.456  8    5.6658 0.6254 0.7419

F-statistic: 0.6254 with a p-value of 0.7419.
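As a sanity check, the F statistic in that table can be recomputed by hand from the printed sums of squares and degrees of freedom, following the $F_{LOF}$ formula above (the numbers are taken from the example output, not from real data):

```r
# Numbers read off the anova() table above:
sslf <- 5.6658   # lack-of-fit SS (the "Sum of Sq" entry)
df1  <- 8        # lack-of-fit df (the "Df" entry)
sspe <- 12.456   # pure-error SS (Model 2's RSS)
df2  <- 11       # pure-error df (Model 2's Res.Df)

# F_LOF = MSLF / MSPE
f_lof <- (sslf / df1) / (sspe / df2)
p_val <- pf(f_lof, df1, df2, lower.tail = FALSE)

round(f_lof, 4)  # 0.6254, matching the anova() output
```

The p-value comes out well above 0.05, agreeing with the table.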

Since the p-value is greater than 0.05, we do not reject $H_0$; there is no evidence of lack of fit, so the model appears adequate.

What I want to know is: why fit two models, and why use the command factor(x1)*factor(x2)? Apparently 12.456, the RSS from Model 2, is magically the SSPE for Model 1.

Why?
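To make the question concrete, here is a small sketch with made-up replicated data (the data and variable names are hypothetical, chosen only for illustration). It checks numerically that the RSS of the saturated factor model equals the pure-error sum of squares computed directly from the cell means:

```r
set.seed(1)
x1 <- rep(1:3, each = 4)        # predictor with replicated levels
x2 <- rep(1:2, times = 6)       # each (x1, x2) cell appears twice
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(12)

# RSS of the saturated model: factor(x1)*factor(x2) fits a separate
# mean for every (x1, x2) cell, so its residuals are purely within-cell
rss_sat <- deviance(lm(y ~ factor(x1) * factor(x2)))

# SSPE computed by hand: squared deviations of each y from its cell mean
sspe <- sum(ave(y, x1, x2, FUN = function(v) (v - mean(v))^2))

all.equal(rss_sat, sspe)  # TRUE
```

This is why the second model in the anova() call supplies the pure-error line: its residual sum of squares is, by construction, the variation of the responses around their cell means.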