GOF measures in a model


When we validate a statistical model, should we only consider the traditional Goodness-Of-Fit measures? Are there other ways in which the model should be validated? Should the validation be application specific or purely statistical? Do we have any examples that we can share with the class?

Your question is not very specific. Here are three very different situations in which goodness-of-fit plays a central role. Maybe one or more of them are suitable for the level of your class.

One-way ANOVA with three levels of the factor. The model is

$$Y_{ij} = \mu + \alpha_i + e_{ij},$$

where the group effects $\alpha_i$ have $\sum_i \alpha_i = 0$ and $e_{ij} \stackrel{iid}{\sim} \mathsf{Norm}(0, \sigma),$ so that all groups have the same population SD $\sigma.$

It is customary to check for equal variances and normality. The check for normality must be done on the residuals, or on the three levels separately. (The mixture distribution of the three levels will not be normal unless all group population means are equal.) The analysis below is from Minitab 17 statistical software.

Data Display 

Scores
  108    93    86    97   117    95    94    96   100   103   102   124    97   107   103
  116   104   106   108   102   115   111   110   130   137   120   117   126   115   107

Group
    1     1     1     1     1     1     1     1     1     1     2     2     2     2     2 
    2     2     2     2     2     3     3     3     3     3     3     3     3     3     3


One-way ANOVA: Scores versus Group 

Method

Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05

Equal variances were assumed for the analysis.


Factor Information

Factor  Levels  Values
Group        3  1, 2, 3


Analysis of Variance

Source  DF  Adj SS   Adj MS  F-Value  P-Value
Group    2    2005  1002.70    13.22    0.000     # Reject Ho: 3 equal means
Error   27    2047    75.83
Total   29    4053


Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
8.70802  49.48%     45.74%      37.63%

[Figure: Minitab four-in-one residual plots for the one-way ANOVA]

(a) The normal probability plot of residuals is roughly linear, suggesting approximately normal residuals.
(b) The plot of residuals vs. fits shows about the same spread for each group, suggesting equal variances.
(c) The histogram of residuals is of limited use with only 30 observations.
(d) The plot of residuals in time order shows no trend or 'clumpiness' by group (1-10, 11-20, 21-30), suggesting independence of individual observations; this plot would be useless if the data were sorted, as is often the case for textbook data displays.
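For readers without Minitab, the same ANOVA table and residual diagnostics can be reproduced in base R. This is a sketch (object names are my own); the data are entered from the data display above.

```r
# Scores and group labels from the data display above
scores <- c(108, 93, 86, 97, 117, 95, 94, 96, 100, 103,
            102, 124, 97, 107, 103, 116, 104, 106, 108, 102,
            115, 111, 110, 130, 137, 120, 117, 126, 115, 107)
group <- factor(rep(1:3, each = 10))

fit <- aov(scores ~ group)
summary(fit)          # F = 13.22 on 2 and 27 df; P-value well below 0.05

par(mfrow = c(2, 2))  # diagnostic plots analogous to Minitab's four-in-one display
plot(fit)
```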

Formal tests for equal variances in the groups: no evidence of unequal group variances.

Test for Equal Variances: Scores versus Group 

Method

Null hypothesis         All variances are equal
Alternative hypothesis  At least one variance is different
Significance level      α = 0.05

Tests

                           Test
Method                Statistic  P-Value
Multiple comparisons          —    0.874
Levene                     0.23    0.793

(Levene's test is best for clearly nonnormal data; not relevant here.)
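A Bartlett test of equal variances is also available in base R (a sketch, re-entering the data from the display above):

```r
# Data from the display above
scores <- c(108, 93, 86, 97, 117, 95, 94, 96, 100, 103,
            102, 124, 97, 107, 103, 116, 104, 106, 108, 102,
            115, 111, 110, 130, 137, 120, 117, 126, 115, 107)
group <- factor(rep(1:3, each = 10))

bartlett.test(scores ~ group)  # large P-value: no evidence against equal variances
```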

Confession: Repeating the normal probability plot for residuals below, with a formal Anderson-Darling test for normality, we see a P-value < 0.05, indicating some departure from normality. This must be a Type I error, because the data are simulated data generated from a normal distribution.

[Figure: normal probability plot of residuals with Anderson-Darling normality test]
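As a cross-check, base R's shapiro.test can be applied to the same residuals (a sketch; the Shapiro-Wilk P-value need not agree with Minitab's Anderson-Darling P-value):

```r
# Data from the display above
scores <- c(108, 93, 86, 97, 117, 95, 94, 96, 100, 103,
            102, 124, 97, 107, 103, 116, 104, 106, 108, 102,
            115, 111, 110, 130, 137, 120, 117, 126, 115, 107)
group <- factor(rep(1:3, each = 10))

res <- residuals(aov(scores ~ group))  # residuals from the one-way ANOVA
shapiro.test(res)                      # Shapiro-Wilk test of normality
```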

Fairness of a Die. A die is rolled 600 times with the following results.

Faces:    1   2   3   4   5   6 
Counts: 145 142 128  61  58  66 

Note that faces 1, 2, 3 appeared relatively more often than did 4, 5, 6. Is this evidence of unfairness?

Under the null hypothesis that the die is fair, we expect counts $E = 100$ for each face. Observed counts are $X = (145, 142, 128, 61, 58, 66).$

The chi-squared goodness-of-fit (GOF) test has test statistic

$$Q = \sum_{i=1}^6 \frac{(X_i - E)^2}{E} = 90.14.$$

Under the null hypothesis, $Q \stackrel{aprx}{\sim} \mathsf{Chisq}(df=5).$ The P-value of the test (essentially 0) is the probability in the right-hand tail of this distribution beyond 90.14. So there is strong evidence the die is unfair. This is not surprising because the data were simulated for a die for which faces 1, 2, 3 are twice as likely (probability 2/9 each) as faces 4, 5, 6 (1/9 each).

By contrast, if we had rolled the die only 60 times with face counts $X = (14,14,13,6,6,7),$ proportionately about the same as before, we would have $Q = 8.2,$ P-value 0.146 (> 0.05), and so no solid evidence of unfairness. There is simply not enough information in 60 rolls of the die to detect its considerable degree of unfairness.

Note that respective bar charts of the data for 600 and 60 rolls would look almost identical. So bar charts alone are hardly a guide to goodness-of-fit; formal GOF tests are required before drawing conclusions from bar charts.
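Both versions of the die example can be verified with base R's chisq.test:

```r
# 600 rolls: strong evidence of unfairness
chisq.test(c(145, 142, 128, 61, 58, 66), p = rep(1/6, 6))
## X-squared = 90.14, df = 5, p-value < 2.2e-16

# 60 rolls with nearly the same proportions: no significant evidence
chisq.test(c(14, 14, 13, 6, 6, 7), p = rep(1/6, 6))
## X-squared = 8.2, df = 5, p-value = 0.1455
```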

Item Matching Problem. There are 12 letters and 12 properly matching envelopes on a desk. A weary administrative assistant thinks these are left-over mass mail and stuffs letters into envelopes at random. If $X$ is the number of letters randomly put into their proper envelopes, what are $E(X)$ and $SD(X)?$ It is easy to show that $E(X) = 1$ (by linearity of expectation, each of the 12 letters matches its envelope with probability $1/12$) and possible to show that $SD(X) = 1.$ The equality $E(X) = V(X) = 1$ raises the possibility that $X \stackrel {aprx}{\sim} \mathsf{Pois}(\lambda = 1).$ [Poisson is one of the few distribution families for which mean and variance are numerically equal.]

The fit to Poisson cannot be perfect: it is clear that $P(X = 11) = 0$ and that $P(X > 12) = 0.$ But for some practical purposes $\mathsf{Pois}(1)$ is a useful approximate fit to the random envelope-matching model.
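In fact the exact distribution of $X$ is available in closed form: the number of permutations of 12 items with exactly $k$ fixed points is $\binom{12}{k} D_{12-k},$ where $D_m$ counts derangements of $m$ items, which gives $P(X = k) = \frac{1}{k!}\sum_{j=0}^{12-k} \frac{(-1)^j}{j!}.$ A short R check of how close this comes to $\mathsf{Pois}(1)$:

```r
# Exact P(X = k), k = 0..12, for the 12-letter matching problem
exact <- sapply(0:12, function(k)
  sum((-1)^(0:(12 - k)) / factorial(0:(12 - k))) / factorial(k))

sum(exact)                        # probabilities sum to 1
exact[12]                         # P(X = 11) is exactly 0, as noted above
max(abs(exact - dpois(0:12, 1)))  # largest deviation from Poisson(1) is tiny
```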

The R code below simulates a million of these 12-letter experiments and makes a histogram of the approximate probabilities for each value of $X.$ The dots show exact Poisson probabilities.

# Simulate a million random stuffings of 12 letters into 12 envelopes
x = replicate( 10^6, sum(sample(1:12,12)==1:12) )   # number of matches each time
hist(x, prob=TRUE, br=(0:13)-.5, col="skyblue2", main="Simulated Dist'n of Matches")
points(0:12, dpois(0:12, 1), pch=19, col="red")     # exact Poisson(1) probabilities
mean(x); sd(x)
## 0.998496   # aprx E(X) = 1
## 0.9980404  # aprx SD(X) = 1

[Figure: histogram of simulated matches with Poisson(1) probabilities overlaid]