Can't figure out the correct degrees of freedom for my goodness of fit

I am trying to solve the following question:

The number of goals scored by a certain football team was recorded for each of 100 matches and the results are summarised in the following table.

$$\begin{array}{l|c|c|c|c|c|c|c} \text{Number of goals} & 0 & 1 & 2 & 3 & 4 & 5 & 6 \text{ or more} \\ \hline \text{Frequency} & 12 & 16 & 31 & 25 & 13 & 3 & 0 \end{array}$$

Fit a Poisson distribution, and test its goodness of fit at the $5\%$ significance level.

I fitted the Poisson distribution, and these are the expected frequencies:

$$\begin{array}{l|c|c|c|c|c|c|c} \text{Number of goals} & 0 & 1 & 2 & 3 & 4 & 5 & 6 \text{ or more} \\ \hline \text{Expected frequency} & 11.08 & 24.38 & 26.81 & 19.66 & 10.82 & 4.76 & 2.49 \end{array}$$

The last $2$ cells have expected values less than $5$, so I combine them. The expected frequencies become:

$$\begin{array}{l|c|c|c|c|c|c} \text{Number of goals} & 0 & 1 & 2 & 3 & 4 & 5 \text{ or more} \\ \hline \text{Expected frequency} & 11.08 & 24.38 & 26.81 & 19.66 & 10.82 & 7.25 \end{array}$$

and the observed frequencies become:

$$\begin{array}{l|c|c|c|c|c|c} \text{Number of goals} & 0 & 1 & 2 & 3 & 4 & 5 \text{ or more} \\ \hline \text{Observed frequency} & 12 & 16 & 31 & 25 & 13 & 3 \end{array}$$

I then calculate my $\chi^2$ statistic to be $7.99$, and I want to compare it with $\chi^2_{d,0.95}$, where $d$ is the degrees of freedom. I thought that since I have $6$ cells, $d$ must surely be $5$, but according to the mark scheme $d$ should be $4$. How is that so?

There is 1 best solution below

It seems you estimated the Poisson mean by $\hat \lambda = \bar X = 2.2,$ where $\bar X$ is the average number of goals per match. @DavidQuinn is correct that this estimation causes the degrees of freedom to decrease by 1: with $6$ categories and $1$ estimated parameter, $df = 6 - 1 - 1 = 4.$ The chi-squared distribution only approximately describes the distribution of your chi-squared statistic $Q$. The approximation is pretty good as long as none of the expected counts is too small. (Note: $Q$ has a discrete distribution, because it is based on integer counts.)
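
As a rough check of the arithmetic for the data in the question, the fit and the comparison with both candidate critical values can be sketched in R. The variable names (`lam_hat`, `expd`, `df_est`, and so on) are illustrative choices, and pooling '5' with '6 or more' follows what the question did.

```r
# Observed counts for 0, 1, 2, 3, 4 and ">= 5" goals (from the question's table).
obs <- c(12, 16, 31, 25, 13, 3)
n <- sum(obs)                                   # 100 matches

# Estimate lambda by the sample mean. No match had 6 or more goals,
# so every match in the last category counts as exactly 5 goals.
lam_hat <- sum(0:5 * obs) / n

# Expected counts under Pois(lam_hat), with the upper tail pooled into ">= 5".
expd <- n * c(dpois(0:4, lam_hat), ppois(4, lam_hat, lower.tail = FALSE))

# Chi-squared goodness-of-fit statistic.
Q <- sum((obs - expd)^2 / expd)

# df = (number of categories) - 1 - (number of estimated parameters).
df_est <- length(obs) - 1 - 1
crit_4 <- qchisq(0.95, df_est)                  # critical value with df = 4
crit_5 <- qchisq(0.95, length(obs) - 1)         # critical value if df = 5 were used
p_value <- pchisq(Q, df_est, lower.tail = FALSE)

print(round(c(lam_hat = lam_hat, Q = Q, crit_df4 = crit_4,
              crit_df5 = crit_5, p_value = p_value), 3))
```

Here $Q \approx 7.99$ lies below both critical values (about $9.49$ for $df = 4$ and $11.07$ for $df = 5$), so the Poisson model is not rejected at the $5\%$ level under either choice. The choice of $df$ still matters in general, because the smaller critical value rejects more often.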

Below is a simulation in R in which the actual Poisson mean is $\lambda = 2.$ For each of 10,000 'seasons' of 100 matches, we simulate the number of goals according to $Pois(2)$, estimate $\lambda$ from that season, and find the chi-squared goodness-of-fit statistic $Q$. (Inside the loop, there are more elegant ways to find the observed counts for the six 'categories' $0, 1, 2, 3, 4,$ and $\ge 5$, but they may not be as easy to understand.)

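 m = 10^4;  q = numeric(m)                  # number of seasons; storage for Q
 for (i in 1:m) {
   x = rpois(100, 2);  lam.hat = mean(x)    # one season of 100 matches; estimate lambda
   # expected counts for the categories 0, 1, 2, 3, 4 and >= 5
   ex = dpois(0:5, lam.hat);  ex[6] = 1 - ppois(4, lam.hat)
   ex = ex*100
   obs = numeric(6)                         # observed counts for the same categories
     obs[1] = sum(x==0);  obs[2] = sum(x==1)
     obs[3] = sum(x==2);  obs[4] = sum(x==3)
     obs[5] = sum(x==4);  obs[6] = sum(x>=5)
   q[i] = sum((obs - ex)^2 / ex)  }         # chi-squared statistic for this season
 # histogram of the simulated statistics with the two candidate densities overlaid
 hist(q, breaks = 30, prob=TRUE, col="wheat", ylim=c(0,.2), xlab="Chi-sq Statistic",
   main="Simulated Dist'n of Chi-sq Statistic and PDFs of CHISQ(4) and CHISQ(5)")
 curve(dchisq(x, 4), lwd=2, col="blue", n=1001, add=TRUE)
 curve(dchisq(x, 5), lwd=2, col="red", lty="dotted", n=1001, add=TRUE)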

We make a histogram of the 10,000 simulated values of $Q$ and compare it with the correct PDF of $Chisq(df=4)$ (solid blue curve) and the incorrect PDF of $Chisq(df=5)$ (dotted red curve).

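[Figure: histogram of the 10,000 simulated values of $Q$, with the PDFs of $Chisq(df=4)$ (solid blue) and $Chisq(df=5)$ (dotted red) overlaid.]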
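
A numerical complement to the histogram is to count how often the simulated $Q$ exceeds the $5\%$ critical value under each choice of $df$. The sketch below repeats the simulation above in a more compact form. The seed, the helper name `sim_Q`, and the use of `tabulate` with `pmin` are my own choices, not part of the original code, and the printed rates depend on the seed.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
m <- 10^4; n <- 100; lambda <- 2   # same settings as the simulation above

# Hypothetical helper: simulate one season and return its chi-squared statistic.
sim_Q <- function() {
  x <- rpois(n, lambda)
  lam_hat <- mean(x)
  # Observed counts for 0..4 and ">= 5"; pmin() pools everything above 4.
  obs <- tabulate(pmin(x, 5) + 1, nbins = 6)
  expd <- n * c(dpois(0:4, lam_hat), ppois(4, lam_hat, lower.tail = FALSE))
  sum((obs - expd)^2 / expd)
}
q <- replicate(m, sim_Q())

# Proportion of seasons in which a true Poisson model would be rejected at 5%.
rate_df4 <- mean(q > qchisq(0.95, 4))
rate_df5 <- mean(q > qchisq(0.95, 5))
print(c(df4 = rate_df4, df5 = rate_df5))
```

Because the data really are Poisson here, a well-calibrated test should reject in roughly $5\%$ of seasons. The rate under $df = 5$ can never exceed the rate under $df = 4$, since its critical value is larger.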

Note: Here our question is whether the numbers of goals follow some Poisson distribution (mean to be estimated), not whether they fit a specific Poisson distribution (mean known). In the latter case, with six categories, $df = 5$ would be correct.
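
To illustrate the known-mean case, suppose (purely as an assumption for this example) that the hypothesised mean were fixed in advance at $\lambda_0 = 2.$ R's built-in `chisq.test`, given cell probabilities through its `p` argument, then uses $df = 6 - 1 = 5$. The names `lambda0`, `p0` and `res` are illustrative.

```r
obs <- c(12, 16, 31, 25, 13, 3)   # counts for 0, 1, 2, 3, 4 and ">= 5" goals
lambda0 <- 2                      # hypothetical known mean, NOT estimated from the data

# Cell probabilities under Pois(lambda0); the pooled tail makes them sum to 1.
p0 <- c(dpois(0:4, lambda0), ppois(4, lambda0, lower.tail = FALSE))

# Goodness-of-fit test against fully specified probabilities.
res <- chisq.test(obs, p = p0)
print(res$parameter)              # degrees of freedom used by the test
print(res$statistic)              # chi-squared statistic against Pois(lambda0)
```

`chisq.test` has no way of knowing that a parameter was estimated from the data, so it always uses (number of cells) $- 1$ here. When $\lambda$ is estimated, as in the question, the P-value has to be computed separately, for example with `pchisq(Q, 4, lower.tail = FALSE)`.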