I am trying to solve the following question:
The number of goals scored by a certain football team was recorded for each of 100 matches and the results are summarised in the following table. $$\begin{array}{l|c|c|c|c|c|c|c} \text{Number of goals} & 0 & 1 & 2 & 3 & 4 & 5 & 6 \text{ or more} \\ \hline \text{Frequency} & 12 & 16 & 31 & 25 & 13 & 3 & 0 \end{array}$$ Fit a Poisson distribution, and test its goodness of fit at the $5\%$ significance level.
So I fitted the Poisson distribution, and here are the expected values: $$\begin{array}{l|c|c|c|c|c|c|c} \text{Number of goals} & 0 & 1 & 2 & 3 & 4 & 5 & 6 \text{ or more} \\ \hline \text{Expected frequency} & 11.08 & 24.38 & 26.81 & 19.66 & 10.82 & 4.76 & 2.49 \end{array}$$ The last $2$ cells have expected values less than $5$, so I add them together, which gives for the expected frequencies: $$\begin{array}{l|c|c|c|c|c|c} \text{Number of goals} & 0 & 1 & 2 & 3 & 4 & 5 \text{ or more} \\ \hline \text{Expected frequency} & 11.08 & 24.38 & 26.81 & 19.66 & 10.82 & 7.25 \end{array}$$ and the observed frequencies become: $$\begin{array}{l|c|c|c|c|c|c} \text{Number of goals} & 0 & 1 & 2 & 3 & 4 & 5 \text{ or more} \\ \hline \text{Observed frequency} & 12 & 16 & 31 & 25 & 13 & 3 \end{array}$$ Then I calculate my $\chi^2$ statistic to be $7.99$, and I want to compare it to $\chi^2_{d,0.95}$, where $d$ is the degrees of freedom. I thought that since I have $6$ cells, $d$ must surely be $5$, but according to the mark scheme $d$ should be $4$. How is that so?
It seems you estimated the Poisson mean by $\hat \lambda = \bar X = 2.2,$ where $\bar X$ is the average number of goals per match. @DavidQuinn is correct that this estimation reduces the degrees of freedom by 1: with six categories and one estimated parameter, $d = 6 - 1 - 1 = 4.$ The chi-squared distribution only approximately describes the distribution of your chi-squared statistic $Q$. The approximation is pretty good, provided none of the expected counts is too small. (Note: $Q$ has a discrete distribution, because it is computed from integer counts.)
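The arithmetic behind $\hat\lambda = 2.2$, the pooled expected counts, and $Q \approx 7.99$ can be checked with a short script. This is only a sketch in standard-library Python; the variable names are illustrative, the critical value 9.488 is the usual tabulated 95% point of the chi-squared distribution with 4 degrees of freedom, and the p-value uses the closed-form upper tail $e^{-q/2}(1 + q/2)$ that holds for 4 degrees of freedom.

```python
import math

# Observed frequencies for 0, 1, 2, 3, 4, 5 and "6 or more" goals (from the question).
observed_full = [12, 16, 31, 25, 13, 3, 0]
n = sum(observed_full)

# Sample mean as the estimate of the Poisson mean; the "6 or more" cell is empty,
# so treating it as exactly 6 goals does not affect the result here.
lam_hat = sum(k * f for k, f in enumerate(observed_full)) / n

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Categories 0..4 plus a pooled "5 or more" cell, so every expected count is >= 5.
probs = [poisson_pmf(k, lam_hat) for k in range(5)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]
observed = observed_full[:5] + [sum(observed_full[5:])]

# Pearson chi-squared statistic.
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Six cells, minus 1 for the fixed total, minus 1 for the estimated mean.
df = len(observed) - 1 - 1

critical_value = 9.488                      # tabulated chi-squared 95% point, df = 4
p_value = math.exp(-q / 2) * (1 + q / 2)    # upper tail, valid for df = 4 only

print("lambda_hat =", lam_hat)
print("expected   =", [round(e, 2) for e in expected])
print("Q =", round(q, 2), " df =", df, " reject at 5%:", q > critical_value)
```

With these data the statistic falls below 9.488, so the Poisson fit is not rejected at the 5% level.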
Below is a simulation in R in which the actual Poisson mean is $\lambda = 2.$ For each of 10,000 'seasons' of 100 matches, we simulate the number of goals per match according to $Pois(2)$, estimate $\lambda$ from that season's data, and find the chi-squared goodness-of-fit statistic $Q$. The code is shown below. (Inside the loop, there are more elegant ways to find the observed counts for the six 'categories' $0, 1, 2, 3, 4,$ and $\ge 5$, but they may not be as easy to understand.)
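A sketch of this experiment in standard-library Python (a stand-in, not the R listing itself) might look like the following. The helpers `draw_poisson` and `gof_statistic`, the seed, and the summary quantities are assumptions made for illustration; 9.488 and 11.070 are the tabulated 95% points of the chi-squared distribution with 4 and 5 degrees of freedom.

```python
import math
import random

def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def gof_statistic(goals):
    """Pearson statistic for categories 0, 1, 2, 3, 4, >= 5,
    with the Poisson mean estimated by the sample mean of `goals`."""
    n = len(goals)
    lam_hat = sum(goals) / n
    observed = [0] * 6
    for g in goals:
        observed[min(g, 5)] += 1          # counts of 5 or more share the last cell
    probs = [math.exp(-lam_hat) * lam_hat ** k / math.factorial(k) for k in range(5)]
    probs.append(1.0 - sum(probs))
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

rng = random.Random(2024)                 # arbitrary seed, for reproducibility only
num_seasons, matches, true_lam = 10_000, 100, 2.0

q_values = [
    gof_statistic([draw_poisson(true_lam, rng) for _ in range(matches)])
    for _ in range(num_seasons)
]

mean_q = sum(q_values) / num_seasons
reject_df4 = sum(q > 9.488 for q in q_values) / num_seasons    # critical value for df = 4
reject_df5 = sum(q > 11.070 for q in q_values) / num_seasons   # critical value for df = 5

print("mean of Q:", round(mean_q, 2))
print("fraction above 9.488 (df = 4):", reject_df4)
print("fraction above 11.070 (df = 5):", reject_df5)
```

A chi-squared random variable has mean equal to its degrees of freedom, so if the $df = 4$ approximation is reasonable, the mean of the simulated $Q$ values should come out roughly 4 and the fraction above 9.488 roughly 5%, while the $df = 5$ cutoff should reject noticeably less often than the nominal 5%.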
We make a histogram of the 10,000 simulated values of $Q$ and compare it with the correct PDF of $Chisq(df=4)$ (solid curve) and the incorrect PDF of $Chisq(df=5).$
Note: Here our question is whether the number of goals follows some Poisson distribution (mean to be estimated), *not* whether the goals fit a specific Poisson distribution (mean known). In the latter case, with six categories, $df = 5$ would be correct.
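The known-mean case in this note can be checked the same way. The sketch below, again in standard-library Python with hypothetical helper names and an arbitrary seed, tests each simulated season against the fixed distribution $Pois(2)$ without estimating anything, and compares $Q$ with the tabulated $df = 5$ critical value 11.070.

```python
import math
import random

def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

known_lam, matches, num_seasons = 2.0, 100, 10_000

# Cell probabilities for 0, 1, 2, 3, 4, >= 5 under the fully specified Pois(2);
# nothing is estimated from the data, so these are the same for every season.
probs = [math.exp(-known_lam) * known_lam ** k / math.factorial(k) for k in range(5)]
probs.append(1.0 - sum(probs))
expected = [matches * p for p in probs]

rng = random.Random(7)                    # arbitrary seed, for reproducibility only
q_known = []
for _ in range(num_seasons):
    observed = [0] * 6
    for _ in range(matches):
        observed[min(draw_poisson(known_lam, rng), 5)] += 1
    q_known.append(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))

mean_q_known = sum(q_known) / num_seasons
reject_df5_known = sum(q > 11.070 for q in q_known) / num_seasons

print("mean of Q (mean known):", round(mean_q_known, 2))
print("fraction above 11.070 (df = 5):", reject_df5_known)
```

Here the mean of $Q$ should be close to 5 and the rejection rate at 11.070 close to 5%, in line with $df = 5$ when no parameter is estimated.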