I am trying to show that large Poisson data follow a Gaussian distribution using goodness-of-fit tests. I am stuck at calculating the degrees of freedom.
My question is this: since the data are from a Poisson distribution, I calculated the standard deviation as the square root of the mean. For the normal distribution I need both the mean and the standard deviation. What is the correct number of degrees of freedom here?
I know that for a Poisson distribution it is k-2 (since we estimate the mean), where k is the number of data points, and for a normal distribution it is k-3 (since we estimate the mean and the standard deviation). But in my case the standard deviation comes from Poisson data and depends on the mean. What is the correct number of degrees of freedom in this case, k-2 or k-3?
Thanks
A Chi-Squared Goodness-of-Fit Test: Are Poisson Data Nearly Normal?
Poisson Data. Suppose you have $n = 1000$ observations from $\mathsf{POIS}(\lambda=100).$ Then the resulting data should be approximately distributed as $\mathsf{Norm}(\mu=100, \sigma=10).$
The sample mean is about 100 and the sample SD is about 10. So the mean and SD are nearly correct. What about the normal shape?
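To reproduce this step yourself, here is a minimal sketch in Python (standard library only) rather than R. The standard library has no Poisson sampler, so the helper `poisson_knuth` below is a hypothetical function implementing Knuth's multiplication method, and the seed is an arbitrary choice. Your sample will differ from mine, so the mean and SD will only be close to 100 and 10.

```python
import math
import random
import statistics


def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method.

    Multiplies uniforms together until the product drops to exp(-lam) or below;
    the number of multiplications needed is the Poisson count.
    Adequate for lam = 100, where exp(-lam) is still representable.
    """
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(2024)  # arbitrary seed (an assumption, not from the original)
data = [poisson_knuth(100, rng) for _ in range(1000)]

sample_mean = statistics.fmean(data)
sample_sd = statistics.stdev(data)
print(f"mean = {sample_mean:.2f}, sd = {sample_sd:.2f}")
```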
Chi-squared GOF test. Judging by eye, the histogram seems to have an approximately normal shape. Let's try a chi-squared GOF test to see if the approximation is "good".
Observed Poisson counts: We use six intervals taken from the histogram: $< 79.5,$ 79.5 to 89.5, ..., 109.5 to 119.5, and $> 119.5.$ These intervals have observed frequencies: 24, 132, 344, 345, 133, and 22, respectively.
Now I will use R to go through the steps of the chi-squared goodness-of-fit test. You should follow through the computations here and match them with examples and formulas in your text.
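If you generate your own sample, the tally into the six intervals can be sketched as follows in Python. The function name `tally` and the short demonstration list are illustrative assumptions; the cut points are the ones used above. Because Poisson values are integers and the cut points end in .5, no observation can fall exactly on a boundary.

```python
from bisect import bisect_left

# Cut points separating the six intervals used above
BOUNDS = [79.5, 89.5, 99.5, 109.5, 119.5]


def tally(values, bounds=BOUNDS):
    """Count how many values fall in each of the len(bounds) + 1 intervals."""
    counts = [0] * (len(bounds) + 1)
    for v in values:
        # bisect_left returns the number of cut points below v,
        # which is the index of the interval containing v
        counts[bisect_left(bounds, v)] += 1
    return counts


# Small made-up list, two values per interval, just to show the mechanics
demo = [70, 79, 80, 89, 90, 99, 100, 109, 110, 119, 120, 130]
demo_counts = tally(demo)
print(demo_counts)
```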
Your null hypothesis is that the data fit $\mathsf{Norm}(\mu=100, \sigma=10).$ So we need to get probabilities for each of the six intervals according to this distribution. One can get these in R, but you should show how to use printed normal tables to get them.
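As a check on your table lookups, the interval probabilities can be sketched in Python with `statistics.NormalDist` (standard library). Each probability is the difference of the normal CDF at consecutive cut points, which is what you get from printed tables after standardizing each cut point $b$ to $z = (b - 100)/10.$ The variable names are my own.

```python
from statistics import NormalDist

null_dist = NormalDist(mu=100, sigma=10)
bounds = [79.5, 89.5, 99.5, 109.5, 119.5]

# CDF at each cut point, padded with 0 and 1 for the two open-ended tails
cdf_vals = [0.0] + [null_dist.cdf(b) for b in bounds] + [1.0]

# Probability of each interval = difference of consecutive CDF values
probs = [hi - lo for lo, hi in zip(cdf_vals, cdf_vals[1:])]

for p in probs:
    print(f"{p:.4f}")
```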
Expected counts based on null hypothesis: Under the null hypothesis of a normal fit, the expected counts are $n = 1000$ times the interval probabilities, approximately 20.2, 126.7, 333.2, 348.9, 145.5, and 25.6. [In order for the so-called 'chi-squared statistic' to have approximately a chi-squared distribution, all expected counts should exceed 5. That's why I didn't try to use eight histogram intervals instead of six.]
Chi-squared statistic: Then the chi-squared statistic is 2.91. (You should look for this formula in your text.)
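The statistic is the usual sum of (observed − expected)² / expected over the six intervals. A minimal Python sketch of that computation, using the observed frequencies and cut points given above (variable names are my own), is:

```python
from statistics import NormalDist

observed = [24, 132, 344, 345, 133, 22]
n = sum(observed)

# Interval probabilities under the null distribution Norm(100, 10)
null_dist = NormalDist(mu=100, sigma=10)
bounds = [79.5, 89.5, 99.5, 109.5, 119.5]
cdf_vals = [0.0] + [null_dist.cdf(b) for b in bounds] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf_vals, cdf_vals[1:])]

# Expected counts and the chi-squared GOF statistic
expected = [n * p for p in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 2) for e in expected])
print(round(chi_sq, 2))
```

All six expected counts exceed 5, and the statistic agrees with the value 2.91 quoted above.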
Degrees of freedom: With $k = 6$ intervals we have $\nu = k-1 = 5$ degrees of freedom. (We obtained the normal mean $\mu = 100$ and standard deviation $\sigma = 10$ by a theoretical argument, not by estimation from data, so there is no 'penalty' for estimation in finding the degrees of freedom.)
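Stated generally, and as a sketch of the standard textbook rule rather than anything specific to this example: if $m$ parameters of the null distribution are estimated from the data, the degrees of freedom are

$$\nu = k - 1 - m,$$

where $k$ is the number of intervals. Here $m = 0,$ so $\nu = 6 - 1 - 0 = 5.$ Under this rule, had $\lambda$ been estimated from the sample and $\sigma$ taken as $\sqrt{\hat\lambda},$ only one free parameter would have been estimated, giving $m = 1$ and $\nu = 4$; estimating $\mu$ and $\sigma$ separately would give $m = 2$ and $\nu = 3.$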
Critical value for test at 5% level: The critical value for a test at the 5% level is $c = 11.071$ from R or from printed tables of chi-squared distributions.
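If you have neither R nor tables at hand, the critical value can be checked in plain Python. The sketch below relies on the closed-form upper-tail probability of a chi-squared distribution with 5 degrees of freedom (a known formula for odd degrees of freedom; it does not carry over to other values of $\nu$) and finds the 5% point by bisection. The function names are hypothetical helpers, not library calls.

```python
import math


def chi2_sf_df5(x):
    """Upper-tail probability P(Q > x) for Q ~ chi-squared with 5 df."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


def chi2_crit_df5(alpha, lo=0.0, hi=100.0):
    """Find c with P(Q > c) = alpha by bisection (the tail is decreasing in c)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_df5(mid) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid  # tail too small: move left
    return (lo + hi) / 2


crit = chi2_crit_df5(0.05)
p_value = chi2_sf_df5(2.91)  # tail probability beyond the observed statistic
print(round(crit, 3))
print(p_value > 0.05)
```

The bisection should land on the tabled value $c = 11.071,$ and the tail probability beyond 2.91 is well above 0.05, consistent with not rejecting.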
Conclusion from test: Because the chi-squared statistic is smaller than $c,$ we cannot reject the null hypothesis that the Poisson data (except for their discrete integer values) are consistent with a normal distribution.
The figure below shows the density function of the chi-squared distribution with degrees of freedom $\nu = 5.$ The solid vertical line shows the observed value of the chi-squared GOF statistic and the dotted vertical line shows the critical value for a test at the 5% level.
Note: Of course, a Poisson distribution (even with large $\lambda)$ is not exactly normal. Any Poisson distribution is discrete and right-skewed, whereas a normal distribution is continuous and symmetrical. For larger sizes $n$ of the Poisson sample, the GOF test becomes more likely to detect small differences between the Poisson counts and a normal distribution.
If you do your own example, you might want to use $n = 500$ observations, rather than my $n = 1000.$