I have two questions on the $\chi^2$ statistic and $\chi^2$ distributions.
I think I understand the $\chi^2$ test, in that for a given p-value, one can find a cutoff $\chi^2$ value for a given k degrees of freedom. This cutoff determines whether the observed data differ significantly from the null hypothesis (at the level set by the p-value). In general, the cutoff $\chi^2$ value is looked up in a table for a given p-value and k. However, this just comes from the CDF of the $\chi^2$ distribution, correct? Can I just use the CDF and calculate the p-value directly from my observed $\chi^2$? For example, in Mathematica or something similar, with something like CDF[x, k]?
In textbooks, I've only seen the table-lookup version of the $\chi^2$ test. Is this due to the difficulty of calculating the CDF directly in an intro course, or is there something wrong with assigning a p-value to an observed $\chi^2$, instead of using a pass-fail cutoff and binary significance?
Secondly, assuming some of the ideas above, how does the binning of observed data affect the $\chi^2$ and its p-value? For example, suppose I have 10 data points, each an independent observation: say, cars observed driving past, per day. Also, assume the observed number is large enough to approximate each point by a Gaussian distribution instead of a Poisson. For simplicity, my null hypothesis is a constant fit over all 10 days. Number of degrees of freedom = k = 10 data points - 1 parameter in fit = 9. After fitting the constant, the minimized $\chi^2$ can be compared to a cutoff $\chi^2$ for k = 9 and some p-value (or the p-value can be calculated directly, if I wasn't wrong in the first paragraph). But with the same data, I could bin by 12 hours, so k = 20 - 1 = 19, and a different p-value is found. Or bin by 48 hours, so k = 5 - 1 = 4? All with the same data and arbitrary binning.
Is this type of data incompatible with a $\chi^2$ test? Does binning just affect the fit in that way? Or am I missing a fundamental concept?
First question. You are right about being able to use software instead of tables of the chi-squared distribution. For example, if df = 9 and the chi-squared statistic is 20.16, you could look at a chi-squared table to see that $20.16 > 19.02,$ where 19.02 cuts area 0.025 from the upper tail of $\mathrm{Chisq}(\mathrm{df} = 9)$. You would reject at the 2.5% level.
If you wanted a P-value, you could use software to find the probability of the chi-squared statistic being greater than 20.16. In R this is computed as `1 - pchisq(20.16, 9)`, where `pchisq` stands for the CDF of a chi-squared distribution. Thus the P-value (the probability of a value more extreme than 20.16) is about 0.017. Some software will give you the P-value automatically.
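The same upper-tail calculation can be sketched outside R. The snippet below is in Python (standard library only) rather than R. `chi2_sf` is a hypothetical helper name, not a library function, and it assumes integer degrees of freedom, which is all a goodness-of-fit test needs.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values of the regularized upper incomplete gamma Q(a, z):
    # Q(1, z) = exp(-z) for even df, Q(1/2, z) = erfc(sqrt(z)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    # repeated until a reaches df / 2.
    while 2 * a < df:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1.0
    return q

# P-value for an observed statistic of 20.16 with 9 degrees of freedom.
p_value = chi2_sf(20.16, 9)
print(round(p_value, 3))  # 0.017
```

The P-value is one minus the CDF at the observed statistic, so any package that exposes the chi-squared CDF (or its upper tail) gives the same number as the table-based reasoning, just without the fixed cutoffs.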
Second question. As far as binning is concerned, you are right that in some instances there are alternative ways of binning. You do not want so many bins that the expected count in any bin falls below about 5; otherwise the chi-squared distribution is a poor approximation to the distribution of the chi-squared statistic. Given that restriction, it is usually better to use more bins rather than fewer.
Also notice that the df of the chi-squared distribution depends directly on the number of *bins* used, not on the overall number of *events* counted. (I do not understand what you say about 'approximately Gaussian' in this context.)
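To make the dependence on bins concrete, here is a small Python sketch (standard library only). The daily counts are made up for illustration, and `chi2_sf` and `constant_fit_chisq` are hypothetical helper names. The constant is estimated by the mean count per bin, and the Pearson statistic is compared with a chi-squared distribution on (number of bins - 1) degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                 # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))      # Q(1/2, z)
    while 2 * a < df:                            # Q(a + 1, z) from Q(a, z)
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1.0
    return q

def constant_fit_chisq(counts):
    """Pearson statistic and df for the null hypothesis of one constant rate."""
    expected = sum(counts) / len(counts)         # the single fitted parameter
    stat = sum((o - expected) ** 2 / expected for o in counts)
    return stat, len(counts) - 1

# Hypothetical car counts for 10 days (invented numbers, not real data).
daily = [102, 95, 110, 98, 105, 91, 99, 108, 94, 98]
# The same 1000 events re-binned into 5 two-day bins.
two_day = [daily[i] + daily[i + 1] for i in range(0, len(daily), 2)]

for label, counts in [("1-day bins", daily), ("2-day bins", two_day)]:
    stat, df = constant_fit_chisq(counts)
    print(label, "stat =", round(stat, 2), "df =", df,
          "P =", round(chi2_sf(stat, df), 3))
```

The total number of events is identical in both runs; only the number of bins, and hence the df and the reference distribution, changes.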
Examples: Here is an example in which we simulate 60 rolls of a fair die, so that we expect 10 instances of each face. The observed numbers of each face are tabulated. Finally, a chi-squared test that the die is fair has a chi-squared goodness-of-fit statistic of 3.0, and a P-value of 70% (consistent with a fair die).
In the test, the default is that all faces have equal probabilities unless some other probability vector is specified. With 6 faces, df = 5, and the test procedure `chisq.test` finds the P-value as `1 - pchisq(3, 5)`, about 0.70 (and rounds).
In our second example, we simulate 600 rolls of a die that is heavily biased in favor of faces 4, 5, and 6 (see the `prob` vector). Here the null hypothesis that the die is fair is soundly rejected with an extremely small P-value.
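Both die examples can be mimicked with a short Python sketch (standard library only; this is not the R session described above). The fair-die counts are invented so that the statistic comes out at 3.0; they are not the simulated counts from the original run. The bias weights, the seed, and the helper names `chi2_sf` and `chisq_gof` are likewise assumptions made for illustration.

```python
import math
import random

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                 # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))      # Q(1/2, z)
    while 2 * a < df:                            # Q(a + 1, z) from Q(a, z)
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1.0
    return q

def chisq_gof(observed, probs=None):
    """Goodness-of-fit statistic, df, and P-value; equal probabilities by default."""
    n, k = sum(observed), len(observed)
    if probs is None:
        probs = [1.0 / k] * k
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1, chi2_sf(stat, k - 1)

# Hypothetical counts for 60 rolls of a fair die (expected 10 per face).
fair_counts = [14, 7, 8, 11, 10, 10]
stat, df, p = chisq_gof(fair_counts)
print(round(stat, 2), df, round(p, 2))  # 3.0 5 0.7

# 600 simulated rolls of a die biased toward faces 4, 5, 6 (assumed weights).
rng = random.Random(2024)
rolls = rng.choices(range(1, 7), weights=[1, 1, 1, 3, 3, 3], k=600)
biased_counts = [rolls.count(face) for face in range(1, 7)]
stat_b, df_b, p_b = chisq_gof(biased_counts)
print(biased_counts, round(stat_b, 1), df_b, p_b)
```

In the second run the observed counts are tested against the fair-die default, so the large discrepancy from 100 per face drives the statistic far into the upper tail.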