Calculated chi-square probability by hand (using R) seems quite off - did I do something wrong?


I am trying to understand the chi-square distribution with df = 4 - 1 = 3.

To that end I calculated the following (using R): I have a four-sided die and I threw it n = 100 times, obtaining the counts (23, 28, 27, 22) instead of the expected (25, 25, 25, 25). This gives a chi-square statistic of 1.04:

gemessen <- c(23,28,27,22)
ideal <- c(25,25,25,25)
chiquadrat <- sum((gemessen - ideal)^2 / ideal)

Now I calculated

pchisq(1.04, df = 4 - 1, lower.tail = FALSE)

and got 0.7915744. I know that this is only an approximation, so I wanted to calculate the exact value by hand (using the multinomial distribution):

vonhand <- 0  # the accumulator must be initialised before the loops
for (x1 in 0:100) {
  for (x2 in 0:100) {
    for (x3 in 0:100) {
      x4 <- 100 - x1 - x2 - x3
      gg <- c(x1, x2, x3, x4)
      if (x4 >= 0 && sum((gg - ideal)^2 / ideal) >= 1.04) {
        vonhand <- vonhand + factorial(100) * 0.25^100 /
          (factorial(x1) * factorial(x2) * factorial(x3) * factorial(x4))
      }
    }
  }
}
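(As an aside, the same sum can be written more compactly with `dmultinom()`, which evaluates the multinomial pmf in log space and so avoids the very large intermediate factorials. This is just a sketch of an equivalent computation, reusing the variable names from above:)

```r
# Equivalent to the triple loop above, but using dmultinom() for the
# multinomial probabilities; the inner ranges skip impossible counts.
ideal <- c(25, 25, 25, 25)
vonhand <- 0
for (x1 in 0:100) {
  for (x2 in 0:(100 - x1)) {
    for (x3 in 0:(100 - x1 - x2)) {
      x4 <- 100 - x1 - x2 - x3
      gg <- c(x1, x2, x3, x4)
      if (sum((gg - ideal)^2 / ideal) >= 1.04) {
        vonhand <- vonhand + dmultinom(gg, size = 100, prob = rep(0.25, 4))
      }
    }
  }
}
vonhand  # approximately 0.8107929, matching the loop above
```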

This gives me 0.8107929. Is this correct? The gap between 0.8107929 and 0.7915744 seems quite large.

Furthermore, I tried to support my calculation by brute force, and the following code yielded 0.810845, which would confirm my calculation of 0.8107929.

samples <- rmultinom(10000000, size = 100, prob = c(0.25,0.25,0.25,0.25))
ideal <- c(25,25,25,25)
chiquadrats <- colSums((samples - ideal)^2 / ideal)
#sort(chiquadrats)
#plot(sort(chiquadrats))
sum(chiquadrats>=chiquadrat)/10000000

Accepted answer

A very curious thing happens in your question that is worth thinking about. Suppose the null hypothesis is true, that is to say, the die is fair and in $n = 100$ rolls the outcome is multinomial with parameters $p_1 = p_2 = p_3 = p_4 = 1/4$, as you have stated. You observed the outcome $$\boldsymbol X = (23, 28, 27, 22)$$ with probability $$\Pr[\boldsymbol X = (23, 28, 27, 22)] = \frac{100!}{23! \, 28! \, 27! \, 22!} \frac{1}{4^{100}} \approx 0.000602032. $$ An exact $p$-value may be calculated by tabulating all outcomes that are no more probable than the one observed: among the $\binom{103}{3} = 176851$ possible outcomes, only $273$ fail this criterion, and each of those is, up to permutation, equivalent to one of the following:

$$\{(21, 25, 27, 27), (21, 26, 26, 27), (22, 24, 25, 29), (22, 24, 26, 28), (22, 24, 27, 27), (22, 25, 25, 28), \\ (22, 25, 26, 27), (22, 26, 26, 26), (23, 23, 25, 29), (23, 23, 26, 28), (23, 23, 27, 27), (23, 24, 24, 29), \\ (23, 24, 25, 28), (23, 24, 26, 27), (23, 25, 25, 27), (23, 25, 26, 26), (24, 24, 24, 28), (24, 24, 25, 27), \\ (24, 24, 26, 26), (24, 25, 25, 26), (25, 25, 25, 25)\}.$$ Thus, based on this computation, the exact $p$-value is $$p = \frac{2498975451220524516436384480916627030350374088772338040021}{3138550867693340381917894711603833208051177722232017256448} \approx 0.79622.$$ But your calculation, which instead asks whether the test statistic is at least the value computed from the observed data, is also correct! This means that the two criteria select slightly different sets of outcomes, and the difference is, up to permutation, the single outcome $$(22, 24, 25, 29).$$ There are $4! = 24$ such aberrant outcomes. Each has probability strictly greater than that of the observed outcome, $$\Pr[\boldsymbol X = (22, 24, 25, 29)] \approx 0.000607222 > \Pr[\boldsymbol X = (23, 28, 27, 22)],$$ but its test statistic $$\sum_{i=1}^4 \frac{(x_i - 25)^2}{25} = \frac{26}{25}$$ equals the observed test statistic and is therefore included in your $p$-value computation. This is the only other outcome (again, up to permutation) that has the same value of the test statistic.
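The probability-ordering $p$-value described above can be computed numerically along the same lines as your loop; a sketch (variable names are my own, and a small relative tolerance guards against floating-point ties among permutations of the observed outcome):

```r
# Exact p-value by probability ordering: sum the probabilities of all
# outcomes that are no more probable than the observed one.
p_obs <- dmultinom(c(23, 28, 27, 22), size = 100, prob = rep(0.25, 4))
p_exact <- 0
for (x1 in 0:100) {
  for (x2 in 0:(100 - x1)) {
    for (x3 in 0:(100 - x1 - x2)) {
      x4 <- 100 - x1 - x2 - x3
      p <- dmultinom(c(x1, x2, x3, x4), size = 100, prob = rep(0.25, 4))
      # include outcomes with p <= p_obs (tolerance for rounding of ties)
      if (p <= p_obs * (1 + 1e-12)) p_exact <- p_exact + p
    }
  }
}
p_exact  # approximately 0.79622, agreeing with the exact fraction above
```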

One final word about using the chi-squared test here. Because the true distribution is multinomial and therefore discrete, the $p$-value calculated from a chi-squared statistic rests on a normal approximation and should not be expected to be exact. That said, I would consider the tabulation of outcomes by their exact multinomial probability to be the definitive $p$-value, rather than using the value of the test statistic to decide which outcomes to include. That is why your simulation does nothing except confirm that your previous calculation is internally consistent; in other words, the simulation cannot answer the question of whether your method of determining which outcomes are "at least as extreme as the one observed" is correct.
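The distinction between the two criteria can be checked directly on the aberrant outcome; a small sketch (helper name `stat` is my own):

```r
# The aberrant outcome (22,24,25,29) has the same chi-square statistic as
# the observed (23,28,27,22), but a strictly higher multinomial probability.
obs   <- c(23, 28, 27, 22)
alt   <- c(22, 24, 25, 29)
ideal <- c(25, 25, 25, 25)
stat  <- function(x) sum((x - ideal)^2 / ideal)

stat(obs)  # 26/25 = 1.04
stat(alt)  # 26/25 = 1.04 as well, so it enters the statistic-based p-value
dmultinom(obs, size = 100, prob = rep(0.25, 4))  # ~0.000602032
dmultinom(alt, size = 100, prob = rep(0.25, 4))  # ~0.000607222, strictly larger
```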