Chi-squared test is failing to invalidate null hypothesis for some reason

I'm mostly a coder, so apologies in advance if my math notation / explanation isn't ideal. It's been a while since college, and statistics wasn't my strong suit even back then. Would appreciate any help.

Let $(X_1, \ldots, X_n)$ be random samples from a function such that $X_i \in [0, R)$ where $R \in \mathbb{Z}$ (i.e. the function returns a whole number in $[0, R)$). The probability of hitting any specific value is exponentially decaying such that $p(x) = 2^{R - x}$ (i.e. $R - 1$ has 0.5 probability, $R - 2$ has 0.25, etc). The function being randomly sampled is $f(x) = \max\{\log_2(x), -\mathrm{range}\} + \mathrm{range}$

When I do a chi-squared test for significance on some randomly generated samples, I'm getting strange results. For example, here are some random samples I captured with $R = 20, n = 2000$:

19 expected 1000 actual 1010
18 expected 500 actual 518
17 expected 250 actual 231
16 expected 125 actual 119
15 expected 62.5 actual 48
14 expected 31.25 actual 35
13 expected 15.625 actual 14
12 expected 7.8125 actual 11
11 expected 3.90625 actual 8
10 expected 1.953125 actual 3
9 expected 0.9765625 actual 2
8 expected 0.48828125 actual 1
7 expected 0.244140625 actual 0
6 expected 0.1220703125 actual 0
5 expected 0.06103515625 actual 0
4 expected 0.030517578125 actual 0
3 expected 0.0152587890625 actual 0
2 expected 0.00762939453125 actual 0
1 expected 0.003814697265625 actual 0
0 expected 0.0019073486328125 actual 0

For the test statistic I'm simply computing $(1000 - 1010)^2 / 1000 + (500 - 518)^2 / 500 + \ldots + 0.0019073486328125$. The resulting test statistic is 14.710092651367187, and with a degrees-of-freedom parameter of 19, the chi-squared CDF returns a significance of 0.26, which is not even close to where I would expect to invalidate the null hypothesis that the sampled distribution isn't the same as I expect (npm library I'm using).
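For reference, the computation described above can be sketched in a few lines of JavaScript. This is only an illustration: the names `observed`, `expected` and `chiSquaredStatistic` are made up for this sketch, and the expected counts are taken as $n \cdot 2^{x-R}$, which is what the table above lists.

```javascript
// Observed counts from the n = 2000 run, indexed by value (index 0 is value 0).
const R = 20;
const n = 2000;
const observed = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 8, 11, 14, 35, 48, 119, 231, 518, 1010];

// Expected count for value x as listed in the table: n * 2^(x - R).
// (Assumption: this mirrors the table, not necessarily the true sampling distribution.)
const expected = observed.map((_, x) => n * Math.pow(2, x - R));

// Pearson statistic: sum over all cells of (observed - expected)^2 / expected.
function chiSquaredStatistic(obs, exp) {
  return obs.reduce((sum, o, i) => sum + (o - exp[i]) ** 2 / exp[i], 0);
}

const statistic = chiSquaredStatistic(observed, expected);
console.log(statistic.toFixed(4)); // 14.7101, the value quoted above
```

Cells with an observed count of 0 each contribute exactly their expected count, which is where the trailing $0.0019073486328125$ term comes from.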

Am I plugging in the values incorrectly? Am I computing the test statistic incorrectly? Is the chi-squared test a bad one for this kind of exponential distribution? It doesn't seem to be a matter of just using more samples as one might expect in case I've undersampled. For example, $n=50000$ generates the following:

19: 25012 (expected 25000)
18: 12418 (expected 12500)
17: 6288
16: 3173
15: 1584
14: 797
13: 372
12: 189
11: 94
10: 38
9: 18
8: 9
7: 6
6: 1
5: 1
4: 0
3: 0
2: 0
1: 0
0: 0

This gives a test statistic of 11.323836284179686. Doing a CDF lookup with 19 degrees of freedom yields 0.08753564706921234 which is well below my expected p-value of 0.95. Additionally, the p-value is all over the place on every run (sometimes 0.2, 0.7, 0.56, 0.97). Bumping the samples to 500k shows similar results (p = 0.66, 0.98, 0.44, 0.68).

Am I computing this incorrectly? My answers are numerically consistent with using 1 - chisq.test(observed, expected) in Google Sheets, so that doesn't seem to be it. Is it just that chi-squared requires a lot of samples for an exponential distribution to work around biases that may exist in the PRNG that NodeJS uses for Math.random?

Best answer

There are minor problems in your definition of the probabilities. You want $2^{x-R}$, not $2^{R-x}$ (which would be $\ge1$), and even then, your probability distribution isn’t normalized, since these probabilities would add up to $1$ only if you take the sum to negative infinity, but you only have values down to $0$. If I understand correctly what you wrote about how you generated your samples (I’m guessing that you used the function $f(x)$ to transform samples uniformly randomly drawn from $[0,1]$ and then took the integer part), the value $0$ actually had probability $2\cdot2^{-20}$ instead of $2^{-20}$, but that’s obviously not what caused your test to fail.

Another potential problem is that you seem to be confused about the meaning of the $p$-value. You write “Doing a CDF lookup with $19$ degrees of freedom yields $0.08753564706921234$ which is well below my expected $p$-value of $0.95$”. But a $p$-value of $0.95$ would be very bad – what you want is more likely a $p$-value of $0.05$, and $0.09$ is already quite close to that. The lower the deviations from the expected distribution, the lower the test statistic, the lower the value of the cumulative distribution function, and the lower the $p$-value.

But the fundamental problem in what you did is that you have way too many values with very low expected counts to apply a chi-squared test. Note that almost half of your test statistic (about $6.9$) comes from the values for which fewer than $5$ samples were expected. If you omit that part, you get a test statistic of $7.7635$ and a $p$-value of $0.01$. The chi-squared test assumes that enough samples are expected per value that the discrete distribution can be approximated by a continuous distribution. That assumption isn’t valid here. A rule of thumb is that at least $5$ samples should be expected per value, and certainly there shouldn't be any values with an expected count of $0$.
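The split described above can be checked with a short sketch on the $n = 2000$ data. It also shows one common way to respect the rule of thumb, namely pooling all sparse values into a single cell; that pooling step is an addition for illustration, not something the question's code does, and the function name `poolSparseCells` is made up here.

```javascript
const R = 20;
const n = 2000;
// Observed counts from the n = 2000 run, indexed by value (index 0 is value 0).
const observed = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 8, 11, 14, 35, 48, 119, 231, 518, 1010];
const expected = observed.map((_, x) => n * Math.pow(2, x - R));

// Split the statistic into cells with expected count >= 5 and cells below 5.
let dense = 0;
let sparse = 0;
observed.forEach((o, i) => {
  const term = (o - expected[i]) ** 2 / expected[i];
  if (expected[i] >= 5) dense += term;
  else sparse += term;
});
console.log(dense.toFixed(4), sparse.toFixed(4)); // 7.7635 6.9466

// Illustrative remedy: merge every cell with expected count below the
// threshold into one pooled cell. If the pooled cell were still below the
// threshold, it would need to be merged with a neighbouring cell as well.
function poolSparseCells(obs, exp, minExpected = 5) {
  const pooled = { observed: [], expected: [] };
  let poolObs = 0;
  let poolExp = 0;
  obs.forEach((o, i) => {
    if (exp[i] >= minExpected) {
      pooled.observed.push(o);
      pooled.expected.push(exp[i]);
    } else {
      poolObs += o;
      poolExp += exp[i];
    }
  });
  if (poolExp > 0) {
    pooled.observed.push(poolObs);
    pooled.expected.push(poolExp);
  }
  return pooled;
}

const pooled = poolSparseCells(observed, expected);
console.log(pooled.observed.length); // 9 cells: values 12..19 plus one pooled cell
```

After pooling, the statistic would be recomputed over the 9 remaining cells, and the degrees of freedom would drop accordingly (cells minus one for a fully specified distribution).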

A final thought: This question would have been better placed at https://stats.stackexchange.com.