Find the required Chi-square score for an arbitrarily low p-value (2 degrees of freedom)


I'm trying to use the chi-square test to assess the significance of data that suffers from the multiple testing problem. Because of this, the p-value required to call a test significant is very, very low, around $10^{-5}$. I haven't found a chi-square table that lists critical values for p-values that low, so I'm wondering how I can calculate the value myself. Some Google searching hasn't helped me find the method by which these chi-square tables are generated.

So:

  1. I need to know how to calculate the chi-square values to arbitrarily low p-values.

  2. I would like to know how these tables are generated in the first place. The internet tells me they exist and this is how we use them, but not where they come from. It feels a bit deus-ex-machina, like "here's a tool for you, and just trust us that it works."

I'm not a mathematician so an example with the values inserted would be greatly helpful.

Thanks for the help.


There are 2 best solutions below

2
On

Your questions are quite simple to answer. For a chi-square test, the p-value is computed as $1 - CDF_{\chi^{2}_{df}}(ts)$, where $CDF_{\chi^{2}_{df}}$ is the cumulative distribution function of the chi-square distribution with $df$ degrees of freedom and $ts$ is the value of your test statistic. (Try to figure out why the p-value is calculated this way; it is not so difficult.)

So, if you want the critical value of a chi-square test at significance level $10^{-5}$, where the test statistic has a $\chi^{2}_2$ distribution under the null hypothesis, just solve the equation $1 - CDF_{\chi^{2}_2}(ts) = 10^{-5}$ for $ts$.
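For 2 degrees of freedom specifically, this equation can be solved by hand, because the $\chi^{2}_2$ distribution is an exponential distribution with mean 2, so its CDF has the closed form $1 - e^{-ts/2}$. A worked version with the numbers from the question (this shortcut applies to $df = 2$ only; other degrees of freedom need numerical methods):

$$
\begin{aligned}
1 - CDF_{\chi^{2}_2}(ts) &= e^{-ts/2} = 10^{-5} \\
ts &= -2 \ln\left(10^{-5}\right) = 10 \ln 10 \approx 23.03
\end{aligned}
$$

So a test statistic above roughly 23.03 is significant at the $10^{-5}$ level, and in general the critical value for p-value $p$ with 2 degrees of freedom is $-2 \ln p$.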

0
On

R is a reliable statistical analysis package, widely used in practice and available free of charge at www.r-project.org. While the package is very broad in scope, you want only a few simple procedures. Many other computational packages will do the same thing. Because R is free and reliable, I'll give instructions for R; adapt them to some other program if you like.

Suppose you want the critical value of a test at level $\alpha = 0.001$ with degrees of freedom $\text{df}= 12.$ The critical value is the value for a chi-squared statistic $Q$ that separates 'fail to reject' to the left from 'reject' to the right. Here is how to use R to get the answer. (The symbol > is the 'prompt'; you don't type that. You can ignore the [1] in the result; it is mainly useful for answers with many parts. You don't need to type anything that comes after a # sign; those are comments for you, not the computer.)

 > alpha = .001;  df = 12    # press RETURN at end of each line
 > qchisq(1 - alpha, df)     # notice the 'q' in 'qchisq'
 [1] 32.90949

Or you can put the query all on one line, if you like:

 > qchisq(1 - .001, 12)
 [1] 32.90949

You won't find that value in most printed tables. Here's one you can compare with a printed table:

 > alpha = .10;  df = 8
 > qchisq(1 - alpha, df)
 [1] 13.36157

My printed table shows the answer as 13.36; some may show 13.362. R usually prints more decimal places than are really necessary for your purposes. See the figure below.

If you have a one-sided test in which $\text{df} = 8,$ the computed value of the chi-squared statistic is $Q = 19.33,$ and you're rejecting for large values of $Q,$ then you can use R to find the P-value of the test, as follows:

 > q = 19.33;  df = 8
 > 1 - pchisq(q, df)    # notice the 'p' in 'pchisq'
 [1] 0.01319102

So the P-value is 0.013, small enough to reject at the 5% level or the 2% level, but not at the 1% level (or lower levels) of significance.

Compare the following with a printed table and the figure below:

 > q = 13.36;  df = 8
 > 1 - pchisq(q, df)
 [1] 0.1000488
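Both functions apply directly to the case in the question ($\text{df} = 2,$ level $10^{-5}$). The sketch below is written as a script rather than a console session, so there are no > prompts. It uses the `lower.tail = FALSE` argument of `qchisq` and `pchisq`, which asks for the right-hand tail directly; that is an optional refinement over `1 - alpha`, useful mainly when the level is so small that `1 - alpha` rounds to 1.

```r
# Case from the question: df = 2, significance level 1e-5.
alpha <- 1e-5
df <- 2

# Critical value: the point with area alpha to its right.
# lower.tail = FALSE requests the upper-tail quantile directly,
# avoiding the rounding in 1 - alpha when alpha is extremely small.
crit <- qchisq(alpha, df, lower.tail = FALSE)
print(crit)    # about 23.03

# Round trip: the upper-tail area at the critical value recovers alpha.
p_back <- pchisq(crit, df, lower.tail = FALSE)
print(p_back)

# For df = 2 only, the closed form -2 * log(alpha) gives the same number.
print(-2 * log(alpha))
```

Any chi-squared statistic with 2 degrees of freedom larger than about 23.03 is therefore significant at the $10^{-5}$ level.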

R is case sensitive. That means it distinguishes between capital and small letters. For your purposes, you will be happier if you use lower-case letters for everything, as in my examples above.

END of lesson. That's all there is to it.

In general, these values for the chi-squared distribution are extremely difficult to compute, essentially impossible to do by hand. (The case $\text{df} = 2$ is an exception: there the area to the right of $Q$ is simply $e^{-Q/2}$.) So, in practice, you will have to get over your objection to the 'deus-ex-machina' feeling where the chi-squared distribution is concerned. Perhaps you have already gotten over this objection in the case of normal tables (or using a statistical calculator to get similar values). I imagine you have never computed those by hand.
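That said, the idea behind the tables is not mysterious: a tail probability is an area under the density curve, and a critical value is the point where that area equals the chosen level. A rough sketch of this in R, assuming base R's `integrate` (numerical integration) and `uniroot` (root finding); the helper name `upper_tail` is made up for this illustration, and this is not a description of how `qchisq` works internally:

```r
# Where a table entry comes from, illustrated for df = 8 and level 0.10.
df <- 8

# Area under the chi-squared density to the right of q,
# obtained by numerical integration of dchisq from q to infinity.
upper_tail <- function(q) {
  integrate(dchisq, lower = q, upper = Inf, df = df)$value
}

print(upper_tail(13.36))    # about 0.10, in line with 1 - pchisq(13.36, 8)

# Critical value: search for the q at which the right-hand area equals 0.10.
crit <- uniroot(function(q) upper_tail(q) - 0.10,
                interval = c(1, 50), tol = 1e-8)$root
print(crit)                 # about 13.36, in line with qchisq(1 - .10, 8)
```

Printed tables are, in effect, the results of computations like this carried out once for a grid of levels and degrees of freedom.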

In the figure below the blue curve is the density curve of $Chisq(df=8).$ The vertical line is at 13.36. The entire area beneath the curve is 1. The area to the right of the line and beneath the density curve is 0.10.

[Figure: density curve of $Chisq(df=8)$ in blue, with a vertical line at 13.36]


Note: If you want to give R a try, go to the URL given above. Then click CRAN (the Comprehensive R Archive Network) and the CRAN site nearest to you; then your kind of machine (Windows or Mac); then base and Download. After a brief time (depending on your connection speed), an R icon will appear on your desktop; click it to get the R Console window, and start typing at a > prompt.