Why does scipy.stats.chi2_contingency compute an incorrect Pearson's chi-squared statistic?

I computed Pearson's chi-squared statistic for the contingency table below.

observed

Here are the expected values.

expected

Method 1 (using SciPy)

from scipy.stats import chi2_contingency
from scipy.stats import chi2

table = [[21,2], [14, 0]]
stat, p, dof, expected = chi2_contingency(table)
print(stat)

Returns

0.14814773735581194

Method 2 (directly)

G3 = 35*23/37
H3 = 2*23/37
G4 = 35*14/37
H4 = 2*14/37
(21-G3)**2/G3+(2-H3)**2/H3+(14-G4)**2/G4+(0-H4)**2/H4

Returns

1.2869565217391306

I expected the returned values to be equal. Why do they differ?


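There are many slightly different implementations of chi-squared tests within and among various kinds of software. For a 2x2 table (one degree of freedom), scipy.stats.chi2_contingency applies Yates' continuity correction by default (correction=True), while your direct calculation is the uncorrected Pearson statistic. That is why the two values differ.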

Here are four tests in R, using your table:

TBL = matrix(c(21,2,14,0), byrow=T, nrow=2)
TBL
      [,1] [,2]
 [1,]   21    2
 [2,]   14    0

Default chi-squared test in R, using Yates' continuity correction. It gives a warning message because the expected counts in some cells are too small for the chi-squared statistic to have approximately a chi-squared distribution.

 chisq.test(TBL)

        Pearson's Chi-squared test 
        with Yates' continuity correction

 data:  TBL
 X-squared = 0.14815, df = 1, p-value = 0.7003

 Warning message:
 In chisq.test(TBL) : 
  Chi-squared approximation may be incorrect
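
The corrected statistic can be reproduced by hand in Python. This is a minimal sketch using NumPy, with variable names of my own choosing. It uses the simple textbook form of Yates' correction (subtract 0.5 from each |observed - expected|), which assumes every absolute difference is at least 0.5, as is the case for this table.

```python
import numpy as np

# Observed 2x2 table from the question
observed = np.array([[21, 2], [14, 0]], dtype=float)

# Expected counts under independence: (row total * column total) / grand total
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected = np.outer(row_totals, col_totals) / observed.sum()

# Yates' continuity correction: shrink each |O - E| by 0.5 before squaring.
# Assumption: every |O - E| >= 0.5 (true here; each equals 28/37).
stat_yates = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()

print(stat_yates)  # about 0.14815, matching the corrected value
```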

Without the continuity correction: another warning message because of low counts.

chisq.test(TBL, cor=F)

        Pearson's Chi-squared test

data:  TBL
X-squared = 1.287, df = 1, p-value = 0.2566

Warning message:
In chisq.test(TBL, cor = F) : 
  Chi-squared approximation may be incorrect

Simulation to give a more reliable P-value:

chisq.test(TBL, sim=T)

        Pearson's Chi-squared test 
        with simulated p-value 
        (based on 2000 replicates)

data:  TBL
X-squared = 1.287, df = NA, p-value = 0.5042
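
A simulated P-value of this kind can be sketched in Python. This is not R's implementation, only an illustration under my own assumptions: both margins are held fixed, so a 2x2 table is determined by its top-left cell, which is drawn from a hypergeometric distribution. The helper `pearson_stat`, the seed, and the number of replicates are hypothetical choices.

```python
import numpy as np

table = np.array([[21, 2], [14, 0]])
row = table.sum(axis=1)            # row totals: [23, 14]
col = table.sum(axis=0)            # column totals: [35, 2]
expected = (np.outer(row, col) / table.sum()).ravel()

def pearson_stat(a):
    """Uncorrected Pearson statistic for 2x2 tables with top-left cell a
    and the same margins as the observed table."""
    a = np.asarray(a, dtype=float)
    cells = np.stack(
        [a, row[0] - a, col[0] - a, row[1] - col[0] + a], axis=-1
    )
    return ((cells - expected) ** 2 / expected).sum(axis=-1)

rng = np.random.default_rng(0)     # arbitrary seed for reproducibility
B = 20000                          # number of simulated tables

# Top-left cell: column-1 subjects among the 23 drawn for row 1
sim_a = rng.hypergeometric(ngood=col[0], nbad=col[1], nsample=row[0], size=B)

observed_stat = pearson_stat(table[0, 0])
sim_stats = pearson_stat(sim_a)

# Share of simulated statistics at least as large as the observed one
# (small tolerance guards against floating-point ties)
p_sim = (1 + np.sum(sim_stats >= observed_stat - 1e-9)) / (B + 1)

print(observed_stat, p_sim)
```

With fixed margins the simulated P-value should settle near the exact value given by Fisher's test, up to Monte Carlo error.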

Fisher's exact test uses a hypergeometric distribution:

fisher.test(TBL)

    Fisher's Exact Test for Count Data

data:  TBL
p-value = 0.5165
alternative hypothesis: 
  true odds ratio is not equal to 1
95 percent confidence interval:
 0.000000 8.792391
sample estimates:
odds ratio 
     0 
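
For completeness, the same exact test is available in SciPy as scipy.stats.fisher_exact. A brief sketch; note that SciPy reports the sample odds ratio, whereas R reports a conditional maximum likelihood estimate, although both are 0 for this table.

```python
from scipy.stats import fisher_exact

table = [[21, 2], [14, 0]]

# Two-sided Fisher exact test on the 2x2 table
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")

print(odds_ratio, p_value)  # p-value about 0.5165
```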

None of the tests gives a P-value significant at the 5% level. With only two subjects in the 'Senzitvne' outcome, you have no chance of finding a significant difference between groups.