How do I know whether my Chi-square results are valid or not?


I've been using Qualtrics to run a statistical analysis of two categorical variables; the cross-tabulation is shown below.

[cross-tabulation image]

As you can see, some cells have few or no data points, so I was wondering whether I can run a reliable chi-square test to use in my research. When I run the chi-square test in Qualtrics, it tells me that there is a statistically significant relationship between the two variables. It does not show the message 'Some of the cells in the table do not have enough datapoints. This result may not actually be statistically significant', which often appears when I run a chi-square test on other variables that lack sufficient data points.

I have read that one condition of the chi-square test is that all expected counts must be greater than 5. In that case, would my results be invalid even though Qualtrics doesn't say so? Elsewhere I read that at least 80% of the cells should have an expected count greater than 5.

Do you believe that I can run the chi-square test in this case? Thank you!

Got out my microscope to read tiny print in your table, replicated below in R. (Row totals match your table, but proofreading is advisable.)

wa = c(0,0,5,1,5,2,0)
wb = c(0,4,24,10,57,20,13)
bo = c(1,7,7,2,12,8,16)
nt = c(0,0,8,1,10,2,1)
DTA=rbind(wa,wb,bo,nt)
rowSums(DTA)
 wa  wb  bo  nt 
 13 128  53  22   # matches totals in Question

Then a test of independence in R does, quite justifiably, give a warning message. There is only $1$ count in the first column, so expected counts in that column cannot possibly be large enough for the chi-squared statistic to have nearly a chi-squared distribution.

cq.out = chisq.test(DTA);  cq.out

            Pearson's Chi-squared test

data:  DTA
X-squared = 40.191, df = 18, p-value = 0.001966

Warning message:
In chisq.test(DTA) : 
 Chi-squared approximation may be incorrect

Specifically, here are the 28 expected counts. None of the expected counts in the first column is five or more (as recommended) or even three or more (as required, assuming other counts are five or more).

cq.out$exp
         [,1]     [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
wa 0.06018519 0.662037  2.648148 0.8425926  5.055556  1.925926  1.805556
wb 0.59259259 6.518519 26.074074 8.2962963 49.777778 18.962963 17.777778
bo 0.24537037 2.699074 10.796296 3.4351852 20.611111  7.851852  7.361111
nt 0.10185185 1.120370  4.481481 1.4259259  8.555556  3.259259  3.055556

In such cases, R can simulate a P-value without relying on the usual expected-count requirements for the chi-squared approximation.

chisq.test(DTA, simulate.p.value=TRUE)

    Pearson's Chi-squared test with simulated p-value 
    (based on 2000 replicates)

data:  DTA
X-squared = 40.191, df = NA, p-value = 0.005497

So it does appear that web site preference and educational level are not independent categorical variables, P-value $0.0055 < 0.05 = 5\%.$

If you cannot use software that provides simulated P-values, then I suggest you combine the first three columns into a category 'HS or Less' and the last two columns into a category 'Grad Deg', and then see whether you have a significant result. There may still be a couple of expected counts that are smaller than ideal, but the chi-squared approximation may be OK. (You can look at the expected cell counts to check. You may want to settle for testing at the 10% level.)
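The suggested collapsing can be sketched in R, reusing the table from above. (This is a sketch; the labels 'HS or Less' and 'Grad Deg' are my assumptions about which education categories the columns represent.)

```r
# Rebuild the table from the Question:
wa = c(0,0,5,1,5,2,0); wb = c(0,4,24,10,57,20,13)
bo = c(1,7,7,2,12,8,16); nt = c(0,0,8,1,10,2,1)
DTA = rbind(wa, wb, bo, nt)

# Combine columns 1-3 ('HS or Less', assumed) and columns 6-7 ('Grad Deg', assumed):
DTA2 = cbind(DTA[,1]+DTA[,2]+DTA[,3], DTA[,4], DTA[,5], DTA[,6]+DTA[,7])
out2 = chisq.test(DTA2)
out2$expected   # check which expected counts remain below 5
out2            # compare with the full-table result
```

The smallest expected counts will still be in the 'wa' row, so inspect out2$expected before trusting the approximation.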

If you find significance, then you can look at Pearson residuals to find which cells of the (collapsed) table make large contributions to the significantly large chi-squared statistic. The Pearson residual $r_{ij}$ for cell $(i,j)$ is the (positive or negative) square root of the contribution $c_{ij} = (X_{ij} - E_{ij})^2/E_{ij}$ (according as $X_{ij} - E_{ij}$ is positive or negative), where $X_{ij}$ and $E_{ij}$ are observed and expected cell counts, respectively. Ordinarily, Pearson residuals with largest absolute values (especially, absolute values above 1.5 or 2) point the way to statistically significant departures from independence.
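In R, these Pearson residuals are available directly from the chisq.test result. (A sketch on the full table; for the collapsed table, substitute its matrix.)

```r
# Rebuild the table from the Question:
wa = c(0,0,5,1,5,2,0); wb = c(0,4,24,10,57,20,13)
bo = c(1,7,7,2,12,8,16); nt = c(0,0,8,1,10,2,1)
DTA = rbind(wa, wb, bo, nt)

# chisq.test stores the Pearson residuals (X - E)/sqrt(E) in $residuals
# (suppressWarnings hides the small-expected-count warning seen earlier):
cq.out = suppressWarnings(chisq.test(DTA))
round(cq.out$residuals, 2)

# Sanity check: squared residuals sum to the chi-squared statistic (about 40.19).
sum(cq.out$residuals^2)
```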

In retrospect, a power computation in advance might have pointed you toward using about 300 subjects rather than only about 200. Also, it may have been better not to offer a category 'Both', but to force a vote for A or B, or to offer something like 'No Favorite' instead of both 'Both' and 'Neither'. Half of the possible responses give (busy, lazy, or nonjudgmental) subjects "permission" not to participate meaningfully in your survey, and large numbers of indecisive votes are causing a problem with the analysis.

Note: Aside from investigating independence, it seems worth noting that Website B was scored much more favorably than Website A.

prop.test(c(13, 128), c(141,141))

        2-sample test for equality of proportions 
        with continuity correction

data:  c(13, 128) out of c(141, 141)
X-squared = 184.34, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.8902273 -0.7409784
sample estimates:
    prop 1     prop 2 
0.09219858 0.90780142