Help with a Pearson's chi-square test exercise


The following exercise is from an assessment test from Hong Kong. It looked like a goodness-of-fit (GOF) test at first, but I've only ever done those with a single column of data. I'm a maths tutor, but I had to refer the student to someone else because I didn't recognize the correct procedure. Of course I would like that not to happen again, so can anyone show me the correct way to approach this problem? Here it is:

Exercise. A random sample of 230 workers at a company were surveyed about their satisfaction with their life. The answer about their satisfaction was recorded along with their annual wages:

$$ \begin{array}{lccccr}
 & \$\text{20-35k} & \$\text{35-50k} & \$\text{50-75k} & \$\text{75-90k} & \text{Total} \\
\text{Very satisfied} & 13 & 11 & 19 & 15 & 58 \\
\text{Somewhat satisfied} & 29 & 31 & 28 & 12 & 100 \\
\text{Not satisfied} & 34 & 20 & 10 & 8 & 72 \\
\text{Total} & 76 & 62 & 57 & 35 & 230 \\\hline
\text{Pearson's Chi-square test} & \chi^2=20.0043 & \text{df}=6 & \text{p-value}<0.001
\end{array} $$

Assuming there's no relationship between income and life satisfaction, how many people who earn between \$20-35k would you expect to be 'not satisfied' with life?

Thanks in advance!


There are 2 best solutions below

3
On BEST ANSWER

HINT: The expected values are defined as the frequencies that should be found in each cell of the table assuming no association between the two variables.

The expected value for each cell is obtained by multiplying the row total by the column total, and then dividing by the grand total.
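As a minimal sketch of this rule in R, the snippet below applies it to the cell asked about in the exercise. The totals are copied from the table in the question; the variable names (`row_totals`, `col_totals`, `e_not_20_35`) are only illustrative and not part of the original exercise.

```r
# Row and column totals copied from the table in the question
row_totals <- c(very = 58, somewhat = 100, not = 72)
col_totals <- c("20-35k" = 76, "35-50k" = 62, "50-75k" = 57, "75-90k" = 35)
grand_total <- sum(row_totals)  # total sample size

# Expected count under no association:
# (row total * column total) / grand total
e_not_20_35 <- row_totals[["not"]] * col_totals[["20-35k"]] / grand_total
e_not_20_35
```

The same expression with other row and column names gives the expected count for any other cell.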

0
On

Below is an analysis of this data table using the chisq.test procedure in R. Nothing here is classified 'secret'; formulas for everything should be in the course textbook.

Data table:

vs = c(13,11,19,15)
ss = c(29,31,28,12)
ns = c(34,20,10, 8)
TBL = rbind(vs,ss,ns);  TBL
   [,1] [,2] [,3] [,4]
vs   13   11   19   15
ss   29   31   28   12
ns   34   20   10    8
rowSums(TBL)
vs  ss  ns 
58 100  72 
colSums(TBL)
[1] 76 62 57 35

Chi-squared test:

The test statistic (X-sq in printout) is $$Q = \sum_{i=1}^r\sum_{j=1}^c \frac{(X_{ij}-E_{ij})^2}{E_{ij}},$$ where the table has $r = 3$ rows and $c = 4$ columns, $X_{ij}$ are observed cell counts, and $E_{ij}$ are expected counts determined from table row and column totals in accordance with the null hypothesis.

Provided that all $E_{ij} \ge 5,$ the test statistic $Q$ has approximately a chi-squared distribution with degrees of freedom $\nu = (r-1)(c-1) = 6.$
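To make the formula concrete, here is a rough sketch that computes $Q$ directly from the definition rather than through chisq.test. The names `E`, `Q`, `nu`, and `p_value` are chosen here for illustration, and the table is re-entered so the snippet runs on its own.

```r
# Observed counts (rows: very / somewhat / not satisfied)
TBL <- rbind(vs = c(13, 11, 19, 15),
             ss = c(29, 31, 28, 12),
             ns = c(34, 20, 10,  8))

# Expected counts under the null hypothesis: row total * column total / grand total
E <- outer(rowSums(TBL), colSums(TBL)) / sum(TBL)

# Rule-of-thumb check that the chi-squared approximation is reasonable
all(E >= 5)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
Q <- sum((TBL - E)^2 / E)

# Degrees of freedom (r - 1)(c - 1) and upper-tail P-value
nu <- (nrow(TBL) - 1) * (ncol(TBL) - 1)
p_value <- pchisq(Q, nu, lower.tail = FALSE)

c(Q = Q, nu = nu, p_value = p_value)
```

Computing the statistic by hand like this is mainly useful as a check on one's understanding; in practice chisq.test does the same arithmetic.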

chi.out = chisq.test(TBL); chi.out

        Pearson's Chi-squared test

data:  TBL
X-squared = 20.008, df = 6, p-value = 0.00276

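Notice that the test statistic and degrees of freedom here closely match the results included in your Question. The P-value from R, however, is 0.00276, which is not below 0.001 as stated there. The expected counts $E_{ij}$ are available: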

chi.out$exp
       [,1]     [,2]     [,3]      [,4]
vs 19.16522 15.63478 14.37391  8.826087
ss 33.04348 26.95652 24.78261 15.217391
ns 23.79130 19.40870 17.84348 10.956522

In particular, following @Anatoly's response (+1), the expected count asked for in the exercise is $E_{31} = \frac{72(76)}{230} = 23.7913,$ obtained from the row total (72) and the column total (76) of that cell.

Figure: Below is a plot of the density function of $\mathsf{Chisq}(\nu = 6)$ along with the 5% critical value of the test (vertical dotted line) and the observed value of $Q = 20.01$ (solid line). The 0.95 quantile $c = 12.5916$ of the chi-squared distribution can be found in printed tables of chi-squared distributions or from R (see below).

qchisq(.95, 6)
[1] 12.59159

The exact P-value (here $0.003),$ often shown in computer printouts, cannot usually be found from printed tables. The value from R is shown below:

1 - pchisq(20.008, 6)
[1] 0.00276033

In the figure below the area to the right of $c$ under the chi-squared density curve is 5% and the (very small) area to the right of $Q$ is $0.003.$

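[Figure: density of $\mathsf{Chisq}(\nu = 6)$ with the 5% critical value $c = 12.59$ (vertical dotted line) and the observed value $Q = 20.01$ (solid line).]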
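A plot along these lines can be drawn with base R graphics. The sketch below is one possible way to do it and is not necessarily the code used for the original figure; the axis range, labels, and variable names are assumptions.

```r
nu   <- 6                 # degrees of freedom
Q    <- 20.008            # observed test statistic from chisq.test
crit <- qchisq(0.95, nu)  # 5% critical value (0.95 quantile)

# Density curve of Chisq(6) over a range that includes Q
curve(dchisq(x, nu), from = 0, to = 25,
      xlab = "q", ylab = "Density", main = "Density of Chisq(6)")
abline(h = 0, col = "gray")       # baseline
abline(v = crit, lty = "dotted")  # critical value
abline(v = Q)                     # observed statistic
```

Because the solid line at $Q$ lies to the right of the dotted line at $c$, the null hypothesis is rejected at the 5% level.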