Applying counting rules in a hypothesis testing example


I'm studying the book "Common errors in statistics and how to avoid them" by Phillip I. Good and James W. Hardin. I'm in the second part, Foundations, and I found an example that is used to show when to use a one-sided or a two-sided test, but I don't understand how the authors use counting rules in the example:

One-Sided or Two-Sided

Suppose on examining the cancer registry in a hospital, we uncover the following data that we put in the form of a 2 × 2 contingency table.

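2 × 2 table, reconstructed from the counts and marginals given below:

            Survived   Died   Total
    Men         9        1      10
    Women       4       10      14
    Total      13       11      24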

The 9 denotes the number of males who survived, the 1 denotes the number of males who died, and so forth. The four marginal totals or marginals are 10, 14, 13, and 11. The total number of men in the study is 10, and 14 denotes the total number of women, and so forth.

The marginals in this table are fixed because, indisputably, there are 11 dead bodies among the 24 persons in the study and 14 women. Suppose that before completing the table we lost the subject IDs, so that we could no longer identify which subject belonged in which category. Imagine you are given two sets of 24 labels. The first set has 14 labels with the word “woman” and 10 labels with the word “man.” The second set of labels has 11 labels with the word “dead” and 13 labels with the word “alive.” Under the null hypothesis, you are allowed to distribute the labels to subjects independently of one another. One label from each of the two sets per subject, please.

There are a total of $\binom{24}{10}$ ways you could assign the labels. $\binom{14}{10}\binom{10}{1}$ of the assignments result in tables that are as extreme as our original table (that is, in which 90% of the men survive) and $\binom{14}{11}\binom{10}{0}$ in tables that are more extreme (100% of the men survive). This is a very small fraction of the total, so we conclude that a difference in survival rates of the two sexes as extreme as the difference we observed in our original table is very unlikely to have occurred by chance alone.

I have bolded the part I don't get. I do understand how combinatorics works. I think what I need help with is how to express the decision algorithm needed. Can someone help me with this?

Best answer

The test introduced here is Fisher's Exact test. In my Comment, I recommended you try the Wikipedia explanation of this test (just click on Example in the Comment).

However, this test uses the 'hypergeometric distribution', which may be in your text or class notes. (But if not, then you can google it.) Here is an explanation of the probabilities used in the test.

Suppose an urn contains 24 balls, 10 blue (men) and 14 pink (women). You are going to select 11 balls (those who died) at random without replacement. The probability of getting exactly $X = 1$ blue ball out of 11 is as follows: $$ P(X = 1) = \frac{{10 \choose 1}{14 \choose 10}}{{24 \choose 11}} = 0.00401.$$

This can be computed in R software in two different ways as follows:

dhyper(1, 10, 14, 11)
[1] 0.004010185
choose(10, 1)*choose(14,10)/choose(24,11)
[1] 0.004010185

The first computation uses the PDF (or PMF) of the hypergeometric distribution, dhyper, as programmed into R. The second uses the relevant 'binomial coefficients'.
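The book's total of $\binom{24}{10}$ and the $\binom{24}{11}$ used here are two ways of counting the same thing: you can choose which 11 of the 24 subjects get the "dead" labels, or which 10 get the "man" labels. As a quick check (only a sketch, with variable names of my own choosing), both ways of counting give the same probability for a table with exactly one dead man:

```r
# Count by choosing which 11 of the 24 subjects are dead:
# 1 of the 10 men and 10 of the 14 women.
p_by_dead <- choose(10, 1) * choose(14, 10) / choose(24, 11)

# Count by choosing which 10 of the 24 subjects are men:
# 1 of the 11 dead and 9 of the 13 survivors.
p_by_men <- choose(11, 1) * choose(13, 9) / choose(24, 10)

c(p_by_dead, p_by_men)
all.equal(p_by_dead, p_by_men)
```

Whichever denominator you pick, the numerator has to be counted on the same basis.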

If we want $P(X \le 1) = P(X = 0) + P(X = 1) = 0.00416,$ we can use the hypergeometric CDF phyper:

phyper(1, 10, 14, 11)
[1] 0.00415601
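To see where this tail probability comes from, it may help to list the whole null distribution of $X$ (the number of men among the 11 dead) and add up the outcomes at least as extreme as the observed one. A minimal sketch; the names x and p are just for illustration:

```r
x <- 0:10                    # possible numbers of dead men (there are only 10 men)
p <- dhyper(x, 10, 14, 11)   # P(X = x) for each possible table
round(cbind(x, p), 5)        # the full null distribution
sum(p)                       # the probabilities of all tables add to 1
sum(p[x <= 1])               # same value as phyper(1, 10, 14, 11)
```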

P-value of Fisher's Exact test: The null hypothesis of the test is that the proportion of deaths is the same among men and women. The P-value is the probability, under the null hypothesis, of getting a result as extreme as the one observed (only one of the men died) or more extreme (none of the men died). Here that one-sided P-value is $P(X \le 1) = 0.00416 < 5\%,$ so we reject the null hypothesis at the 5% level (even at the 1% level).
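R also has a built-in fisher.test that works directly on the 2 × 2 table. The sketch below assumes the women's cells are 4 survived and 10 died, which is what the marginals 14, 13, and 11 imply; the object name tab is mine. With survivors in the first row and men in the first column, alternative = "greater" asks whether men have higher odds of survival, which is the one-sided question considered here:

```r
# Columns are filled first: Men = (9 survived, 1 died), Women = (4, 10).
tab <- matrix(c(9, 1, 4, 10), nrow = 2,
              dimnames = list(Outcome = c("Survived", "Died"),
                              Sex = c("Men", "Women")))

one_sided <- fisher.test(tab, alternative = "greater")$p.value
two_sided <- fisher.test(tab)$p.value   # the default alternative is two-sided

c(one_sided = one_sided, two_sided = two_sided)
```

The one-sided value should agree with phyper(1, 10, 14, 11). The two-sided value is somewhat larger because it also counts tables in the opposite direction that are no more probable than the observed one.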