Binomial Hypothesis Test

The proportion of deaths due to lung cancer in working males aged 15-64 in Australia between 1970 and 1972 was 10%. There is reason to believe that working in a chemical plant for an extended period can increase your risk of lung cancer. Several Australian chemical plants were investigated, and it was found that of 90 deaths in working males aged 15-64, 19 were due to lung cancer.

Is there evidence of increased risk of developing lung cancer if you work in a chemical plant?

For this hypothesis test, we are required to provide a statement of the null and alternative hypothesis, a test statistic, the observed value and the estimated $p$-value.

My Attempt:

Let $p$ be the probability of developing lung cancer in a chemical plant. Therefore,
$$H_0: p = 0.1 \quad \text{vs} \quad H_1: p > 0.1$$
Our test statistic is
$$Z=\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}, \quad \text{where } Z\sim N(0,1) \text{ approximately under } H_0.$$
Our observed value is therefore
$$\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}} =\frac{\frac{19}{90}-0.1}{\sqrt{\frac{0.1(1-0.1)}{90}}}=3.51$$
Hence our p-value is
$$\mathbb{P}(Z>\text{observed value})=\mathbb{P}(Z>3.51)\approx 0.0002$$

Hence we reject $H_0$. Is my hypothesis test correct?

Best Answer

Under the null hypothesis that 10% of deaths are due to lung cancer, the number $X$ of lung cancer deaths among the 90 deaths at the chemical factories is $X \sim \mathsf{Binom}(90, 0.1).$ Then the P-value of the test of $H_0: p = 0.1$ against $H_a: p > 0.1$ is $P(X \ge 19) = 1 - P(X \le 18) = 0.0013 < 1\%.$ So $H_0$ is rejected at the 1% level.

1 - pbinom(18, 90, .1)
## 0.001308245
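
As a cross-check, the same one-sided exact test can be run with base R's `binom.test`. This is only a sketch of an alternative route to the same number; the variable name `res` is an illustrative choice, not part of the original answer.

```r
# One-sided exact binomial test of H0: p = 0.1 against Ha: p > 0.1,
# with 19 lung cancer deaths observed among 90 deaths.
res <- binom.test(x = 19, n = 90, p = 0.1, alternative = "greater")

res$p.value    # exact upper-tail probability P(X >= 19) under H0
res$estimate   # sample proportion 19/90
```

The reported P-value should agree with `1 - pbinom(18, 90, .1)` above, since both compute the upper tail of the $\mathsf{Binom}(90, 0.1)$ distribution.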

This may raise questions about hazardous conditions in chemical factories, which need to be carefully investigated. But it is not 'evidence' of increased risk due to employment in a chemical factory. There are many ways in which workers at 'several' chemical factories might not be typical of Australian men aged 15-64.

Addendum per Comment: Here is a plot of the exact (discrete) binomial distribution and its approximating (continuous) normal distribution. The P-value refers to values to the right of the vertical broken line.

[Figure: plot of the $\mathsf{Binom}(90, 0.1)$ distribution with its approximating normal density; image not available.]
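
A plot of the kind described can be sketched in base R roughly as follows. The plotting range `0:30`, the colors, the axis labels, and the position of the broken line at 18.5 (halfway between 18 and 19) are assumptions made for illustration, not details taken from the original figure.

```r
n  <- 90    # number of deaths examined
p0 <- 0.1   # proportion under the null hypothesis
mu    <- n * p0                     # mean of Binom(90, 0.1)
sigma <- sqrt(n * p0 * (1 - p0))    # standard deviation of Binom(90, 0.1)

k     <- 0:30                # assumed plotting range; mass beyond 30 is negligible
probs <- dbinom(k, n, p0)    # exact binomial probabilities

# Vertical bars for the discrete binomial distribution
plot(k, probs, type = "h", lwd = 2,
     xlab = "Number of lung cancer deaths", ylab = "Probability",
     main = "Binom(90, 0.1) with normal approximation")

# Density curve of the approximating normal distribution
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "blue", lwd = 2)

# Assumed boundary of the P-value region: outcomes 19, 20, ... lie to its right
abline(v = 18.5, lty = "dashed", col = "red")
```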

Note: As software for computing exact distributions becomes more readily available and easy to use, there is no reason to use a normal approximation in many practical applications. However, the normal approximation will continue to be of theoretical interest.
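
To make the comparison concrete for this example, the exact tail probability can be set beside the normal approximation, with and without a continuity correction. This is a minimal sketch; the variable names are illustrative, and the continuity-corrected version is a standard textbook variant rather than something used in the question.

```r
n <- 90; x <- 19; p0 <- 0.1
mu    <- n * p0                     # mean under H0
sigma <- sqrt(n * p0 * (1 - p0))    # standard deviation under H0

# Exact binomial tail P(X >= 19)
p_exact <- pbinom(x - 1, n, p0, lower.tail = FALSE)

# Normal approximation without continuity correction
# (algebraically the same z as (19/90 - 0.1) / sqrt(0.1 * 0.9 / 90))
z_plain <- (x - mu) / sigma
p_plain <- pnorm(z_plain, lower.tail = FALSE)

# Normal approximation with continuity correction: P(X >= 19) ~ P(Y > 18.5)
z_cc <- (x - 0.5 - mu) / sigma
p_cc <- pnorm(z_cc, lower.tail = FALSE)

print(c(exact = p_exact, normal = p_plain, normal_cc = p_cc))
```

In this example both normal tail areas come out smaller than the exact binomial value, so the approximation overstates the strength of the evidence somewhat, although all three lead to rejection at the 1% level.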