How to determine power of a test of the difference of two binomial proportions?


Consider a sample of 800 adults with a wrist fracture, of whom 400 receive an operative treatment and 400 receive physiotherapy only. The outcome of interest is whether the wrist is fully healed after 6 months. What power will a hypothesis test have to detect a 10% improvement in the operative cohort, assuming 50% of the physiotherapy cohort are fully healed at 6 months?

My attempt (assuming $\alpha = 0.05$):

A 10% improvement would mean that the difference in proportions (the test statistic) between the two therapies is: $$\theta = p_2 - p_1 = \frac{240}{400} - \frac{200}{400} = 0.1$$

Thus, we have two sampling distributions for $\theta$. One is under the null hypothesis, where there is no difference between the therapies and the mean, $\bar{\theta_1}$, is $0$. The other is under the alternative hypothesis, where the mean, $\bar{\theta_2}$, is $0.1$.

Here is an illustration:

[Image: the null and alternative sampling distributions of $\theta$, drawn as black and red curves]

This is the point where I got stuck. Usually, in these types of questions, I'm told that we are sampling from a normally distributed population with a known standard deviation, so I can use the critical value of the Z statistic ($1.64$) to determine the rejection threshold for the null hypothesis and go from there.

However, here we are reliant on sampling two distributions (physio + operative) that I assume to be binomially distributed.

Thus, the variance of the black and red curves (null vs alternate hypothesis) would be the sum of the variances from the two underlying distributions:

$$Var(\theta) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$

How do I proceed from here? Can we assume the sampling distributions are normal? If so, I can use the Z statistic with the null distribution:

$$1.96 = \frac{\theta - 0}{\sqrt{2\frac{p_1(1-p_1)}{n_1}}}$$

Edit: This question was inspired by the example in this video starting at 50:47. I don't understand the reasoning behind his steps and I tried to break it down using the approach I was more familiar with.


There is 1 best solution below


Comment continued:

To obtain fictitious results for one experiment with 800 subjects, use R to sample possible results. Then do the test with prop.test:

set.seed(2022)
x.t = rbinom(1, 400, .6);  x.t
[1] 246
x.c = rbinom(1, 400, .5);  x.c
[1] 193

prop.test(c(246,193), c(400,400), alternative="greater", correct=FALSE)

         2-sample test for equality of proportions 
         without continuity correction

data:  c(246, 193) out of c(400, 400)
X-squared = 14.18, df = 1, p-value = 8.307e-05
alternative hypothesis: greater
95 percent confidence interval:
  0.07513794 1.00000000
sample estimates:
 prop 1 prop 2 
 0.6150 0.4825 

The small P-value $< 0.05 = 5\%$ indicates rejection of $H_0: p_t = p_c$ vs. $H_a: p_t > p_c$ at the 5% level of significance.
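As a check on what prop.test reports, the statistic can be reproduced by hand. The sketch below assumes the usual pooled two-sample z statistic; the variable names are illustrative only. Without continuity correction, the reported X-squared should equal the square of this z, and the one-sided P-value should be the upper normal tail beyond z.

```r
# Counts from the single simulated experiment above
x <- c(246, 193)   # healed: operative, physiotherapy
n <- c(400, 400)   # group sizes

p_hat  <- x / n                                     # sample proportions
p_pool <- sum(x) / sum(n)                           # pooled estimate under H0
se0    <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))  # SE of the difference under H0

z    <- (p_hat[1] - p_hat[2]) / se0                 # pooled z statistic
pval <- pnorm(z, lower.tail = FALSE)                # one-sided P-value for Ha: p_t > p_c

c(z = z, z_squared = z^2, p_value = pval)
```

Here z^2 should agree with X-squared = 14.18 and pval with the P-value printed by prop.test.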

Repeat the experiment 100,000 times by simulation and count the rejections for a good approximation of the power of the test.

set.seed(217)
pv = replicate(10^5, prop.test(c(rbinom(1,400,.6), rbinom(1,400,.5)),
      c(400,400), alternative="greater", correct=FALSE)$p.value)
mean(pv <= .05)
[1] 0.88337        # approx power
2*sd(pv <= .05)/sqrt(10^5)
[1] 0.002030059    # 95% margin of sim error

The vector pv contains $10^5$ P-values. The logical vector pv <= .05 contains the same number of TRUEs and FALSEs combined, and its mean is the proportion of TRUEs (rejections). So, the power of the test at the 5% level is about 88% $(0.883 \pm 0.002).$
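The same simulation can be run much faster by computing the test statistic directly instead of calling prop.test once per experiment. This is a minimal sketch, assuming the one-sided test without continuity correction is equivalent to rejecting when the pooled z statistic is at least qnorm(0.95); the names m, xt, xc and reject are illustrative. Because the random draws are consumed in a different order, the estimate will differ slightly from the one above even with the same seed.

```r
set.seed(217)                 # any seed works; results vary slightly
m  <- 10^5                    # number of simulated experiments
n  <- 400                     # subjects per group
xt <- rbinom(m, n, 0.6)       # healed counts, operative group
xc <- rbinom(m, n, 0.5)       # healed counts, physiotherapy group

pp <- (xt + xc) / (2 * n)                          # pooled proportion per experiment
z  <- ((xt - xc) / n) / sqrt(pp * (1 - pp) * 2 / n)  # pooled z statistics

reject    <- z >= qnorm(0.95)                      # one-sided rejection at the 5% level
power_hat <- mean(reject)                          # estimated power
moe <- 2 * sqrt(power_hat * (1 - power_hat) / m)   # approx. 95% simulation margin

c(power = power_hat, margin = moe)
```

The estimate should land close to the value obtained with prop.test, within the stated simulation margin.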

There are several (nearly equivalent) tests to compare two binomial proportions. The 2-sided version of this one (prop.test in R) is equivalent to a chi-squared test of a $2 \times 2$ table of counts with rows for Recovery/Not, columns for Operative Procedure and Physiotherapy, and grand total 800.
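The equivalence can be illustrated with the counts from the single experiment above. This sketch builds the $2 \times 2$ table (the row and column labels are my own) and compares chisq.test with the two-sided prop.test, both without continuity correction.

```r
# 2 x 2 table: rows = outcome, columns = treatment; grand total 800
tab <- matrix(c(246, 193,
                154, 207),
              nrow = 2, byrow = TRUE,
              dimnames = list(Outcome   = c("Healed", "Not healed"),
                              Treatment = c("Operative", "Physiotherapy")))

chi <- chisq.test(tab, correct = FALSE)                       # chi-squared test of the table
pt2 <- prop.test(c(246, 193), c(400, 400), correct = FALSE)   # two-sided by default

# The statistics and the P-values should agree
c(chisq = unname(chi$statistic), prop = unname(pt2$statistic))
c(chisq = chi$p.value,           prop = pt2$p.value)
```

Note that the two-sided P-value is twice the one-sided P-value reported earlier, since the observed difference is in the direction of the alternative.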

Notes: (1) If no test for comparing two binomial proportions is provided in your text, then google to find your favorite version of such a test, and do an analytic computation of power.

(2) You did not say explicitly whether you want to do a one-sided or two-sided test. A one-sided test (as above) may be a little easier to handle.

(3) One very rough clue that 400 in each group may be enough to give good power is that the margin of error for a 95% confidence interval for a binomial proportion with $n=400$ near $0.5$ is $\pm 0.05$ and your test has effect size $0.1.$
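For Note (1), one possible analytic computation uses the normal approximation. This is a sketch under stated assumptions: a one-sided test at the 5% level, a pooled standard error under $H_0$, and an unpooled standard error under the alternative $p_t = 0.6,\, p_c = 0.5$. The variable names are illustrative. R's power.prop.test is included only as a cross-check.

```r
p_c <- 0.5; p_t <- 0.6     # assumed true proportions under the alternative
n <- 400; alpha <- 0.05    # per-group sample size and significance level

delta <- p_t - p_c                                     # difference to detect
p_bar <- (p_t + p_c) / 2                               # pooled proportion (equal group sizes)
se0   <- sqrt(2 * p_bar * (1 - p_bar) / n)             # SE of the difference under H0
se1   <- sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / n) # SE of the difference under Ha

crit <- qnorm(1 - alpha) * se0                         # rejection threshold for the difference
power_analytic <- pnorm((crit - delta) / se1, lower.tail = FALSE)

# Cross-check with the built-in function (one-sided)
power_builtin <- power.prop.test(n = n, p1 = p_c, p2 = p_t, sig.level = alpha,
                                 alternative = "one.sided")$power

c(analytic = power_analytic, builtin = power_builtin)
```

Both values come out at roughly 0.886, in reasonable agreement with the simulated power of about 0.883.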