How can I predict how many participants I will need in a hypothesis-test trial?


I'm trying to understand whether having more serious side effects after a vaccine means having a higher level of antibodies. There are two groups, divided by the severity of their side effects after the vaccine, and their antibody levels are tested.

Now the $H_0$ is that there is no difference in the amount of antibodies and $H_1$ says there is.

I don't know ahead of time what the mean or variance will be.

How can I know how many participants I will probably need to get significant results at the 0.05 level?

NOTE: I am not a statistician; I'm just trying to understand how to design a good trial.

Thanks.


Power and sample size computations require some educated guesses in advance. You expect more antibodies among subjects who have more side effects. That means the alternative hypothesis is one-sided. So $H_0$ is that there is no difference and $H_a$ is more antibodies in the high side-effects group.

You say you want to find significance at the 5% level. You might use a two-sample t test (which requires antibody levels to be approximately normal in each group). Let's say you want an 80% or 90% probability of detecting a meaningful difference (the power of the test).

Maybe the standard deviation $\sigma$ of antibody levels will be about 30 units in each group, and you would consider a difference of $\Delta = 10$ units to be clinically meaningful. (It is the ratio $\Delta/\sigma$ that matters, so you're saying it would be important to detect a difference that's $1/3$ of the standard deviation.)
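Why only the ratio matters can be seen from the usual large-sample (normal) approximation for equal group sizes $n$, a one-sided level $\alpha$ and power $1-\beta$. This is a sketch, not the exact noncentral-t computation; $z_q$ denotes the standard normal $q$ quantile:

$$
d = \frac{\Delta}{\sigma} = \frac{10}{30} = \frac{1}{3},
\qquad
n \approx \frac{2\,\bigl(z_{1-\alpha} + z_{1-\beta}\bigr)^2}{d^2} = 18\,\bigl(z_{1-\alpha} + z_{1-\beta}\bigr)^2 .
$$

With $z_{0.95} \approx 1.645$ and $z_{0.80} \approx 0.842$ this gives $n \approx 18 \times 2.486^2 \approx 111.3$, i.e. about 112 subjects per group.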

Someone connected with the study must have some information or intuition about the variability of antibody levels and how big a difference is important to detect, or you shouldn't be trying to plan such a study.

Testing is most efficient when the number in each group is roughly the same. For planning purposes, we assume it will be reasonable to define 'serious side effects' so that about half of the subjects have serious side effects. That may not be true, but it gives a way to start.
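A quick way to see why equal groups are most efficient: for a fixed total, the standard error of the difference in means is $\sigma\sqrt{1/n_1 + 1/n_2}$, which is smallest at an even split. The sketch below assumes a hypothetical total of 280 subjects purely for illustration; `se_factor` is a made-up helper name.

```python
import math

def se_factor(n1, n2):
    """Multiplier of sigma in the standard error of (mean1 - mean2)."""
    return math.sqrt(1 / n1 + 1 / n2)

N = 280  # hypothetical total number of subjects, for illustration only
for n1 in (40, 80, 140):
    n2 = N - n1
    # Smaller factor = more precise comparison = more power for the same N
    print(f"{n1:>3} vs {n2:>3}: SE factor = {se_factor(n1, n2):.4f}")
```

The 140/140 split gives the smallest factor (about 0.120, versus about 0.132 for 80/200), so the same total buys more power when the groups are balanced.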

With this information one can find the sample size in each group. Relevant output from Minitab software is shown below. This seems like it might be a Phase I clinical trial, where you're trying to get an idea whether the vaccine is safe and may be effective at a particular dosage. Typically, such trials use a few hundred volunteer subjects. Altogether, you would need about 225 or 310 subjects (112 or 155 per group), depending on the required power of the test (the probability of detecting a useful difference).

```
Power and Sample Size

2-Sample t Test

Testing mean 1 = mean 2 (versus >)
Calculating power for mean 1 = mean 2 + difference
α = 0.05  Assumed standard deviation = 30

            Sample  Target
Difference    Size   Power  Actual Power
        10     112     0.8      0.800098
        10     155     0.9      0.900282
```
The sample size is for each group.
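To see roughly where numbers like these come from, here is a minimal Python sketch using the normal (z) approximation instead of Minitab's exact noncentral-t method; `n_per_group` is a hypothetical helper name. With $\Delta = 10$, $\sigma = 30$ and a one-sided $\alpha = 0.05$ it prints 112 and 155 per group, matching the table above, though in general the z approximation can come out slightly short of the exact t-based answer.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a one-sided two-sample test.

    Normal (z) approximation, equal group sizes, common SD sigma.
    A planning sketch, not the exact noncentral-t calculation.
    """
    z = NormalDist().inv_cdf             # standard normal quantile function
    d = delta / sigma                    # standardized effect size
    n = 2 * (z(1 - alpha) + z(power)) ** 2 / d ** 2
    return math.ceil(n)                  # round up to whole subjects

for target in (0.80, 0.90):
    print(target, n_per_group(delta=10, sigma=30, power=target))
```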


Notes: (1) A clinical trial for final governmental approval of a vaccine typically requires several thousand subjects, half getting the vaccine and half getting a placebo. Government requirements for such trials, including sample sizes, are based on many considerations, only some of which are statistical.

(2) To determine power for a given sample size in a 2-sample pooled t test, there is a formula involving the non-central t distribution (used in the Minitab procedure above). It can also be used if you must have different sample sizes in the two groups.

Various statistical programs have similar 'power and sample size' procedures, and there are some on-line calculators (of varying clarity and accuracy).

If you are using some other kind of test, you may need to use simulation to get the power for the assumed specifications.

Here is a simulation in R for sample sizes 80 and 200 in a one-sided pooled 2-sample t test (`var.equal = TRUE`; dropping that argument gives the Welch test, which does not require equal variances), showing power of about 80%.

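```r
set.seed(2021)
n1 <- 80;  n2 <- 200                  # unequal group sizes
mu1 <- 110;  mu2 <- 100;  sg <- 30    # means differ by 10, common SD 30
# 10^5 simulated trials: one-sided pooled t test of mean 1 > mean 2
pv <- replicate(10^5, t.test(rnorm(n1, mu1, sg), rnorm(n2, mu2, sg),
                alternative = "greater", var.equal = TRUE)$p.value)
mean(pv <= 0.05)                      # rejection rate = estimated power
## [1] 0.8058
```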
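As a rough cross-check on that simulation, the normal approximation gives a closed-form power for unequal group sizes. This is a sketch assuming a common SD and a one-sided test at $\alpha = 0.05$; it is not the noncentral-t formula from note (2), and the name `approx_power` is hypothetical.

```python
from statistics import NormalDist

def approx_power(n1, n2, delta, sigma, alpha=0.05):
    """Normal-approximation power of a one-sided two-sample test.

    Assumes a common SD sigma in both groups. Using the z critical value
    instead of the t critical value slightly overstates power.
    """
    nd = NormalDist()
    se = sigma * (1 / n1 + 1 / n2) ** 0.5   # SE of (mean1 - mean2)
    return nd.cdf(delta / se - nd.inv_cdf(1 - alpha))

print(round(approx_power(80, 200, delta=10, sigma=30), 3))
```

For 80 and 200 subjects it gives about 0.81, close to the simulated 0.8058; the small excess is expected because the z critical value is a little smaller than the t critical value.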