Calculating Sample Size for a One Sample, Dichotomous Outcome

I'm trying to calculate the required sample size for a project using the one-sample, dichotomous-outcome formula. I'm confused by the intuition behind the formula, in which a less probable result requires fewer samples than a more probable one.

Let's say I have a very smart robot dog and countless balls in three colors (red, blue, green). I program the dog to pick up only the red balls and bring them to me. Naturally, the dog's code has some bugs, and I project that it will mess up and bring me a blue ball 1 in 1000 times (0.001) and a green ball 2 in 1000 times (0.002). I want to validate that these projected miscues are accurate estimates, so I want to run a test and collect enough data to say these proportions are accurate.

I'm using this equation to estimate the number of trials required so that the 95% confidence interval for the proportion of times a blue or green ball is picked is within 0.1% of the true proportion: https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_power/BS704_Power4.html

 n = p(1-p)(Z/E)^2

Here's the required sample size for blue balls:

 n = 0.001(1-0.001)(1.96/0.001)^2

 n = 3838

And here's the required sample size for green balls:

 n = 0.002(1-0.002)(1.96/0.001)^2

 n = 7668
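Both numbers can be reproduced with a small helper. This is a minimal sketch: `sample_size_absolute` is a hypothetical name, and the inputs are passed as decimal strings so that Python's `fractions.Fraction` represents them exactly and the final round-up is not thrown off by floating-point error.

```python
import math
from fractions import Fraction

def sample_size_absolute(p: str, margin: str, z: str = "1.96") -> int:
    """Smallest n with z*sqrt(p(1-p)/n) <= margin, i.e. n = p(1-p)(z/margin)^2 rounded up.

    Inputs are decimal strings so Fraction holds them exactly (no float error).
    """
    p_, e_, z_ = Fraction(p), Fraction(margin), Fraction(z)
    return math.ceil(p_ * (1 - p_) * (z_ / e_) ** 2)

# Same absolute margin E = 0.001 for both colors, as in the question.
blue = sample_size_absolute("0.001", "0.001")
green = sample_size_absolute("0.002", "0.001")
print(blue, green)  # 3838 7668
```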

The required sample size is largest at p = 0.5, so plugging in 0.5 gives the most conservative estimate (implying we know nothing about the code or how often the dog will bring any color).
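A quick way to see why 0.5 is the worst case is to scan p over a grid and keep E = 0.001 fixed. This sketch assumes a grid of 1% to 99% chosen only for illustration. It also shows that, below 0.5, the term p(1-p) grows with p, which is why a fixed absolute margin asks for more green-ball trials (0.002) than blue-ball trials (0.001).

```python
import math
from fractions import Fraction

Z = Fraction("1.96")
E = Fraction("0.001")  # same absolute margin as in the question

def sample_size_absolute(p: Fraction) -> int:
    # n = p(1-p)(Z/E)^2, rounded up to a whole number of trials
    return math.ceil(p * (1 - p) * (Z / E) ** 2)

# p(1-p) rises from 0 up to p = 0.5, then falls symmetrically (p and 1-p give the same n).
grid = [Fraction(k, 100) for k in range(1, 100)]
sizes = {p: sample_size_absolute(p) for p in grid}
worst_p = max(sizes, key=sizes.get)
print(worst_p, sizes[worst_p])  # 1/2 960400
```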

Now the part that confuses me: intuitively, shouldn't the smaller proportion (blue balls at 0.001) require the larger sample size, because those miscues are harder to detect than the green-ball (0.002) miscues? That is, since the dog is projected to pick a green ball more often than a blue ball, why does this formula tell me that detecting green balls needs more samples? Am I using the formula properly?
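As a sanity check, here is a sketch assuming the normal-approximation interval p ± z·sqrt(p(1-p)/n) that the formula is derived from; `half_width` is a hypothetical helper. At the computed sample sizes the half-width really is about 0.001, but relative to the proportion being estimated that is roughly 100% of the blue rate and only about 50% of the green rate.

```python
import math

Z = 1.96  # two-sided 95% normal quantile, as in the question

def half_width(p: float, n: int, z: float = Z) -> float:
    """Half-width z*sqrt(p(1-p)/n) of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Proportions and sample sizes from the question (absolute margin E = 0.001).
cases = {"blue": (0.001, 3838), "green": (0.002, 7668)}

for colour, (p, n) in cases.items():
    hw = half_width(p, n)
    # Express the same half-width as a fraction of the proportion being estimated.
    print(f"{colour}: p = {p}, n = {n}, half-width = {hw:.6f} ({hw / p:.0%} of p)")
```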

Answer

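I would have thought that if you want the $95\%$ confidence interval for the green balls to be within $0.1\%$ of the true proportion, i.e. something similar in size to the interval from $0.001998$ to $0.002002$ (a margin of $0.001 \times 0.002 = 0.000002$), you would want to consider something like $$1.96\sqrt{\frac{0.002(1-0.002)}{n}} = 0.000002$$ rather than the margin of $0.001$ you actually used, which is half the size of the green proportion itself.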

Solving for $n$ gives $$n=0.002(1-0.002)\left( \frac{1.96}{0.000002}\right)^2 = 1916958400,$$ which is very large.

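That is about half the corresponding number for the blue balls: solving $1.96\sqrt{\frac{0.001(1-0.001)}{n}} = 0.000001$ gives $n=3837758400$. So with a margin that is relative to the true proportion, the rarer blue outcome needs more trials, in line with your intuition.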
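The same calculation can be sketched in code, assuming the margin is 0.1% *of* p (so E = 0.001·p) rather than a fixed 0.001; `sample_size_relative` is a hypothetical name. Substituting E = rel·p into n = p(1-p)(z/E)² gives n = (1-p)/p · (z/rel)², which grows as p shrinks, so the blue-to-green ratio is 999/499, about 2.

```python
import math
from fractions import Fraction

def sample_size_relative(p: str, rel: str, z: str = "1.96") -> int:
    """Smallest n with z*sqrt(p(1-p)/n) <= rel*p, i.e. a margin relative to p.

    Equivalent to n = (1-p)/p * (z/rel)^2. Decimal strings keep Fraction exact.
    """
    p_, r_, z_ = Fraction(p), Fraction(rel), Fraction(z)
    margin = r_ * p_  # e.g. 0.001 * 0.002 = 0.000002 for the green balls
    return math.ceil(p_ * (1 - p_) * (z_ / margin) ** 2)

green = sample_size_relative("0.002", "0.001")
blue = sample_size_relative("0.001", "0.001")
print(green, blue)             # 1916958400 3837758400
print(round(blue / green, 3))  # 2.002
```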