How large must the sample be to determine the probability of a binary outcome?


Suppose a computer program randomly generates a "yes" or "no" answer upon request. However, you do not know whether the two answers are equally likely or follow a different ratio (for example, a 10% chance of "yes" and a 90% chance of "no").

How many answers should you request from the computer before the sample is large enough to determine the probability of each answer with a margin of error of 1%?

There are 1 best solutions below

Assume that our prior information is that the program randomly generates Y with probability $p$ or N with probability $1-p$, where our prior distribution for $p$ is uniform on the interval $[0,1]$ (meaning we have no idea what the parameter is).

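Suppose we then carry out trials and the computer returns Y $m$ times and N $n$ times. The probability of this result with parameter $p$ is simply the binomial ${m+n\choose m}p^m(1-p)^n$. Since $\int_0^1 p^m(1-p)^n\,dp=\frac{m!\,n!}{(m+n+1)!}$, the binomial integrates to $\frac{1}{m+n+1}$ over $[0,1]$, so our posterior distribution for $p$ after the trials is the normalised density $$f(p)=(m+n+1){m+n\choose m}p^m(1-p)^n,$$ which is the $\mathrm{Beta}(m+1,\,n+1)$ distribution.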

Unsurprisingly, this gives a distribution with modal value $p=\frac{m}{m+n}$. For example, if we got 40 Y in 100 trials, the posterior distribution would be:

[Figure: posterior density of $p$ after 40 Y in 100 trials]

But with only 100 trials our confidence that $p$ is close to 0.4 is fairly low. Integrating the posterior density, the chance that $p$ lies inside the range $0.39<p<0.41$ is only about 16%, so there is an 84% chance that it lies outside.

If we got 400 Y in 1000 trials, our posterior distribution would give a 48% chance of $0.39<p<0.41$:

[Figure: posterior density of $p$ after 400 Y in 1000 trials]

If we got 4000 Y in 10000 trials, we would get a 96% chance of $0.39<p<0.41$:

[Figure: posterior density of $p$ after 4000 Y in 10000 trials]
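The three probabilities above can be checked numerically. The following is a minimal sketch, assuming the uniform prior used here, so that the posterior is $\mathrm{Beta}(m+1,\,n+1)$. It integrates the density over $0.39<p<0.41$ with Simpson's rule, working with logarithms so that the large binomial coefficients do not overflow. The function names `log_posterior_density` and `posterior_mass` are illustrative only.

```python
import math

def log_posterior_density(p, m, n):
    """Log of the Beta(m+1, n+1) density: the posterior for p under a
    uniform prior after m 'Y' answers and n 'N' answers (0 < p < 1)."""
    log_norm = math.lgamma(m + n + 2) - math.lgamma(m + 1) - math.lgamma(n + 1)
    return log_norm + m * math.log(p) + n * math.log(1 - p)

def posterior_mass(m, n, lo, hi, steps=2000):
    """Posterior probability that lo < p < hi, by composite Simpson's rule."""
    if steps % 2:          # Simpson's rule needs an even number of intervals
        steps += 1
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        p = lo + i * h
        # Simpson weights: 1 at the endpoints, then alternating 4 and 2
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(log_posterior_density(p, m, n))
    return total * h / 3

for yes, trials in [(40, 100), (400, 1000), (4000, 10000)]:
    mass = posterior_mass(yes, trials - yes, 0.39, 0.41)
    print(f"{yes}/{trials}: P(0.39 < p < 0.41) = {mass:.2f}")
```

The printed values should land close to 16%, 48% and 96%, the first being the complement of the 84% quoted for 100 trials.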

So the short answer is that you need of the order of 10,000 trials to get good confidence (roughly 95%) that you have pinned down $p$ to within 1%, that is, to within $\pm 0.01$.
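As a cross-check on that order of magnitude, the usual normal-approximation sample-size formula $N \approx z^2\,p(1-p)/E^2$ gives a similar figure. Note that this is a frequentist shortcut, not the Bayesian calculation used above. The function `required_trials` and its parameters (`margin`, `confidence`, `p_guess`) are hypothetical names chosen for this sketch.

```python
import math
from statistics import NormalDist

def required_trials(margin, confidence=0.95, p_guess=0.5):
    """Approximate number of trials so that a two-sided normal-approximation
    interval for p has half-width `margin` at the given confidence level.
    p_guess = 0.5 is the worst case, since p * (1 - p) is largest there."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return math.ceil(z * z * p_guess * (1 - p_guess) / margin ** 2)

print(required_trials(0.01))                # worst case, p = 0.5
print(required_trials(0.01, p_guess=0.4))   # the running example, p near 0.4
```

Both calls should return a little under 10,000, in line with the estimate above. Halving the margin of error quadruples the number of trials required, because the margin enters the formula squared.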

Of course, you could also use your trials to test whether the Y/N were being randomly generated. Twenty years ago many programmers were hopeless at writing random number generators and they often had obvious regularities. But there have been major improvements since then and the generators built into the major operating systems are now fairly good.