Creating confidence intervals for Bernoulli trials when $p\approx 1$


Suppose that I am testing a piece of software. The software passes each test with probability $p\in[0,1]$. I will not ship the software until it passes all $n$ tests, because I want to be confident, at level $1-\alpha$, that $p>P$.

How many tests should I perform to make sure that this is true?

As a numerical example, say I want to be $95\%$ sure that $p>0.99$. How many tests do I need to pass for this to hold? It would also be nice to have a function $f$ of $\alpha$ and $P$ such that the requirement is $n > f(\alpha, P)$.

Normally I would do this by approximating the number of passed Bernoulli trials with a normal distribution, but apparently this is not a good approximation when $p$ is close to one. Wikipedia lists many different methods for this, and I don't know which to pick or how to implement it. Any help would be appreciated.


There are 2 best solutions below

0
On

One approach is the rule of three: if $n$ independent tests all pass, an approximate $95\%$ upper bound on the failure probability is $3/n$.

In your example of wanting $95\%$ confidence that $p \gt 0.99$, this requires passing at least $300$ independent tests without a single failure.

More generally, to have $1-\alpha$ confidence that $p \gt P$, with $\alpha$ close to $0$ and $P$ close to $1$, you want to pass $n \gt -\frac{\log_e \alpha}{1-P}$ independent tests without a single failure.
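As a rough illustration, the bound $n \gt -\frac{\log_e \alpha}{1-P}$ can be evaluated directly. This is only a sketch: the function name `rule_of_three_tests` is made up for this example, and it assumes the tests are independent and all of them pass.

```python
import math


def rule_of_three_tests(alpha: float, P: float) -> int:
    """Smallest integer n with n > -ln(alpha) / (1 - P).

    Approximate bound, intended for alpha close to 0 and P close to 1.
    The name and signature are hypothetical, chosen for this sketch.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < P < 1.0):
        raise ValueError("alpha and P must lie strictly between 0 and 1")
    bound = -math.log(alpha) / (1.0 - P)
    # floor(bound) + 1 is the smallest integer strictly greater than bound
    return math.floor(bound) + 1


if __name__ == "__main__":
    # 95% confidence that p > 0.99
    print(rule_of_three_tests(0.05, 0.99))  # 300
```

With $\alpha = 0.05$ the numerator $-\log_e \alpha \approx 2.996$, which is where the "three" comes from.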

2
On

Since you do not specify a prior degree of belief, it appears that you want a frequentist analysis. Note that saying "I want to be 95% sure" is colloquial and not strictly valid: it is a common misinterpretation of frequentist confidence.

Furthermore, it appears that you want a one-sided confidence interval for $p$ with, as an example, $\alpha=0.05$.

Suppose you do $n$ trials. If there are no failures, then the Neyman confidence interval for $p$ will be $[Q,1]$ with $Q=\alpha^{1/n}$, and $Q\ge P$ provided: $$P^n \le \alpha$$ The left-hand side is the probability of observing $0$ failures in $n$ trials when $p=P$. Taking logarithms and dividing by $\log(P)<0$, which reverses the inequality, gives $$n\ge \log (\alpha) / \log(P)$$

For example, with $\alpha=0.05$ and $P=0.99$, $\log(\alpha)/\log(P)\approx 298.07$, so you will need at least 299 trials, all without failure.
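A minimal sketch of the zero-failure calculation: it finds the smallest integer $n$ with $P^n \le \alpha$, together with the lower confidence bound $\alpha^{1/n}$ obtained after $n$ passes. The function names are hypothetical, and the code assumes independent trials with no failures.

```python
import math


def min_passing_tests(alpha: float, P: float) -> int:
    """Smallest n such that P**n <= alpha (zero failures observed)."""
    if not (0.0 < alpha < 1.0 and 0.0 < P < 1.0):
        raise ValueError("alpha and P must lie strictly between 0 and 1")
    n = max(1, math.ceil(math.log(alpha) / math.log(P)))
    # Guard against floating-point rounding in the logarithm ratio.
    while P ** n > alpha:
        n += 1
    while n > 1 and P ** (n - 1) <= alpha:
        n -= 1
    return n


def lower_confidence_bound(n: int, alpha: float) -> float:
    """One-sided lower bound for p after n passes and 0 failures."""
    return alpha ** (1.0 / n)


if __name__ == "__main__":
    n = min_passing_tests(0.05, 0.99)
    print(n)                                # 299
    print(lower_confidence_bound(n, 0.05))  # slightly above 0.99
```

Since $0.99^{298} \approx 0.05004 > 0.05$ while $0.99^{299} \approx 0.04954 \le 0.05$, the smallest sufficient number of trials for $\alpha=0.05$ and $P=0.99$ is 299.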