How long do I have to wait to say with 95% confidence my Poisson-distributed software failure is fixed?

Question

170 Views Asked by Bumbble Comm At 01 Apr 2026 - 12:58

I have attempted to work around a software issue which has caused intermittent failures twice in a one-week period.

Assume the failures are Poisson-distributed, i.e. $P(X=x)=\frac{e^{-\lambda}\lambda^x}{x!}$ for unknown $\lambda$.
Assume an unsuccessful workaround will have no effect on the frequency with which the problem is occurring.
The problem will never re-occur if the workaround is successful.

How long do I have to wait (should no further failures occur) in order to be able to declare with 95% confidence that my workaround is a success?

I think the assumptions I've made are reasonable, and I've got as far as graphing possible values of $\lambda$ against their likelihood given the two failures in a week, i.e. $P(X=2)=\frac{e^{-\lambda}\lambda^2}{2}$. I think the relevant equation for introducing time $t$ is $P(X=0)=e^{-\lambda t}$, but I'm a software developer without much of a stats background, and now I'm stuck.

There are 2 best solutions below

Bumbble Comm On 20 Jul 2015 - 12:08

I realise that running a simulation is a horrible way of doing mathematics, but I think the answer is 1.7 weeks (standard deviation 0.037). Programming language is R.

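set.seed(1)

# Count how many consecutive intervals pass with zero failures before the
# first interval that contains a failure. smallLambda is the expected number
# of failures per interval.
getPoissonTime <- function(smallLambda)
{
    count <- 0
    while(rpois(1, smallLambda) == 0)
    {
        count <- count + 1
    }
    count
}

answers <- c()

# Repeat the whole experiment 100 times to see how much the 95% quantile varies.
while(length(answers) < 100)
{
    values <- c()

    while(length(values) < 10000)
    {
        # I guess this is an additional assumption - that prior to observations, all
        # values of lambda were equally likely.
        lambda <- runif(1) * 20

        # Keep only the draws consistent with the observed two failures in one week.
        if (rpois(1, lambda) == 2)
        {
            # Time is simulated in steps of 1/100 week, then converted back to weeks.
            waitWeeks <- getPoissonTime(lambda/100)/100
            values <- c(values, waitWeeks)
        }
    }

    q <- quantile(values, probs=0.95)
    answers <- c(answers, q)
}

print(paste("mean", mean(answers), "sd", sd(answers)))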

I've checked this against @DavidQuinn's example, i.e. if I force $\lambda$ to be 2, then the program agrees with him that $t$ is 1.497.
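For the fixed-rate check, the value can also be obtained directly: with $\lambda$ known, the probability of no failure in $t$ weeks is $e^{-\lambda t}$, and setting that to 0.05 gives $t=\ln(20)/\lambda$. A minimal sketch, where `waitForKnownRate` is a hypothetical helper name and not part of the program above:

```r
# With lambda known, P(no failure in t weeks) = exp(-lambda * t).
# Setting this equal to alpha and solving for t gives t = -log(alpha) / lambda.
# waitForKnownRate is a hypothetical helper introduced for illustration.
waitForKnownRate <- function(lambda, alpha = 0.05) -log(alpha) / lambda

print(waitForKnownRate(2))
```

The simulation steps time in units of 0.01 weeks, so small differences from this closed-form value in the third decimal place are to be expected.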

**Bumbble Comm** · Accepted Answer

The Poisson distribution tells us that $P(X=2|\lambda)=\frac{e^{-\lambda}\lambda^2}{2}$. According to Bayes' theorem:

$$P(\lambda_i|X=2)=\frac{P(X=2|\lambda_i)P(\lambda_i)}{P(X=2|\lambda_1)P(\lambda_1)+P(X=2|\lambda_2)P(\lambda_2)+\dots+P(X=2|\lambda_N)P(\lambda_N)}$$

This is the discrete form of Bayes' theorem. I'm not going to attempt the integral in the continuous form. I have used some R code to assist my calculations.

Choose a sample of possible $\lambda$ (note in my sample N=200)

{0.1, 0.2, 0.3... 20.0}

Choose priors for these

{1/N, 1/N, 1/N... 1/N}

Calculate the Poisson probabilities $P(X=2|\lambda)$, which will be the likelihoods in the Bayes equation.

{0.0045, 0.016, 0.033... 0.00000041}

Summing the likelihoods, each weighted by its prior, gives us our denominator

0.050

Complete the Bayesian part of the calculation by dividing each likelihood, again weighted by its prior, by the denominator. These are our posterior probabilities.

{0.00045, 0.0016, 0.0033... 0.000000041}
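The steps so far can be sketched in a few lines of R. This is a reconstruction under the stated assumptions (a grid of 200 rates and a flat prior), not the code actually used for the answer; the variable names are illustrative.

```r
# Candidate rates (failures per week): 0.1, 0.2, ..., 20.0
lambda <- (1:200) / 10
N <- length(lambda)

# Flat prior: every candidate rate equally likely before seeing any data
prior <- rep(1 / N, N)

# Likelihood of exactly 2 failures in one week under each candidate rate
likelihood <- dpois(2, lambda)

# Denominator of Bayes' theorem: prior-weighted sum of the likelihoods
denominator <- sum(likelihood * prior)

# Posterior probability of each candidate rate given the 2 observed failures
posterior <- likelihood * prior / denominator

print(signif(head(likelihood, 3), 2))
print(signif(denominator, 2))
print(signif(head(posterior, 3), 2))
```

The printed values can be compared with the sets quoted above, and `sum(posterior)` should equal 1 up to rounding.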

Now we can make a series of guesses about the value of $t$ and use an algorithm such as binary chop (bisection) to home in on its true value. Strictly, binary chop alone cannot get arbitrarily close: because of the error introduced by sampling $\lambda$ on a grid, we would have to make the sample progressively finer too.

Let us suppose $t=1.72$

The half-life formula tells us that $P(X=0)=e^{-\lambda t}$ is the probability of no failure occurring in this length of time. Calculate it for each $\lambda$ in our sample:

{0.842, 0.709, 0.597... 0.0000000000000011}

Now we can weight these probabilities by the (posterior) probability of each value of $\lambda$. This weighted sum is 0.04969, so 1.72 weeks is the answer: if the workaround had made no difference, the chance of seeing no failure for that long comes in nicely at just under 5%, as required.
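The weighted sum and the binary chop can be sketched as follows. This is an illustrative reconstruction, not the original code; `probNoFailure` and `findWait` are hypothetical names, and the search bounds and tolerance are assumptions.

```r
# Posterior over the grid 0.1, 0.2, ..., 20.0 given 2 failures in one week.
# With a flat prior, the prior cancels and the posterior is the normalised likelihood.
lambda <- (1:200) / 10
posterior <- dpois(2, lambda) / sum(dpois(2, lambda))

# Probability of seeing no failure for `weeks` weeks if the workaround did
# nothing, averaged over the posterior for lambda
probNoFailure <- function(weeks) sum(exp(-lambda * weeks) * posterior)

# Binary chop: probNoFailure decreases as weeks grows, so keep halving
# [lo, hi] until the interval is shorter than tol (bounds are assumptions)
findWait <- function(target = 0.05, lo = 0, hi = 10, tol = 1e-6) {
    while (hi - lo > tol) {
        mid <- (lo + hi) / 2
        if (probNoFailure(mid) > target) lo <- mid else hi <- mid
    }
    (lo + hi) / 2
}

print(probNoFailure(1.72))
print(findWait())
```

The first printed value can be compared with the 0.04969 quoted above; the second is the wait at which the probability crosses 5% on this particular grid.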

It is nagging me slightly that I can't quantify the errors I have created in using a discrete sample, but as I see it, the main problem with this calculation is my choice of priors, and the only way to reduce their influence is by observing more data. I would welcome your feedback too.
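One way to gauge the discretisation error is to compare against the continuous form. The following is a sketch, assuming the flat prior is extended over all $\lambda>0$ rather than stopping at 20 (the likelihood beyond 20 is negligible). The posterior density is then the normalised likelihood:

$$p(\lambda|X=2)=\frac{e^{-\lambda}\lambda^2}{2},\qquad \int_0^\infty \frac{e^{-\lambda}\lambda^2}{2}\,d\lambda=1$$

Averaging $e^{-\lambda t}$ over this posterior, using $\int_0^\infty \lambda^2 e^{-c\lambda}\,d\lambda=\frac{2}{c^3}$ with $c=1+t$:

$$\int_0^\infty e^{-\lambda t}\,\frac{e^{-\lambda}\lambda^2}{2}\,d\lambda=\frac{1}{(1+t)^3}$$

Setting this to 0.05 gives

$$t=20^{1/3}-1\approx 1.714$$

which is within about 0.01 weeks of the 1.72 found on the grid. At $t=1.72$ the closed form gives $1/2.72^3\approx 0.0497$, in line with the weighted sum above.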

Edit 2016-06-13, added graphs:

Graphs
