I have attempted to work around a software issue that caused intermittent failures twice in a one-week period.
- Assume the failures are Poisson-distributed, i.e. $P(X=x)=\frac{e^{-\lambda}\lambda^x}{x!}$ for unknown $\lambda$.
- Assume an unsuccessful workaround will have no effect on the frequency with which the problem is occurring.
- The problem will never re-occur if the workaround is successful.
How long do I have to wait (should no further failures occur) in order to be able to declare with 95% confidence that my workaround is a success?
I think the assumptions I've made are reasonable, and I've got as far as graphing possible values of $\lambda$ against their likelihood given the two failures in a week, i.e. $P(X=2)=\frac{e^{-\lambda}\lambda^2}{2}$. I think the relevant equation for introducing time $t$ is $P(X=0)=e^{-\lambda t}$, but I'm a software developer without much of a stats background at all, and now I'm stuck.
The Poisson distribution tells us that $P(X=2|\lambda)=\frac{e^{-\lambda}\lambda^2}{2}$. According to Bayes' theorem,
$$P(\lambda_i|X=2)=\frac{P(X=2|\lambda_i)P(\lambda_i)}{P(X=2|\lambda_1)P(\lambda_1)+P(X=2|\lambda_2)P(\lambda_2)+...+P(X=2|\lambda_N)P(\lambda_N)}$$
This is the discrete form of Bayes' theorem. I'm not going to attempt the integral in the continuous form. I have used some R code to assist my calculations.
Choose a sample of possible values of $\lambda$ (in my sample, $N=200$):
{0.1, 0.2, 0.3... 20.0}
Choose priors for these:
{1/N, 1/N, 1/N... 1/N}
Calculate the Poisson probabilities $P(X=2|\lambda)$, which will be the likelihoods in the Bayes equation:
{0.0045, 0.016, 0.033... 0.00000041}
Calculating the sum of these likelihoods, weighted by the priors, gives us our denominator:
0.050
Complete the Bayesian part of the calculation by dividing each likelihood, again weighted by its prior, by the denominator. These are our posterior probabilities:
{0.00045, 0.0016, 0.0033... 0.000000041}
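
The steps above can be sketched in a few lines of Python. This is only an illustration, not the R code I actually ran; the names `poisson_pmf`, `evidence` and `posteriors` are made up for the example, and the grid and flat priors are the ones listed above.

```python
import math

# Grid of candidate failure rates (per week): 0.1, 0.2, ..., 20.0, so N = 200.
N = 200
lambdas = [0.1 * i for i in range(1, N + 1)]

# Flat prior: every candidate rate is treated as equally likely beforehand.
priors = [1.0 / N] * N

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when the expected count is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Likelihood of exactly two failures in one week for each candidate rate.
likelihoods = [poisson_pmf(2, lam) for lam in lambdas]

# Denominator of Bayes' theorem: the prior-weighted sum of the likelihoods.
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior probability of each candidate rate given the two failures.
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

print(round(evidence, 3))
print([round(p, 5) for p in posteriors[:3]])
```

The printed values should land close to the denominator and the first few posterior probabilities quoted above.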
Now we can make a series of guesses at the value of $t$ and use an algorithm such as binary chop to home in on its true value. Binary chop alone cannot get arbitrarily close, though: because of the errors introduced by sampling, we would have to make the sample of $\lambda$ progressively finer too.
Let us suppose $t=1.72$ weeks.
For a Poisson process with rate $\lambda$, the expected number of failures in time $t$ is $\lambda t$, so $P(X=0)=e^{-\lambda t}$ is the probability of no failure occurring in this length of time (the same form as the exponential decay behind the half-life formula). Calculate it for each $\lambda$ in our sample:
{0.842, 0.709, 0.597... 0.0000000000000011}
Now we can weight these probabilities by the posterior probability of each value of $\lambda$. This weighted sum is 0.04969, so 1.72 weeks is the answer: if the workaround had made no difference, the chance of getting through that long without a failure comes in at just under 5%, as required.
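
A minimal Python sketch of this last step and of the binary chop, again as an illustration rather than the R code I used. The function names `prob_no_failure` and `solve_wait`, the search bracket of 0 to 10 weeks and the tolerance are all assumptions made for the example.

```python
import math

# Rebuild the posterior over the same grid of rates, assuming flat priors.
lambdas = [0.1 * i for i in range(1, 201)]
weights = [math.exp(-lam) * lam ** 2 / 2 for lam in lambdas]  # P(X=2 | lambda)
total = sum(weights)
posteriors = [w / total for w in weights]  # the equal priors cancel out

def prob_no_failure(t):
    """Posterior-weighted probability of zero failures in t weeks,
    assuming the workaround had no effect on the failure rate."""
    return sum(p * math.exp(-lam * t) for p, lam in zip(posteriors, lambdas))

def solve_wait(target=0.05, lo=0.0, hi=10.0, tol=1e-6):
    """Binary chop: prob_no_failure falls as t grows, so keep halving the
    bracket [lo, hi] until it is narrower than tol."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_no_failure(mid) > target:
            lo = mid  # a failure-free run this short is still too likely
        else:
            hi = mid
    return hi  # the upper end keeps the probability at or below the target

print(prob_no_failure(1.72))
print(solve_wait())
```

Returning the upper end of the bracket is a deliberate choice: it errs on the side of waiting slightly too long rather than slightly too short.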
It is nagging me slightly that I can't quantify the errors I have introduced by using a discrete sample, but as I see it, the main problem with this calculation is my choice of priors, and the only way to reduce their influence is by observing more data. I would welcome your feedback too.
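
One rough check on the discretisation error is to do the continuous integral after all. This assumes the flat prior extends over all $\lambda>0$, which is an assumption of this check and not something the sampled calculation uses. Under that assumption the posterior after two failures in one week is proportional to the likelihood, which already integrates to 1:

$$p(\lambda|X=2)=\frac{e^{-\lambda}\lambda^2}{2}$$

Weighting $e^{-\lambda t}$ by this posterior gives

$$P(\text{no failure in } t)=\int_0^\infty e^{-\lambda t}\,\frac{e^{-\lambda}\lambda^2}{2}\,d\lambda=\frac{1}{(1+t)^3}$$

Setting this equal to 0.05 gives $t=20^{1/3}-1\approx 1.714$ weeks. At $t=1.72$ it evaluates to $1/2.72^3\approx 0.04969$, which agrees with the discrete weighted sum to the digits quoted, so the sampling error looks small next to the influence of the priors.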
Edit 2016-06-13, added graphs:
Graphs