When to use Binomial Distribution vs. Poisson Distribution?


A bike has probability $p$ of breaking down on any given day.

In this case, to determine the number of times that a bike breaks down in a year, I have been told that it would be best modelled with a Poisson distribution, with $\lambda = 365\,p$.

I am wondering why it would be incorrect to use a binomial distribution, with $n=365$. After all, isn't Poisson really an approximation of a sum of Bernoulli random variables?

Thanks!


There are 4 best solutions below

0
On BEST ANSWER

Poisson distribution

a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

Binomial distribution

the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p

Emphasis mine

For the Poisson you need a known interval (365 days) and a known failure rate (average failures per day; note that this can be any number $> 0$). For the Binomial you need a fixed number of trials (365) and a known failure probability per trial (the chance of failure on a given day; note that this must be a number $\in [0,1]$).

For the specific question, it is a matter of interpretation and both could be justified here.

The Poisson is more appropriate if it is conceivable that the bike could break on a given day, be repaired and break again (and again etc.). For minor failures this is appropriate.

The Binomial is more appropriate if a failure on a given day takes the bike out for the rest of the day (but not for more than that because it would then reduce the total number of days). That is, a moderate failure.
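To make the contrast between the two readings concrete, here are the standard support, mean and variance of each model, taking $\lambda = 365\,p$ as in the question (the symbols $X$ and $Y$ are just labels introduced here):

$$X \sim \mathrm{Bin}(365, p):\quad X \in \{0, 1, \dots, 365\},\quad E[X] = 365\,p,\quad \mathrm{Var}(X) = 365\,p\,(1-p)$$

$$Y \sim \mathrm{Pois}(365\,p):\quad Y \in \{0, 1, 2, \dots\},\quad E[Y] = \mathrm{Var}(Y) = 365\,p$$

The means agree, but the Binomial count can never exceed 365 (at most one failure per day), while the Poisson count is unbounded, which matches the "break, repair, break again" reading. The variances differ only by the factor $1-p$, so for small $p$ the two are nearly indistinguishable.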

I know from your earlier question here that this is then combined with a Gamma distributed cost - there is no mention of the time the repair takes. If there were, this would be a fairly typical queuing problem which typically uses Poisson distributions. I must say that it was this that led me towards the Poisson.

0
On

I think the main difference is that the Poisson distribution is used to approximate counts drawn from a very large sample, such as casualties in a war, fish caught in a big lake, or the number of traffic accidents. If you review the derivation of the Poisson distribution, at some step we let $n\rightarrow \infty$ and assume $np\rightarrow \lambda$, which is a constant. Here $n=365$ is relatively large compared to a typical sample size of $25$ to $100$, so I think using the Poisson distribution is justified if $p$ is relatively small.
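For reference, the limiting step mentioned above can be sketched as follows. This is the usual textbook argument, written with the simplifying assumption $p = \lambda/n$ exactly (rather than only $np \to \lambda$) and with $k$ held fixed:

$$\binom{n}{k} p^k (1-p)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{n^k}\cdot\frac{\lambda^k}{k!}\cdot\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-k} \;\longrightarrow\; \frac{\lambda^k}{k!}\,e^{-\lambda} \quad (n \to \infty)$$

The first and last factors tend to $1$ and the third tends to $e^{-\lambda}$, which leaves the Poisson probability of $k$ events.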

The Wikipedia article on the Poisson distribution might also help.

0
On

It's not incorrect to use a binomial distribution because indeed that is what it would be.

However, it is best modeled as a Poisson distribution because the calculations are much simpler and the approximation is sufficiently close for large $n$:

$$\mathcal{Bin}(n, p) \approx \mathcal {Pois}(np), \mbox{ for large } n \mbox{ and small } p$$
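To get a feel for how close the approximation is at $n = 365$, here is a minimal Python sketch that computes the total variation distance between the two distributions and compares it with Le Cam's bound $np^2$. The values of $p$ are illustrative assumptions, not taken from the question, and the function names are made up for this example. The sum is truncated at $k = n$, which ignores the (tiny) Poisson mass beyond 365.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log1p(-p))


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam), computed in log space."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))


def total_variation(n, p):
    """Half the sum of |Binomial - Poisson| over k = 0..n (Poisson tail past n ignored)."""
    lam = n * p
    return 0.5 * sum(
        abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1)
    )


n = 365
# Illustrative daily breakdown probabilities (assumed values).
results = {p: total_variation(n, p) for p in (0.001, 0.01, 0.1)}
for p, tv in results.items():
    print(f"p={p}: TV distance = {tv:.6f}, Le Cam bound n*p^2 = {n * p * p:.6f}")
```

The distance shrinks as $p$ gets smaller, which is why "large $n$" alone is not quite the whole story: the per-day probability has to be small as well.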

0
On

For the Binomial, we assume the bike either breaks on a given day or it does not, and it cannot break again that day (one Bernoulli trial per day). For the Poisson, the bike might get fixed and break yet again on the same day (as said by @JMoravitz). Still, a day is an arbitrary choice of time interval: if the interval is narrowed until the likelihood of breaking twice within it becomes negligible, the Binomial is the right model. In that case, however, the number of Bernoulli trials becomes very large, and the Binomial converges to the Poisson distribution (Poisson limit theorem). And then, "it is best modeled as a Poisson distribution because the calculations are much simpler and the approximation is sufficiently close for large $n$" (@Graham Kemp).
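The narrowing argument can be checked numerically. The sketch below splits each day into `slots_per_day` equal sub-intervals and assumes, purely for illustration, that the breakdown probability per sub-interval is `p_day / slots_per_day`, so the expected number of breakdowns per year stays fixed at $365\,p$. It then measures the largest pointwise gap between the Binomial and Poisson probabilities. The names and the value `p_day = 0.01` are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log1p(-p))


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam), computed in log space."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))


def max_pmf_gap(days, p_day, slots_per_day, k_max=60):
    """Largest |Binomial - Poisson| probability over k = 0..k_max."""
    n = days * slots_per_day      # number of Bernoulli trials in the year
    p = p_day / slots_per_day     # assumed per-slot breakdown probability
    lam = days * p_day            # expected breakdowns per year, held fixed
    return max(
        abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )


# One trial per day, four per day, one per hour.
slot_choices = (1, 4, 24)
gaps = [max_pmf_gap(365, 0.01, m) for m in slot_choices]
for m, gap in zip(slot_choices, gaps):
    print(f"{m} slot(s) per day: max pmf gap = {gap:.2e}")
```

The gap gets smaller as the sub-intervals shrink, which is the Poisson limit theorem at work.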