Why use the negative binomial distribution to model count data?

According to Wikipedia: https://en.wikipedia.org/wiki/Negative_binomial_distribution

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs.

I see that the negative binomial distribution is often used to model count data, especially in the insurance industry. However, I don't see why it should be used when it models the number of successes before some failures occur. For example, it is used to model the number of catastrophic events happening in one year, and I don't see what that has to do with the "number of successes before some failures".

Could you please explain to me why we use the negative binomial distribution to model count data, even when the concept of "number of successes before $r$ failures" doesn't apply?

Thank you very much for your help!

There is 1 best solution below

A better motivation for using the negative binomial for count data is that it is a gamma mixture of Poisson random variables. To see this, suppose that $y \mid \lambda \sim \text{Poisson}(\lambda)$, i.e. given a fixed value of $\lambda$, $y$ is Poisson distributed. Further, assume that $\lambda \sim \text{Gamma}(a,b)$. Then you can show that the marginal distribution of $y$ is negative binomial by evaluating the integral:

$$ p(y) = \int p(y|\lambda) p(\lambda) d\lambda $$
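For completeness, here is a sketch of how that integral works out. It assumes that $\text{Gamma}(a,b)$ means shape $a$ and rate $b$ (the answer does not say which parameterization is intended; with a scale parameter, replace $b$ by $1/b$ throughout):

$$ p(y) = \int_0^\infty \frac{\lambda^y e^{-\lambda}}{y!} \cdot \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda} \, d\lambda = \frac{b^a}{y!\,\Gamma(a)} \int_0^\infty \lambda^{y+a-1} e^{-(b+1)\lambda} \, d\lambda $$

The remaining integral is an unnormalized gamma density and equals $\Gamma(y+a)/(b+1)^{y+a}$, so

$$ p(y) = \frac{\Gamma(y+a)}{y!\,\Gamma(a)} \left(\frac{b}{b+1}\right)^{a} \left(\frac{1}{b+1}\right)^{y}, \qquad y = 0, 1, 2, \dots $$

In the convention quoted in the question, this is a negative binomial with $r = a$ and success probability $p = 1/(b+1)$. Note that $a$ need not be an integer here, which is one reason the "successes before $r$ failures" story is not needed for count data.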

The most basic approach to count data is to assume that $y$ is Poisson with a fixed $\lambda$. However, this can fail to capture certain aspects of count data, since it assumes that the mean and the variance of $y$ are equal. The negative binomial does not have this restriction: because $\lambda$ itself varies, the variance of $y$ exceeds its mean, so the model can accommodate overdispersed counts.
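A quick simulation can make the mean/variance point concrete. The sketch below is only illustrative: it assumes the shape/rate reading of $\text{Gamma}(a,b)$, picks arbitrary values $a = 2$, $b = 0.5$, and uses helper names (`sample_poisson`, `sample_gamma_poisson`, `neg_binomial_pmf`) that are made up for this example. It draws $\lambda$ from the gamma distribution, then $y$ from a Poisson with that $\lambda$, and compares the sample mean and variance with the values $a/b$ and $a(b+1)/b^2$ implied by the negative binomial.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Adequate for the small rates used here; not intended for large lam.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_gamma_poisson(a, b, n, seed=0):
    """Draw n counts: lambda ~ Gamma(shape=a, rate=b), then y ~ Poisson(lambda)."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    return [sample_poisson(rng.gammavariate(a, 1.0 / b), rng) for _ in range(n)]


def neg_binomial_pmf(y, a, b):
    """Marginal pmf of y under the gamma-Poisson mixture (assumed shape a, rate b)."""
    log_p = (
        math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1)
        + a * math.log(b / (b + 1.0))
        - y * math.log(b + 1.0)
    )
    return math.exp(log_p)


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


a, b = 2.0, 0.5  # arbitrary illustrative values
draws = sample_gamma_poisson(a, b, n=50_000)
mean, var = mean_and_variance(draws)

print(f"sample mean     = {mean:.3f}  (theory a/b         = {a / b:.3f})")
print(f"sample variance = {var:.3f}  (theory a(b+1)/b^2 = {a * (b + 1) / b**2:.3f})")

# Compare the empirical frequency of y = 0 with the closed-form pmf.
freq_zero = draws.count(0) / len(draws)
print(f"P(y = 0): empirical {freq_zero:.4f}, formula {neg_binomial_pmf(0, a, b):.4f}")
```

With these values the theoretical variance is three times the mean, which a single Poisson with fixed $\lambda$ cannot reproduce.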