Multinomial distribution: probability that at least one outcome didn't occur

I'm trying to find the probability that, in a group of $N$ people, at least one district has nobody in the group, where the districts have populations $n_{i}$ (for $i \in \mathbb{N}$ ranging from $0$ to $k$, so there are $k+1$ districts). This boils down to asking for the probability that at least one outcome does not occur in $N$ trials of a multinomial distribution. How could I approach this problem?

The problem seems conceptually simple, if computationally a bit exhausting, assuming that every selection retains the same probability (i.e. we are indeed looking at independent draws from the population, so each person is from district $i$ with probability $n_i/\sum_j n_j$). We simply need to sum up the probability of every outcome in which at least one district's count is zero. That's a daunting task, so let's look at simpler versions of the problem and work up from there.
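For small cases, this "sum over every outcome" idea can be checked by brute force. The sketch below is illustrative, not the answer's method: it assumes draws with replacement as above, enumerates every vector of district counts summing to $N$, weights it by its multinomial probability, and keeps the vectors containing a zero. The function names are hypothetical, and exact `Fraction` arithmetic is used to avoid rounding. There are $\binom{N+k}{k}$ count vectors, so this only works for small $N$ and $k$.

```python
from fractions import Fraction
from math import factorial


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Exact multinomial probability of observing `counts` given `probs`."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # partial multinomial coefficients stay integers
    p = Fraction(coeff)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p


def prob_some_district_missing_bruteforce(populations, n_people):
    """Sum the probability of every count vector in which some district has 0 people.

    Assumes independent draws: district i is picked with probability
    populations[i] / sum(populations).
    """
    total = sum(populations)
    probs = [Fraction(n, total) for n in populations]
    return sum(
        (multinomial_pmf(counts, probs)
         for counts in compositions(n_people, len(populations))
         if 0 in counts),
        Fraction(0),
    )
```

For three equally sized districts and $N=3$, `prob_some_district_missing_bruteforce([1, 1, 1], 3)` returns `Fraction(7, 9)`: of the $27$ equally likely ordered draws, only the $3! = 6$ that hit every district are excluded.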

The simplest problem would be where $k$ of the $k+1$ district counts are zero: in effect, the probability that everyone is from a single district. That's very easy to calculate:

$$\sum_{a=0}^{k}\left (\frac{n_a}{\sum_i{n_i}}\right )^N$$
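The single-district formula translates directly into code. This is a minimal sketch under the same independent-draws assumption; the function name and the populations used below are made up for illustration.

```python
from fractions import Fraction


def prob_all_from_one_district(populations, n_people):
    """Sum over districts a of (n_a / sum_i n_i)^N: everyone from one district."""
    total = sum(populations)
    return sum((Fraction(n, total) ** n_people for n in populations), Fraction(0))
```

For hypothetical populations `[500, 300, 200]` and $N=5$, this gives `Fraction(17, 500)`, i.e. $0.5^5 + 0.3^5 + 0.2^5 = 0.034$.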

The next simplest problem is where $k-1$ or more counts are zero, i.e. the probability that everyone is from at most two districts. This is functionally equivalent to a sum of binomial probabilities, with some extra iterators to let us go through every pair of districts:

$$\sum_{x=0}^{k}\sum_{y=x+1}^{k}\sum_{a=0}^{N}\binom{N}{a}\left (\frac{n_x}{\sum_i{n_i}}\right )^a\left (\frac{n_y}{\sum_i{n_i}}\right )^{N-a}$$
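Below is a sketch of the pair formula as written, alongside a shortcut that is not in the answer: by the binomial theorem, each inner sum $\sum_{a}\binom{N}{a}p_x^a p_y^{N-a}$ equals $(p_x+p_y)^N$, where $p_i = n_i/\sum_j n_j$, so the loop over $a$ can be dropped. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_pair_sum(populations, n_people):
    """The pair formula as written: sum over x<y of the binomial sum over a."""
    total = sum(populations)
    p = [Fraction(n, total) for n in populations]
    result = Fraction(0)
    for x, y in combinations(range(len(populations)), 2):
        for a in range(n_people + 1):
            result += comb(n_people, a) * p[x] ** a * p[y] ** (n_people - a)
    return result


def prob_pair_sum_closed_form(populations, n_people):
    """Same quantity: by the binomial theorem each inner sum is (p_x + p_y)^N."""
    total = sum(populations)
    return sum(
        (Fraction(populations[x] + populations[y], total) ** n_people
         for x, y in combinations(range(len(populations)), 2)),
        Fraction(0),
    )
```

With three equal districts and $N=3$ both functions return `Fraction(8, 9)`. The probability that at least one of the three districts is missing is $1 - 6/27 = 7/9$; the gap of $1/9$ is the single-district probability, which this pair sum counts twice (once for each pair containing the district) instead of once.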

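It's important to note that this sum already contains the previous equation: the terms with $a=N$ and $a=0$ are the single-district probabilities for $x$ and $y$. Because each district belongs to $k$ different pairs, every single-district outcome is counted $k$ times rather than once, so the single-district probability has to be subtracted $k-1$ times for each outcome to be counted exactly once.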

The next simplest is where $k-2$ or more counts are zero, i.e. everyone is from at most three districts. This is functionally equivalent to a sum of 3-multinomial (trinomial) probabilities, again with extra iterators to manage the additional degrees of freedom:

$$\sum_{x=0}^{k}\sum_{y=x+1}^{k}\sum_{z=y+1}^{k}\sum_{a=0}^{N}\sum_{b=0}^{N-a}\binom{N}{a,b,\left (N-a-b\right )}\left (\frac{n_x}{\sum_i{n_i}}\right )^a\left (\frac{n_y}{\sum_i{n_i}}\right )^{b}\left (\frac{n_z}{\sum_i{n_i}}\right )^{N-a-b}$$
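The triple sum can be sketched the same way. As a side note not in the answer, the multinomial theorem collapses each inner double sum to $\left((n_x+n_y+n_z)/\sum_i n_i\right)^N$. The function name is hypothetical and independent draws are assumed.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def prob_triple_sum(populations, n_people):
    """The triple formula as written: sum over x<y<z of the trinomial sum."""
    total = sum(populations)
    p = [Fraction(n, total) for n in populations]
    result = Fraction(0)
    for x, y, z in combinations(range(len(populations)), 3):
        for a in range(n_people + 1):
            for b in range(n_people - a + 1):
                c = n_people - a - b
                # trinomial coefficient N! / (a! b! c!)
                coeff = factorial(n_people) // (factorial(a) * factorial(b) * factorial(c))
                result += coeff * p[x] ** a * p[y] ** b * p[z] ** c
    return result
```

With exactly three districts there is one triple and the result is $1$. With four equal districts and $N=2$ it returns `Fraction(9, 4)`, larger than $1$, which shows that the raw sum over triples counts overlapping outcomes several times and is not by itself a probability.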

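The pattern, I believe, should be getting pretty obvious. Your initial problem (at least one district unrepresented) is the same as everyone being from at most $k$ districts, so it is built from sums of $k$-multinomial probabilities, one for each of the $k+1$ ways to choose $k$ districts. These sets of outcomes overlap, though (an outcome missing two or more districts lies in several of them), so they must be combined by inclusion-exclusion rather than simply added. Iterating through all of those probabilities would take $2k-1$ iterators: $k$ iterators for iterating through all combinations of districts, and $k-1$ iterators for iterating through the multinomial. Unsurprisingly, this approach doesn't scale well for large values of $k$!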
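A hedged alternative that removes the multinomial iterators entirely is plain inclusion-exclusion over the districts that are missed (a standard technique, not the iterator scheme above). With $p_i = n_i/\sum_j n_j$ and independent draws,

$$P(\text{some district missing}) = \sum_{\emptyset \neq S \subseteq \{0,\dots,k\}} (-1)^{|S|+1}\left(1 - \sum_{i \in S} p_i\right)^N,$$

because $\left(1 - \sum_{i\in S} p_i\right)^N$ is the probability that every district in $S$ is missed. The sketch below (hypothetical function name) loops over the $2^{k+1}-1$ non-empty subsets, so it is still exponential in the number of districts but no longer depends on $N$ beyond one power per subset.

```python
from fractions import Fraction
from itertools import combinations


def prob_some_district_missing(populations, n_people):
    """P(at least one district unrepresented) via inclusion-exclusion.

    For each non-empty set S of districts, (1 - sum_{i in S} p_i)^N is the
    probability that every district in S is missed; alternating signs by |S|
    correct for outcomes that miss several districts at once.
    """
    total = sum(populations)
    p = [Fraction(n, total) for n in populations]
    result = Fraction(0)
    for size in range(1, len(p) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(p, size):  # subsets by position, so equal p_i are fine
            result += sign * (1 - sum(subset)) ** n_people
    return result
```

For made-up populations `[500, 300, 200]` and $N=5$, `prob_some_district_missing([500, 300, 200], 5)` returns `Fraction(493, 1000)`; for four equal districts and $N=2$ it returns $1$, as it must, since two people cannot cover four districts.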