Mixture of negative binomial distributions (technically some of them are geometric)

223 Views Asked by At

I have something that I am trying to compute.

Let's say that a number is uniformly generated 1-4.

What would be the expected number of generations required to get at least 3 1s and every other number was generated at least once.

I know how to calculate at least 3 1s as that would be a negative binomial distribution with p=3/4 and r=3.

I know how to calculate at least every number shows up once, since those are geometric distrubutions (or negative binomial with r=1).

How can I marry these concepts to come up with an expected value?

2

There are 2 best solutions below

7
On BEST ANSWER

This is a generalized coupon collector's problem. Two approaches you can take to this are used in the two answers at Coupon Collecting Problem: $4$ coupons with $p_1 = p_2 = \frac{1}{8}$ and $p_3 = p_4 = \frac{3}{8}$.

In the spirit of Ross's answer, you can define a state $(j,k)$ with $0\le j,k\le3$, in which you've generated $j$ $1$s and $k$ of the other numbers at least once. Then the expected number $a_{jk}$ of remaining generations satisfies the recurrence

$$ a_{jk}=1+\frac14a_{j+1,k}+\frac k4a_{jk}+\frac{3-k}4a_{j,k+1}\;, $$

where the indices aren't incremented beyond $3$ and the initial value is $a_{33}=0$. The result is

\begin{array}{c|cccc} j\setminus k&0&1&2&3\\\hline 0&\frac{1915}{144}&\frac{349}{27}&\frac{25}2&12\\ 1&\frac{125}{12}&\frac{88}9&9&8\\ 2&\frac{25}3&\frac{22}3&6&4\\ 3&\frac{22}3&6&4&0\\ \end{array}

Thus the expected number of generations is $a_{00}=\frac{1915}{144}\approx13.3$.

In the spirit of my answer, we can apply inclusion–exclusion to the four conditions $A_i$ that you've generated $i$ a sufficient number of times ($3$ for $i=1$ and $1$ otherwise). Let $N_i$ denote the number of generations required until $A_i$ is fulfilled, and let $N=\max_iN_i$ denote the number of generations required until all the conditions $A_i$ are fulfilled. Then

\begin{eqnarray*} \mathsf E[N] &=& \sum_{n=0}^\infty\mathsf P(N\gt n) \\ &=& \sum_{n=0}^\infty\mathsf P\left(\bigvee_{i\in\{1,2,3,4\}}N_i\gt n\right) \\ &=& \sum_{n=0}^\infty\sum_{\emptyset\ne S\subseteq\{1,2,3,4\}}(-1)^{|S|+1}\mathsf P\left(\bigwedge_{i\in S}N_i\gt n\right)\;. \end{eqnarray*}

Now there are two cases. If $1\notin S$, we have

$$ \mathsf P\left(\bigwedge_{i\in S}N_i\gt n\right)=4^{-n}(4-|S|)^n\;. $$

If $1\in S$, we have

$$ \mathsf P\left(\bigwedge_{i\in S}N_i\gt n\right)=\sum_{j=0}^2\binom nj4^{-n}(4-|S|)^{n-j}\;. $$

Splitting the sum over $S$ into these two cases, we obtain

\begin{eqnarray*} \sum_{n=0}^\infty\sum_{\emptyset\ne S\subseteq\{2,3,4\}}(-1)^{|S|+1}\mathsf P\left(\bigwedge_{i\in S}N_i\gt n\right) &=& \sum_{n=0}^\infty\sum_{k=1}^3(-1)^{k+1}\binom3k4^{-n}(4-k)^n \\ &=& \sum_{k=1}^3(-1)^{k+1}\binom3k\frac4k \\ &=& 12-6+\frac43 \\ &=& \frac{22}3 \end{eqnarray*}

and

\begin{eqnarray*} \sum_{n=0}^\infty\sum_{S\subseteq\{2,3,4\}}(-1)^{|S|}\mathsf P\left(\bigwedge_{i\in S\cup\{1\}}N_i\gt n\right) &=& \sum_{n=0}^\infty\sum_{j=0}^2\sum_{k=0}^3(-1)^k\binom3k\binom nj4^{-n}(4-(k+1))^{n-j} \\ &=& 4\sum_{j=0}^2\sum_{k=0}^3(-1)^k\binom3k\left(\frac1{k+1}\right)^{j+1} \\ &=& 12-4\left(\frac32-1+\frac14+\frac34-\frac13+\frac1{16}+\frac38-\frac19+\frac1{64}\right) \\ &=& \frac{859}{144}\;. \end{eqnarray*}

Together, this is

$$ \frac{22}3+\frac{859}{144}=\frac{1915}{144}\approx13.3\;, $$

in agreement with the first result.

0
On

Here’s yet another way to obtain the solution, using “Poissonization”. Instead of generating numbers at discrete times, consider $4$ independent Poisson processes with rate $1$, one for each number.

The interarrival times of a Poisson process are independently geometrically distributed, so at time $t$ the probability that a given process has generated its number is $1-\mathrm e^{-t}$ and the probability that the process that generates $1$s has generated three $1$s is given by the cumulative distribution function $1-\mathrm e^{-t}\left(1+t+\frac12t^2\right)$ of an Erlang-$3$ distribution, the distribution of the sum of $3$ independent identically geometrically distributed variables. The sum of the $4$ Poisson processes is itself a Poisson process, with rate $4$. As shown in this answer, the expected number of number generations required is equal to number of generations expected to occur in the expected time required, which in this case yields

$$ 4\int_0^\infty\left(1-\left(1-\mathrm e^{-t}\right)^3\left(1-\mathrm e^{-t}\left(1+t+\frac12t^2\right)\right)\right)\mathrm dt=\frac{1915}{144}\;, $$

in agreement with the other two approaches.