Coupon collector's variation with unequal probabilities and uneven number of items required

513 Views Asked by At

I am not sure about this being a negative binomial distribution problem or a variation of Coupon collector's problem.

Here's the problem, Suppose, you want to build a house and I told you that you need certain type and certain number of items to build it. Let's consider you need

Bricks $- ~4$,

Cement $-~ 1$,

Metal $- ~1$,

Gravel $- \ 1$,

Wood $-\ 2$

Now, in order to get these items, you need to open a locker and every time you open it you get only one item. Also, know that the locker contains an item that you don't need at all - Feathers. The following are their probabilities

(Let's use their initials for the sake of simplicity)

B - 30%

C - 5%

F - 10%

G - 10%

M - 35%

W - 10%

The question here is what's the average number of times you will have to open the locker if you want to build a house?

The way I thought about solving this is by first multiplying the no of items required for a type by its expected number which is $\frac{1}{probability}$ for that item (for ex. $3.33$ times for Bricks multiplied by the number we want which is $4$ ) and then adding together for all the type of materials we want. I am not so good at this type of problems so please guide me.

4

There are 4 best solutions below

5
On

A careful analysis is hard because of the lack of symmetry. If one item is rare you can assume you have enough of all the rest by the time you get that one. Here both cement and wood need on average $20$ lockers to give you what you need.

Since both cement and wood are rare we will ignore the correlation that comes from the fact that if you get cement you can't get wood. Then if you open $n$ lockers the chance you have gotten cement is $1-0.95^n$. The chance you have gotten two wood or more is $1-0.9^n-n\cdot 0.1 \cdot 0.9^{n-1}$. The easy calculation is what is the $n$ that guarantees the product of these exceeds $0.5$, which will be close to the expected number of tries to get them both. We can ask Alpha and learn you cross the threshold at $n=24$

3
On

As I mentioned in a comment, this can be analyzed as a finite-state absorbing Markov chain. The Wikipedia article explains the calculation in the section titled "Expected number of steps."

I wrote a python script to do this.

from itertools import product
import numpy as np

needs = [4,1,1,1,2]
probs = [0.3,0.05,0.1,0.35,0.1]

P = np.zeros((120,120))
index = { }
states = list(product(range(5),range(2),range(2),range(2),range(3)))
for idx, state in enumerate(states):
    index[state] = idx
    P[idx, idx] = .1
for idx, state in enumerate(states):
    for i in range(5):
        j = state[i] if state[i] >= needs[i] else state[i]+1
        target = tuple(state[k] if k != i else j for k in range(5))
        tgt = index[target]
        P[idx, tgt] += probs[i]

Q=P[:-1,:-1]
N = np.linalg.inv(np.eye(119)-Q)
t = N.sum(axis=1)
print(f'Expected number of lockers: {t[0]}')

This produced the output

Expected number of lockers: 31.374487169390555
1
On

One approach is to find the expected number of lockers by means of an exponential generating function (EGF). If you are not familiar with generating functions, you might be interested in some of the resources mentioned in the answer to this question: How can I learn about generating functions?

As a start, we will find the EGFs for the number of ways to find four bricks, one bag of cement, etc., weighted by their probabilities.

The EGF for four or more bricks is $$\begin{align} F_B(x) &= \frac{1}{4!} 0.3^4 x^4 + \frac{1}{5!} 0.3^5 x^5 + \frac{1}{6!} 0.3^6 x^6 + \dots \\ &= e^{.0.3 x} - 1 - 0.3 x - \frac{1}{2!} 0.3^2 x^2 - \frac{1}{3!} 0.3^3 x^3 \end{align}$$ The EGF for one or more bags of cement is $$\begin{align} F_C(x) &= 0.05 x + \frac{1}{2!} 0.05^2 x^2 + \frac{1}{3!} 0.05^3 x^3 + \dots \\ &= e^{0.05 x} - 1 \end{align}$$ The EGF for zero or more bags of feathers is $$\begin{align} F_F(x) &= 1 + 0.1 x + \frac{1}{2!} 0.1^2 x^2 + \dots \\ &= e^{0.1 x} \end{align}$$ (I hope you see the pattern by now, so I'm going to move a little more quickly...)

The EGF for one or more bags of gravel is $$F_G(x) = e^{0.1 x} - 1$$ for one or more metal $$F_M(x) = e^{0.35 x} -1$$ for two or more wood $$F_W(x) = e^{0.1 x} - 1 - 0.1 x$$ With these preliminaries out of the way, we have an easy way to find the EGF of the probability of a sequence of $n$ lockers (not necessarily the least such $n$) in which we have all the materials we need: $$g(x) = F_B(x) \cdot F_C(x) \cdot F_F(x) \cdot F_G(x) \cdot F_M(x) \cdot F_W(x)$$ That is, the coefficient of $(1/n!) \; x^n$ in $g(x)$ is the probability $p_n$ that we have a complete set of materials on or before opening the $n$th locker.

Now let's define $T_n$ as the number of the locker in which we first have a complete set of materials. Then $P(T_n > n) = 1-p_n$. Let's define $q_n = 1-p_n$; then the EGF of $q_n$ is $e^x - g(x)$. By a well-known theorem, $$E(T_n) = \sum_{n=0}^{\infty} P(T_n > n) = \sum_{n=0}^{\infty} q_n$$

We can find this sum from the EGF for $q_n$ by taking advantage of the identity $$\int_0^{\infty} x^n e^{-x} \; dx= n!$$ from which $$E(T_n) = \int_0^{\infty} e^{-x}(e^x - g(x)) \; dx$$ Numerical evaluation of this integral in Mathematica yields $$E(T_n) = 31.3745$$

0
On

Some great answers have already been provided. You can also do this using inclusion–exclusion.

There are five conditions to fulfill, namely to have obtained the required amount of each of the five materials. Denote by $N$ the number of lockers needed to fulfill all five conditions. Denote by $N_i$ the number of lockers needed to fulfill condition $i$, by $N_{ij}$ the number of lockers needed to fulfill at least one of conditions $i$ and $j$, and so on. Then by inclusion–exclusion

$$ P(N\gt n)=\sum_iP(N_i\gt n)-\sum_{\{i,j\}}P(N_{ij}\gt n)+\sum_{\{i,j,k\}}P(N_{ijk}\gt n)-\cdots\;. $$

Summing over $n$ yields the corresponding expression for the expectations:

$$ E[N]=\sum_iE[N_i]-\sum_{\{i,j\}}E[N_{ij}]+\sum_{\{i,j,k\}}E[N_{ijk}]-\cdots\;. $$

Denote by $m_i$ the amount of material $i$ required and by $p_i$ the probability to obtain material $i$ in a locker.

Then $E[N_i]=\frac{m_i}{p_i}$. Similarly, if $m_i=m_j=1$, then $E[N_{ij}]=\frac1{p_i+p_j}$, and if $m_i=m_j=m_k=1$, then $E[N_{ijk}]=\frac1{p_i+p_j+p_k}$.

If $m_i\gt1$ and $m_j=1$, then

$$ P(N_{ij}\gt n)=\sum_{l=0}^{m_i-1}\binom nlp_i^l(1-p_i-p_j)^{n-l} $$

and

\begin{eqnarray*} E[N_{ij}] &=& \sum_{n=0}^\infty P(N_{ij}\gt n) \\ &=& \sum_{n=0}^\infty\sum_{l=0}^{m_i-1}\binom nlp_i^l(1-p_i-p_j)^{n-l} \\ &=& \sum_{l=0}^{m_i-1}\left(\frac{p_i}{1-p_i-p_j}\right)^l\sum_{n=0}^\infty\binom nl(1-p_i-p_j)^n \\ &=& \sum_{l=0}^{m_i-1}\left(\frac{p_i}{1-p_i-p_j}\right)^l\frac{(1-p_i-p_j)^l}{(p_i+p_j)^{l+1}} \\ &=& \frac1{p_i+p_j}\sum_{l=0}^{m_i-1}\left(\frac{p_i}{p_i+p_j}\right)^l \\ &=& \frac1{p_j}\left(1-\left(\frac{p_i}{p_i+p_j}\right)^{m_i}\right)\;. \end{eqnarray*}

The calculation is essentially the same if we include more than one material of which we require only $1$, e.g. $j$ and $k$ with $m_j=m_k=1$, with $p_j+p_k$ taking the role of $p_j$ above:

$$ E[N_{ijk}]=\frac1{p_j+p_k}\left(1-\left(\frac{p_i}{p_i+p_j+p_k}\right)^{m_i}\right)\;. $$

Having in mind this way of including any number of conditions with requirement $1$, let's do $m_i\gt1$, $m_j\gt1$ with $m_k=1$ included right away, and we can set $p_k=0$ to get the result for just $m_i\gt1$, $m_j\gt1$ alone:

$$ P(T_{ijk}\gt n)=\sum_{l=0}^{m_i-1}\sum_{r=0}^{m_j-1}\binom nk\binom{n-k}rp_i^lp_j^r(1-p_i-p_j-p_k)^{n-l-r}\;, $$

and thus

\begin{eqnarray*} E[N_{ijk}] &=& \sum_{n=0}^\infty P(T_{ijk}\gt n) \\ &=& \sum_{n=0}^\infty\sum_{l=0}^{m_i-1}\sum_{r=0}^{m_j-1}\binom nk\binom{n-k}rp_i^lp_j^r(1-p_i-p_j-p_k)^{n-l-r} \\ &=& \sum_{l=0}^{m_i-1}\sum_{r=0}^{m_j-1}\frac{p_i^lp_j^r}{(1-p_i-p_j-p_k)^{l+r}}\sum_{n=0}^\infty\binom nk\binom{n-k}r(1-p_i-p_j-p_k)^n \\ &=& \sum_{l=0}^{m_i-1}\sum_{r=0}^{m_j-1}\frac{p_i^lp_j^r}{(1-p_i-p_j-p_k)^{l+r}} \binom{l+r}l\frac{(1-p_i-p_j-p_k)^{l+r}}{(p_i+p_j+p_k)^{l+r+1}} \\ &=& \frac1{p_i+p_j+p_k}\sum_{l=0}^{m_i-1}\sum_{r=0}^{m_j-1}\binom{l+r}l\left(\frac{p_i}{p_i+p_j+p_k}\right)^l\left(\frac{p_j}{p_i+p_j+p_k}\right)^r\;. \end{eqnarray*}

Now we have all the ingredients for our $31$-term inclusion–exclusion sum:

$$ E[N]= \frac4{\frac3{10}} +\frac1{\frac1{20}} +\frac1{\frac7{20}} +\frac1{\frac1{10}} +\frac2{\frac1{10}} -\frac1{\frac1{20}}\left(1-\left(\frac{\frac3{10}}{\frac3{10}+\frac1{20}}\right)^4\right) -\frac1{\frac7{20}}\left(1-\left(\frac{\frac3{10}}{\frac3{10}+\frac7{20}}\right)^4\right) -\frac1{\frac1{10}}\left(1-\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}}\right)^4\right) -\frac1{\frac1{20}}\left(1-\left(\frac{\frac1{10}}{\frac1{10}+\frac1{20}}\right)^2\right) -\frac1{\frac7{20}}\left(1-\left(\frac{\frac1{10}}{\frac1{10}+\frac7{20}}\right)^2\right) -\frac1{\frac1{10}}\left(1-\left(\frac{\frac1{10}}{\frac1{10}+\frac1{10}}\right)^2\right) -\frac1{\frac1{20}+\frac7{20}} -\frac1{\frac7{20}+\frac1{10}} -\frac1{\frac1{10}+\frac1{20}} +\frac1{\frac1{20}+\frac7{20}}\left(1-\left(\frac{\frac3{10}}{\frac3{10}+\frac1{20}+\frac7{20}}\right)^4\right) +\frac1{\frac7{20}+\frac1{10}}\left(1-\left(\frac{\frac3{10}}{\frac3{10}+\frac7{20}+\frac1{10}}\right)^4\right) +\frac1{\frac1{10}+\frac1{20}}\left(1-\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}+\frac1{20}}\right)^4\right) +\frac1{\frac1{20}+\frac7{20}}\left(1-\left(\frac{\frac1{10}}{\frac1{10}+\frac1{20}+\frac7{20}}\right)^2\right) +\frac1{\frac7{20}+\frac1{10}}\left(1-\left(\frac{\frac1{10}}{\frac1{10}+\frac7{20}+\frac1{10}}\right)^2\right) +\frac1{\frac1{10}+\frac1{20}}\left(1-\left(\frac{\frac1{10}}{\frac1{10}+\frac1{10}+\frac1{20}}\right)^2\right) +\frac1{\frac1{20}+\frac7{20}+\frac1{10}} -\frac1{\frac1{20}+\frac7{20}+\frac1{10}}\left(1-\left(\frac{\frac3{10}}{\frac3{10}+\frac1{20}+\frac7{20}+\frac1{10}}\right)^4\right) -\frac1{\frac1{20}+\frac7{20}+\frac1{10}}\left(1-\left(\frac{\frac1{10}}{\frac1{10}+\frac1{20}+\frac7{20}+\frac1{10}}\right)^2\right)\\ +\sum_{l=0}^3\sum_{r=0}^1\binom{l+r}r\left( -\frac1{\frac3{10}+\frac1{10}}\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}}\right)^l\left(\frac{\frac1{10}}{\frac3{10}+\frac1{10}}\right)^r +\frac1{\frac3{10}+\frac1{10}+\frac1{20}}\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}+\frac1{20}}\right)^l\left(\frac{\frac1{10}}{\frac3{10}+\frac1{10}+\frac1{20}}\right)^r\\ +\frac1{\frac3{10}+\frac1{10}+\frac7{20}}\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}+\frac7{20}}\right)^l\left(\frac{\frac1{10}}{\frac3{10}+\frac1{10}+\frac7{20}}\right)^r\\ +\frac1{\frac3{10}+\frac1{10}+\frac1{10}}\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}+\frac1{10}}\right)^l\left(\frac{\frac1{10}}{\frac3{10}+\frac1{10}+\frac1{10}}\right)^r\\ -\frac1{\frac3{10}+\frac1{10}+\frac1{20}+\frac7{20}}\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}+\frac1{20}+\frac7{20}}\right)^l\left(\frac{\frac1{10}}{\frac3{10}+\frac1{10}+\frac1{20}+\frac7{20}}\right)^r\\ -\frac1{\frac3{10}+\frac1{10}+\frac7{20}+\frac1{10}}\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}+\frac7{20}+\frac1{10}}\right)^l\left(\frac{\frac1{10}}{\frac3{10}+\frac1{10}+\frac7{20}+\frac1{10}}\right)^r\\ -\frac1{\frac3{10}+\frac1{10}+\frac1{10}+\frac1{20}}\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}+\frac1{10}+\frac1{20}}\right)^l\left(\frac{\frac1{10}}{\frac3{10}+\frac1{10}+\frac1{10}+\frac1{20}}\right)^r\\ +\frac1{\frac3{10}+\frac1{10}+\frac1{20}+\frac7{20}+\frac1{10}}\left(\frac{\frac3{10}}{\frac3{10}+\frac1{10}+\frac1{20}+\frac7{20}+\frac1{10}}\right)^l\left(\frac{\frac1{10}}{\frac3{10}+\frac1{10}+\frac1{20}+\frac7{20}+\frac1{10}}\right)^r \right) \\ = \frac{40}3 +20 +\frac{20}7 +10 +20 -20\left(1-\left(\frac67\right)^4\right) -\frac{20}7\left(1-\left(\frac6{13}\right)^4\right) -10\left(1-\left(\frac34\right)^4\right) -20\left(1-\left(\frac23\right)^2\right) -\frac{20}7\left(1-\left(\frac29\right)^2\right) -10\left(1-\left(\frac12\right)^2\right) -\frac52 -\frac{20}9 -\frac{20}3 +\frac52\left(1-\left(\frac37\right)^4\right) +\frac{20}9\left(1-\left(\frac25\right)^4\right) +\frac{20}3\left(1-\left(\frac23\right)^4\right) +\frac52\left(1-\left(\frac15\right)^2\right) +\frac{20}9\left(1-\left(\frac2{11}\right)^2\right) +\frac{20}3\left(1-\left(\frac25\right)^2\right) +2 -2\left(1-\left(\frac38\right)^4\right) -2\left(1-\left(\frac16\right)^2\right)\\ +\sum_{l=0}^3\sum_{r=0}^1\binom{l+r}r\left( -\frac52\left(\frac34\right)^l\left(\frac14\right)^r +\frac{20}9\left(\frac23\right)^l\left(\frac29\right)^r +\frac43\left(\frac25\right)^l\left(\frac2{15}\right)^r +2\left(\frac35\right)^l\left(\frac15\right)^r -\frac54\left(\frac38\right)^l\left(\frac18\right)^r -\frac{20}{17}\left(\frac6{17}\right)^l\left(\frac2{17}\right)^r -\frac{20}{11}\left(\frac6{11}\right)^l\left(\frac2{11}\right)^r +\frac{10}9\left(\frac13\right)^l\left(\frac19\right)^r \right) \\ = \frac{40}3 -20 -\frac{20}7 -10 +20 +20\left(\frac67\right)^4 +\frac{20}7\left(\frac6{13}\right)^4 +10\left(\frac34\right)^4 +20\left(\frac23\right)^2 +\frac{20}7\left(\frac29\right)^2 +10\left(\frac12\right)^2 +\frac52 +\frac{20}9 +\frac{20}3 -\frac52\left(\frac37\right)^4 -\frac{20}9\left(\frac25\right)^4 -\frac{20}3\left(\frac23\right)^4 -\frac52\left(\frac15\right)^2 -\frac{20}9\left(\frac2{11}\right)^2 -\frac{20}3\left(\frac25\right)^2 -2 +2\left(\frac38\right)^4 +2\left(\frac16\right)^2\\ -\frac{1345}{128} +\frac{16940}{2187} +\frac{14716}{5625} +\frac{3756}{625} -\frac{9555}{4096} -\frac{2984740}{1419857} -\frac{780580}{161051} +\frac{4180}{2187} \\[15pt] =\frac{36726346111860961183807819781}{1170579965612689097244979200} \\[15pt] \approx31.37448716939056356\;, $$

in agreement with the other answers.