How do I approach the coupon collector's problem if multiple items belong to different "families"?

84 Views Asked by At

Say we roll a fair six-sided dice, and our goal is to roll every number at least once. The expected number of rolls is fairly simple: we find $E(x)=\int_0^\infty(1-(1-e^{-\frac{t}{6}})^6)dt=6(1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\frac{1}{5}+\frac{1}{6})=14.7$.

However, say instead that our goal is to not roll every number at least once, but to roll three different numbers, each one from a different family - we want $a,b,$ and $c$ such that $a$ is an even number, $b$ is a prime number, and $c$ is a square number, with the stipulation that $a\neq b\neq c$. So, we want an element from $A=\{2,4,6\}$, one from $B=\{2,3,5\}$, and one from $C=\{1,4\}$, and we don't want any duplicates. What is the expected number of rolls to complete this task?

Without the no-duplicates rule, this is still simple. It would be $E(X)=\int_0^\infty(1-(1-e^{-\frac{t}{2}})^2(1-e^{-\frac{t}{3}}))dt=4.35$. However, with the rule, it becomes more difficult. I've considered going on a case-by-case basis - that is, if our first roll is 1, then we look at the probability of getting different members of $A$ and $B$, if we roll a 2, we look at the probability of getting either different members from $B/\{2\}$ and $C$ or $A/\{2\}$ and $C$, and so on - but I wanted to know if there's a faster way to do this. Any help is appreciated. Thank you!

2

There are 2 best solutions below

1
On BEST ANSWER

Your result without the no-duplicates rule is wrong, so let’s derive the correct result for that case, first.

Let $X$ be the number or rolls required, and let $A_x$, $B_x$, $C_x$ denote the events that no number in the corresponding family has been rolled yet. Then, by inclusion–exclusion,

\begin{eqnarray} \mathsf E[X]&=&\sum_{x=0}^\infty\mathsf P(X>x)\\ &=&\sum_{x=0}^\infty\mathsf P(A_x\cup B_x\cup C_x)\\ &=&\sum_{x=0}^\infty(\mathsf P(A_x)+\mathsf P(B_x)+\mathsf P(C_x)-\mathsf P(A_x\cap B_x)-\mathsf P(A_x\cap C_x)-\mathsf P(B_x\cap C_x)\\&&\quad{}+\mathsf P(A_x\cap B_x\cap C_x))\\[5pt] &=&6\left(\frac13+\frac13+\frac12-\frac15-\frac14-\frac15+\frac16\right)\\[5pt] &=&4.1\;. \end{eqnarray}

The result is confirmed by simulations using this Java code.

If I understand your no-duplicates rule correctly, the process terminates once we’ve rolled all three rolls in one of the thirteen triples in

$$T=\{123,124,125,126,134,136,145,156,234,245,246,346,456\}\;,$$

where to reduce notational clutter I denoted e.g. the triple $\{1,2,3\}$ by $123$. Let $Y$ denote the number of rolls required, and for a triple $t\in T$, let $E_t$ denote the event that not all rolls in $t$ have been rolled yet. Then, again by inclusion–exclusion,

\begin{eqnarray} \mathsf E[Y]&=&\sum_{y=0}^\infty\mathsf P\left(\bigcup_{t\in T}E_t\right)\\ &=&\sum_{y=0}^\infty\sum_{\emptyset\neq S\subseteq T}(-1)^{|S|+1}\mathsf P\left(\bigcap_{t\in S}E_t\right)\\ &=&\sum_{\emptyset\neq S\subseteq T}(-1)^{|S|+1}\sum_{y=0}^\infty\mathsf P\left(\bigcap_{t\in S}E_t\right)\\ &=&6\sum_{\emptyset\neq S\subseteq T}(-1)^{|S|+1}H_{\left|\bigcup_{t\in S}t\right|}\;, \end{eqnarray}

where $H_n$ is the $n$-th harmonic number. This Java code adds up the $2^{13}-1$ terms in this sum; the result is

\begin{eqnarray} \mathsf E[Y]&=&6(13H_3-25H_4+17H_5-4H_6)\\ &=&6\left(1+\frac12+\frac13-\frac{12}4+\frac{13}5-\frac46\right)\\ &=&4.6\;. \end{eqnarray}

The result is confirmed by simulations also performed by that code.

Thus, the no-duplicates rule increases the expected number of required rolls by half a roll.

3
On

I am sorry, after the clarifications you have given subsequent to my posting an answer based on my understanding of the question, I doubt whether a cut and dried formula can be given the way you have given for other cases. Some sort of dynamic programming seems to be indicated.