I am trying to model a simple game (a more complex problem I am working on reduces to it).
- You roll an $n$-sided dice labeled 1 to $n$. (This can be a normal 6-sided dice if you'd like.)
- If the dice shows 1, you draw a card from a deck of $m$ cards labeled 1 to $m$. (This can be a normal 52-card deck if you'd like.) You record which card you drew, replace it in the deck, and shuffle.
- This process repeats until you draw a card you have previously seen.
The question: what is the distribution of the length of this game, measured in the number of dice rolls?
I am able to model the individual sub-games without an issue:
Game 1, the dice game, is geometrically distributed with $p=1/n$. On average, we expect to see a 1 every $n$ throws.
Game 2, the card game, is a bit more complex, but we can model it like this:
- On our $(i+1)$-th draw, we have already seen $i$ cards, all of which are unique (if not, the game would already be over).
- Therefore, the probability the given card $i+1$ matches one of the previous cards is $\frac{i}{m}$ for a deck of size $m$.
- Factoring in the probability of not having already won, the probability that the game ends on draw $i+1$ is
$$ P(i) = \frac{m!/(m-i)!}{m^i} \cdot \frac{i}{m} $$
(The numerator of the left fraction is just the number of permutations of $i$ items from $m$ items.)
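As a quick numerical sanity check (the helper name `card_pmf` is mine), the probabilities telescope to 1 and reproduce the mean number of draws:

```python
from math import perm  # perm(m, i) = m!/(m-i)!, Python 3.8+

def card_pmf(m):
    """P(i) for i = 0..m: first i draws are all unique, draw i+1 repeats."""
    return [perm(m, i) / m**i * (i / m) for i in range(m + 1)]

pmf = card_pmf(52)
total = sum(pmf)                                           # sums to 1 (telescoping)
mean_draws = sum((i + 1) * p for i, p in enumerate(pmf))   # ~ 9.7
```

The sum telescopes because each term is $P(T>i)-P(T>i+1)$, where $T$ is the draw on which the game ends.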
I can't find a closed form for the expected value of this PMF, but it matches simulations for a given value of $m$. For example, consider the standard case of $n=6$ and $m=52$.
- We expect a 1 on the dice every 6 rolls.
- We expect to draw a duplicate card after 9.7 draws, on average. (Solved numerically in Mathematica and confirmed in simulation.)
- So, we expect the game to last $\approx 58$ rolls on average: the total roll count is a sum of $T$ i.i.d. geometric waiting times, so its mean is the product of the two means (Wald's identity).
But what is the actual distribution of game lengths?
Here's some simulation data from playing the game 100,000 times. The simulation code looks like this:
    import itertools as it
    import random

    def play_game():
        """Play one game; return (number of dice rolls, number of cards drawn)."""
        cards_seen = set()
        for i in it.count(1):
            dice = random.randint(1, 6)
            if dice != 1:
                continue  # only draw a card when the dice shows 1
            card = random.randint(1, 52)
            if card in cards_seen:
                return i, 1 + len(cards_seen)
            cards_seen.add(card)
Statistics from the simulation:
- Mean number of turns: 58.19
- ...median: 54
- Mean number of cards drawn: 9.69
- ...median: 9
- Implied number of turns per card drawn (i.e., rolls per 1 shown): 6.005
This looks very familiar (i.e., vaguely negative-binomial), but the changing probability of pulling a matching card makes such an analysis tricky.
I know the combination of distributions can be computed via convolution, but I'm not clear how I would apply that here, since these two distributions are not really independent — whether we play the card subgame is directly dependent on the results of the dice subgame.
I believe this game can be modeled as a Markov chain, but I'm not sure if that helps with the analysis.
- We have dice states $D_k$ and card states $C_k$.
- We start at state $D_1$ and move to state $C_1$ with probability $1/6$, or remain at state $D_1$ with probability $5/6$.
- At state $C_k$ (the $k$-th card draw), we end the game with probability $\frac{k-1}{m}$ (a match with one of the $k-1$ cards already seen), and move to state $D_{k+1}$ with probability $1-\frac{k-1}{m}$.
Now the question reduces to finding the distribution of the number of steps in this chain.
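The step-count distribution of that chain can be computed directly by forward recursion over the states; here is a Python sketch (function and variable names are mine) that tracks the probability of still being in the game with $c$ distinct cards seen after each roll:

```python
def game_length_pmf(n, m, t_max):
    """P(the game ends exactly on roll t), for t = 1..t_max."""
    # alive[c] = P(game still running and c distinct cards have been seen)
    alive = [0.0] * (m + 1)
    alive[0] = 1.0
    pmf = []
    for _ in range(t_max):
        # the game ends this roll iff the dice shows 1 and the card repeats
        pmf.append(sum(alive[c] * (1 / n) * (c / m) for c in range(m + 1)))
        nxt = [0.0] * (m + 1)
        for c in range(m + 1):
            nxt[c] += alive[c] * (1 - 1 / n)                    # no card drawn
            if c < m:
                nxt[c + 1] += alive[c] * (1 / n) * (1 - c / m)  # new card seen
        alive = nxt
    return pmf

pmf = game_length_pmf(6, 52, 3000)
mean = sum(t * p for t, p in enumerate(pmf, start=1))  # close to the simulated 58.19
```

Truncating at 3000 rolls loses a negligible amount of tail mass for $n=6$, $m=52$; the resulting mean agrees with the simulation up to sampling noise.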


Let's start with the deck game only.
Call $T$ the random time (number of deck extractions) at which we see the first duplicate.
We can use this formula for the expectation: $$E[T]=\sum_{k=0}^{m}P(T>k)$$ and observe that $P(T>k)=\frac{m!}{(m-k)!}\frac{1}{m^k}$, since surviving past draw $k$ requires that the first $k$ extractions give $k$ different cards.
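That tail sum is easy to evaluate numerically (a quick Python check; the question's value was obtained in Mathematica):

```python
from math import perm  # perm(m, k) = m!/(m-k)!, Python 3.8+

def expected_draws(m):
    """E[T] = sum_{k=0}^{m} P(T > k), with P(T > k) = m!/((m-k)! m^k)."""
    return sum(perm(m, k) / m**k for k in range(m + 1))

print(expected_draws(52))  # ~ 9.7, matching the question's numerics
```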
Adding the die part just delays the inevitable by a factor of $n$ on average.
For the distribution part, let's introduce $S$ for the random duration of the overall game (in rolls) and $X_k$ for the number of 1s in the first $k$ rolls (it is binomial).
So we have: $$ P(S=k)=\sum_{p=0}^{\min\{k-1,\, m\}} P(S=k \,\cap\, X_{k-1}=p) $$
A bit of intuition: if $S=k$, the $k$-th roll shows a 1 and the card drawn has already been seen, while all previous extractions were unique.
But how many previous extractions were there? We cycle through the possibilities by conditioning on $X_{k-1}$.
Finally we get the following (unsimplified) expression: $$ P(S=k \,\cap\, X_{k-1}=p) = \binom{k-1}{p}\left(\frac{1}{n}\right)^{p}\left(1-\frac{1}{n}\right)^{k-1-p}\cdot\frac{1}{n}\cdot\frac{\binom{m}{p}\,p!}{m^{p}}\cdot\frac{p}{m} $$
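The unsimplified expression can be checked numerically; a short Python sketch (names are mine) that sums over $p$ and verifies the resulting PMF:

```python
from math import comb, perm

def p_S(k, n, m):
    """P(S = k): sum the joint probability over p = number of 1s in first k-1 rolls."""
    total = 0.0
    for p in range(min(k - 1, m) + 1):
        # p ones among the first k-1 rolls, then a 1 on roll k
        rolls = comb(k - 1, p) * (1 / n) ** p * (1 - 1 / n) ** (k - 1 - p) * (1 / n)
        # first p cards all distinct, final card matches one of them
        cards = (perm(m, p) / m**p) * (p / m)  # binom(m,p) p! / m^p * p/m
        total += rolls * cards
    return total

pmf = [p_S(k, 6, 52) for k in range(1, 3001)]
mean = sum(k * q for k, q in enumerate(pmf, start=1))  # close to the simulated 58.19
```

The probabilities sum to 1 (up to the negligible truncated tail) and the mean matches the simulation in the question.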
I leave some R code that I have used to view the results.