What is the amount of draws necessary to see all red cards from a standard deck of 52 cards if you draw 5 cards from the deck?

351 Views Asked by At

Problem abstraction

A standard deck of $52$ cards has $26$ red cards: it has $13$ hearts, $13$ diamonds, as well as $26$ black cards ($13$ spades, as well as $13$ clubs). Let us draw $5$ cards from the deck at once, and return those cards to the deck afterward.

What is the expected number of draws before we see all $26$ red cards?

Use case

Let us say that there is a set of $N = 100$ cards in a game. $M = 30$ cards are of rare rarity and $N - M = 70$ cards are of common rarity. We buy booster packs of size $= 10$. The question is: how many booster packs need to be bought to collect all $M = 30$ cards?

Attempted solution

I have managed to calculate the approximate number of booster packs necessary to get $M = 30$ rare cards by calculating the expectation of the above hypergeometric distribution ($\mu$) and then calculating $M/\mu$. However, this is not the correct solution since it does not take into account the possibility of collecting duplicates.

Regarding the Coupon collector's problem, I'm not sure if it is applicable since we always draw a single coupon, whereas in my use case a booster pack contains more than a single card.

Related: https://stats.stackexchange.com/questions/198915/in-the-coupon-collectors-problem-with-group-drawings-why-does-the-probability

Simulations

Problem Abstraction

$10^6$ trials were conducted, AVG: $38.947$, STDEV: $12.3653$ draws

Use case

$10^6$ trials were conducted, AVG: $38.535$, STDEV: $11.962$ draws

1

There are 1 best solutions below

5
On BEST ANSWER

Here is a solution of the "problem abstraction" by way of the Principle of Inclusion / Exclusion (PIE).

Let $T$ be the number of the first draw in which we have seen all the red cards. We would like to find $P(T>k)$ for some $k>0$, i.e. the probability that we have not seen all $26$ red cards in $k$ draws. To that end, let's say a sequence of $k$ draws has "Property $i$" if red card $i$ has not been drawn, for $i = 1,2,3,\dots,26$. Let $S_j$ be the sum of the probabilities of all the sequences with $j$ of the properties, for $j = 1,2,3,\dots,26$. For $S_j$, there are $\binom{26}{j}$ ways to select the $j$ cards which are missing. The probability that those cards are missing in a single draw is $\binom{52-j}{5} / \binom{52}{5}$, so the probability that the cards are missing in all $k$ draws is $[\binom{52-j}{5} / \binom{52}{5}]^k$. Therefore $$S_j = \binom{26}{j} \left( \frac{\binom{52-j}{5}}{ \binom{52}{5}} \right) ^k$$ By PIE, the probability of a sequence of draws with at least one of the properties, i.e. a sequence with at least one red card not seen, is $$P(T>k) = \sum_{j=1}^{26} (-1)^{j+1} S_j$$ so $$\begin{align} E(T) &= \sum_{k=0}^{\infty} P(T>k) \\ &= \sum_{k=0}^{\infty} \sum_{j=1}^{26} (-1)^{j+1} S_j \\ &= \sum_{k=0}^{\infty} \sum_{j=1}^{26} (-1)^{j+1} \binom{26}{j} \left( \frac{\binom{52-j}{5}}{ \binom{52}{5}} \right) ^k \\ &= \sum_{j=1}^{26} (-1)^{j+1} \binom{26}{j} \sum_{k=0}^{\infty} \left( \frac{\binom{52-j}{5}}{ \binom{52}{5}} \right) ^k \\ &= \sum_{j=1}^{26} (-1)^{j+1} \binom{26}{j} \frac{1}{1- \binom{52-j}{5}/ \binom{52}{5}} \\ &= 38.9133 \end{align}$$


Added February 4, 2019:

The following Monte Carlo simulation of $10^6$ trials in the R language is consistent with the result above. The average number of draws of 5-card hands necessary to see all 26 red cards was 38.973, with a 95% confidence interval of 38.91305 to 38.96158. The analytical result 38.9133 falls in the confidence interval, although just barely.

> # ndraws: return the number of draws of 5-card hands required 
> # to see all red cards at least once
> # We consider the red cards to be the cards numbered 1-26.
> ndraws <- function() {
+   seen <- rep(0, 52)
+   n <- 0
+   while (TRUE) {
+     n <- n+1
+     hand <- sample(1:52, 5)
+     seen[hand] <- 1
+     if (sum(seen[1:26]) >= 26)
+       return (n)
+   }
+ }
> nreps <- 1e6
> set.seed(1234)  # for reproducibility
> t <- replicate(nreps, ndraws())
> t.test(t)

        One Sample t-test

data:  t
t = 3145.3, df = 1e+06, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 38.91305 38.96158
sample estimates:
mean of x 
 38.93731