What is the expected number of runs of same color in a standard deck of cards?

7.9k Views Asked by At

Standard deck has $52$ cards, $26$ Red and $26$ Black. A run is a maximum contiguous block of cards, which has the same color.

Eg.

  • $(R,B,R,B,...,R,B)$ has $52$ runs.
  • $(R,R,R,...,R,B,B,B,...,B)$ has $2$ runs.

What is the expected number of runs in a shuffled deck of cards?

3

There are 3 best solutions below

7
On BEST ANSWER

The expected number of runs is 27.

Let $X_n$ be the color of the n'th card. For n<52, the n'th card is the end of a run if and only if $X_n\not=X_{n+1}$ and the last card in the pack is always the end of a run. So, the total number of runs is $$ N=\sum_{n=1}^{51}1_{\{X_n\not=X_{n+1}\}}+1. $$ The expected number of runs is $$ \mathbb{E}[N]=\sum_{n=1}^{51}\mathbb{P}(X_n\not=X_{n+1})+1. $$ Whatever colour the n'th card is, there are 51 remaining cards in the deck of which 26 of them are a different colour from $X_n$. So, $\mathbb{P}(X_n\not=X_{n+1})=26/51$, giving $$ \mathbb{E}[N]=51 (26/51) + 1=27. $$

1
On

An alternative thought process following the thinking of Calvin Lin & Mining in the comments of George Lowther's solution. It is more involved, so not the kind of back-of-the-envelop solution this post is looking for, educational nonetheless.

Conjecture

Suppose our deck of cards consists of $n$ red cards and $n$ black cards respectively (usually, $n=26$). Define $U$ as the total number of runs in shuffled sequence of our deck. By empirical observation, it seems that $$ \mathbb{P}\{U=2k\}=\mathbb{P}\{U=2[n-(k-1)]\}. $$ For example, the following charts show average frequencies of runs in the cases of $n=4$ and $n=5$.

enter image description here

In our case of $n=26$, this means that $$ \mathbb{P}\{U=2\}=\mathbb{P}\{U=52\},~ \mathbb{P}\{U=3\}=\mathbb{P}\{U=50\},\cdots. $$ If this were true, then, by symmetry, our expected number of runs is simply $$ \dfrac{2k+2[n-(k-1)]}{2} = n+1, $$ or 27 for a normal deck of cards. This is the same answer that George Lowther had calculated.

Proof Part I

Wald and Wolfowitz calculated the closed form probabilities of $\mathbb{P}(U=2k)$ and $\mathbb{P}(U=2k-1)$. I will walk through the former since the latter follows the exact logic.

Following the notation in Wald and Wolfowitz, let $r_{Rj}$ be the length of the $j$-th red run and, similarly, $r_{Bj}$ be the length of the $j$-th black run. Let $e_R$ be the total number of red runs and, similarly, $e_B$ be the total number of black runs. Obviously,

$$ \sum_{j=1}^{e_R}r_{Rj}=\sum_{j=1}^{e_B}r_{Bj}=n,\\ 1\leq e_R, e_B \leq n,\\ \text{and }|e_R-e_B|\leq 1. $$

The last property follows since red and black runs must be alternating by definition of runs. Therefore, if $U=2k$, then $e_R=e_B=k$. There are 2 sub-cases with equal likelihood: whether the sequence starts with red or black. Therefore, the complete sequence is fully described by

  • the color of the first card, and
  • the sequence of run lengths $r_{*j}$, where * can be R or B.

Suppose that the first card is fixed, say red, the number of all possible sequences in this case equal the product of

  1. number of ways to arrange the red runs $r_{Rj}$ such that they partition $n$ into $k$ positive summands. In symbols, $\sum_{j=1}^{k}r_{Rj}=n$, $1\leq k\leq n$.
  2. do the same for black. In symbols, $\sum_{j=1}^{k}r_{Bj}=n$, $1\leq k\leq n$.

Observe that both 1 and 2 are exactly the coefficient of $a^n$ in the expansion of

$$ (a+a^2+a^3+\cdots)^k=\left(\dfrac{a}{1-a}\right)^k. $$

We just need to get that coefficient.

Lemma

If $f(a)=(1-a)^{-1}$, then the $k$-th derivative $f^{(k)}(a)=k!(1-a)^{-(k+1)}$. This implies that $$ \left(\dfrac{1}{1-a}\right)^k = \dfrac{1}{(k-1)!}f^{(k-1)}(a). $$

On the other hand, by differentiating the geometric expansion of $f(a)=1+a+a^2+\cdots$, we have $$ f^{(k-1)}(a)=\sum_{n=0}^{\infty}\dfrac{(n+k-1)!}{(k-1)!}a^n. $$

Plugging this into the previous equation, we get $$ \left(\dfrac{a}{1-a}\right)^k = \dfrac{a^k}{(k-1)!}\sum_{n=0}^{\infty}\dfrac{(n+k-1)!}{(k-1)!}a^n =\sum_{n=k}^{\infty}{{n-1}\choose{k-1}}a^n. $$

Proof Part II: Conclusion

To conclude, we've shown that there are ${n-1}\choose{k-1}$ ways to satisfy 1 and 2 above. Their product ${{n-1}\choose{k-1}}^2$ is the total number of sequences that have $U=2k$ runs starting with a red card. We conclude by observing the cases of $U=2k$ and $U=2[n-(k-1)]$ and that ${{n-1}\choose{k-1}} = {{n-1}\choose{n-(k-1)-1}}$, hence justifying our conjecture.

Reference

A. Wald, J. Wolfowitz "On a Test Whether Two Samples are from the Same Population", The Annals of Mathematical Statistics, Ann. Math. Statist. 11(2), 147-162, (June, 1940), accessed from Project Euclid

Code Listing

The following Python code is used to generate the above plots.

import numpy as np
from itertools import permutations
from matplotlib import pyplot as plt

def count_runs(outcome:str):
    outcome=np.array(list(outcome))
    cuts = np.where(outcome[:-1]!=outcome[1:])[0]
    lengths = np.diff(cuts, prepend=-1, append=len(outcome)-1)
    return 1+len(cuts), lengths

###########################
# Customize the Following #
###########################
num_of_cards = [8, 10] # easy on this number, remember this is factorial time!
#######################
# Customize the Above #
#######################

fg, ax = plt.subplots(1, len(num_of_cards), figsize=(10,4))

for i in range(len(num_of_cards)):
    n = num_of_cards[i]
    if n % 2:
        raise ValueError("num_of_cards needs to be even.")

    deck = (n//2)*"B"+(n//2)*"R"
    all_outcomes = tuple({''.join(p) for p in permutations(deck)})
    print("There are {} possible shuffling outcomes.".format(len(all_outcomes)))
    runs_counts = np.zeros(len(deck))

    for j in all_outcomes:
        count = count_runs(j)[0]
        runs_counts[count-1] += 1
        #print(i, count)

    ax[i].bar(np.array([i+1 for i in range(1,len(runs_counts))]), runs_counts[1:])
    ax[i].set_xlabel('Number of Runs in Outcome with {} Cards ($n={}$)'.format(n, n//2))
    ax[i].set_ylabel('Frequency Count');
    #ax[i].grid()
0
On

Note: This solution works because of the happy "coincidence" that the number of ways to get $k$ runs is the same as the number of ways to get $2n+2 - k$ runs. We will prove that first, then apply it.

Lemma: The number of ways to achieve $k$ runs is $ 2 \times { n-1\choose \lceil \frac{k}{2} \rceil-1} \times { n-1 \choose \lfloor \frac{k}{2} \rfloor - 1 }$.

Proof: A sequence of $k$ runs can be defined as having $a_1$ of the first color, $b_1$ of the second color, $a_2$ of the first color, $b_2$ of the second color, so on and so forth. For the final term, if $i$ is odd, we have $a_{ \lceil \frac{k}{2} \rceil }$ of the first color, and if $k$ is even, we have $b_{\lfloor \frac{k}{2} \rfloor}$ of the second color.

After choosing what the first color is, we have the constraint that $ \sum a_i = n$ and $ \sum b_i = n$, of which there are $ \lceil \frac{k}{2} \rceil $ terms in the first summation and $ \lfloor \frac{k}{2} \rfloor$ terms in this second summation. It is easy to see that this is the only constraint, since given any 2 such summations, we can form a sequence of $i$ runs. This establishes the bijection between a sequence of $k$ runs and solutions to these equations.

We now count the number of ways to get such solutions. By bars and stars, there are $ { n-1 \choose \lceil \frac{i}{2} \rceil - 1 } $ solutions to the first equation, and ${ n-1 \choose \lfloor \frac{i}{2} \rfloor - 1 }$ solutions to the second. Hence the lemma follows.

Corollary: Given $2n$ cards, the number of ways to get $r$ runs is the same as the number of ways to get $2n+2-r$ runs.

Corollary: By symmetry, the expected number of runs is $ \frac{ r + ( 2n+2 - r ) } { 2 } = n+ 1 $.