Optimal stopping in red vs black card game deck of 52 cards, quick way to reasonably estimate the answer

Question

Optimal stopping in red vs black card game deck of 52 cards, quick way to reasonably estimate the answer

762 Views Asked by Bumbble Comm At 30 Mar 2026 - 8:44

Here is a question asked on MSE a few years ago: Optimal stopping in red vs black card game deck of 52 cards

Imagine you are playing a card game you start with a well shuffled deck of $52$ cards, containing $26$ red cards and $26$ black cards, stacked face down. You have a sequence of turns, $52$ possible turns in total, each turn you either pull the top card and turn it over, or you quit. If you pull a red card you win $1$, and if you pull a black card you lose $1$. If you played all $52$ turns without stopping you are guaranteed to break even. What is the optimal stopping strategy?

There is an answer given there by quasi: https://math.stackexchange.com/a/2544082/834963

Let $v(r,b)$ be the expected value of the game for the player, assuming optimal play, if the remaining deck has $r$ red cards and $b$ black cards.

Then $v(r,b)$ satisfies the recursion $$ v(r,b) = \begin{cases} 0&\;\text{if}\;r=0\\[4pt] r&\;\text{if}\;b=0\;\text{and}\;r>0\\[4pt] \max(0,f(r,b))&\;\text{if}\;r,b>0\\[4pt] \end{cases} $$ where $$ f(r,b) = \left({\small{\frac{r}{r+b}}}\right)(1+v(r-1,b)) + \left({\small{\frac{b}{r+b}}}\right)(-1+v(r,b-1)) $$ The stopping rule is simple: Stop when $v(r,b)=0$.

To explain the recursion . . .

If $r,b>0$, and the player elects to play a card, then:

The revealed card is red with probability ${\large{\frac{r}{r+b}}}$, and in that case, the player gets a score of $+1$, and the new value is $v(r-1,b)$.$\\[4pt]$
The revealed card is black with probability ${\large{\frac{b}{r+b}}}$, and in that case, the player gets a score of $-1$, and the new value is $v(r,b-1)$.

Thus, if $r,b>0$, electing to play a card yields the value $f(r,b)$.

But the player always has the option to quit, hence, if $r,b>0$, we get $v(r,b) = \max(0,f(r,b))$.

Implementing the recursion in Maple, the value of the game is $$v(26,26) = \frac{41984711742427}{15997372030584}\approx 2.624475549$$ and the optimal stopping strategy is as follows . . .

If $24 \le b \le 26$, play while $r \ge b - 5$.$\\[4pt]$
If $17 \le b \le 23$, play while $r \ge b - 4$.$\\[4pt]$
If $11 \le b \le 16$, play while $r \ge b - 3$.$\\[4pt]$
If $6 \le b \le 10$, play while $r \ge b - 2$.$\\[4pt]$
If $3 \le b \le 5$, play while $r \ge b - 1$.$\\[4pt]$
If $1 \le b \le 2$, play while $r \ge b$.$\\[4pt]$
If $b = 0$, play while $r > 0$.

However, given that this question was asked as part of a job interview, I don't think it's very realistic to get $2.62$ in a short amount of time. I was wondering if there a quick way to:

Find the optimal stopping rule in terms of maximizing expected payoff.
Get an estimate close to the actual answer of around $2.62$ without laborious recursive calculation or writing a program.

Original Q&A

There are 2 best solutions below

**Bumbble Comm** · Answer 1 · 2022-07-25 15:40:41

Look for the 'pivot points' and make continuous approximations

Here's part of an answer...

If we do some handwaving and approximations we can get something more compact without actually doing the computations.

First, simplifying the notation of $v$ and $f$, we can combine into $q$ notation: q depends on both state and action, and is the expected value of taking the proposed action in the state and then acting optimally thereafter.

Using Bellman's optimal q equation, writing $q(r, b)$ for the quality of the action $\operatorname{play}$ in state $(r, b)$ (the quality of the action $\operatorname{stop}$ is always $0$)

$$q(r, b) = \frac {r} {r+b}\left(1 + \max(0, q(r-1, b))\right) + \frac {b} {r+b}\left(-1 + \max(0, q(r, b-1))\right)$$

The point of the question is to determine when $q(r, b) > 0$ (i.e. we should take the $\operatorname{play}$ action) and also what is $q(26, 26)$. What follows makes some progress toward the former.

So the criterion is

$$q(r, b) > 0$$

First handwave: in the region of $0$, less $r$ is 'bad', less $b$ is 'good'

Those $\max$ expressions sure make this fiddly!

Let's look for 'pivot points' where $q \simeq 0$.

Assuming $q$ is roughly continuous, when we're near (in $(r, b)$-space) $q(r, b) = 0$ we should expect that reducing $r$ pushes us into negative territory. And we should assume that reducing $b$ pushes us into positive territory.

So 'near a pivot' we can say

$$\max(0, q(r-1, b)) = 0$$ $$\max(0, q(r, b-1)) = q(r, b-1)$$

and also

$$\frac {\partial q} {\partial r}(r, b) > 0$$ $$\frac {\partial q} {\partial b}(r, b) < 0$$

Second handwave: small changes in $r$, and $b$ have roughly linear effects

This is basically a Taylor assumption. I don't justify it here, but it seems intuitively sensible for the domain in question.

Now we can say things like, near a pivot where $q(r, b) = 0$

$$\max(0, q(r-1, b)) = \max\left(0, q(r, b) - \frac {\partial q} {\partial r}(r, b)\right) = 0$$ $$\max(0, q(r, b-1)) = \max\left(0, q(r, b) - \frac {\partial q} {\partial b}(r, b)\right) = -\frac {\partial q} {\partial b}(r, b)$$

Putting it together and 'solving'

Now we have, near a 'pivot point' where $q(r, b) \simeq 0$,

$$q(r, b) \simeq \frac r {r+b} + \frac b {r+b}\left(-1 - \frac {\partial q} {\partial b}(r, b)\right)$$

So we have a first order partial differential equation for $q$ in $b$

$$0 \simeq r + b \left(-1 - \frac {\partial q} {\partial b}\right)$$

Separating,

$$\frac {\partial q} {\partial b} = -1 + \frac r b$$

and solving

$$q = - b + r \ln b + c_r(r)$$

I'm not sure what handwaves are required, but intuitively we should be able to do some similar tricks to get

$$q = r + c_b(b)$$

and from there

$$q = r - b + r\ln b + c$$

which yields the criterion (for some $c$ to be determined, perhaps by inspecting small $r$, $b$ cases)

Play while $r > \frac {b + c} {1 + \ln b}$

Presumably an appropriate $c$ is small, maybe zero, since for small $r$ and $b$, $r-b$ is a good criterion.

Finding $q(26, 26)$

The above solution should not be expected to extrapolate usefully 'away from pivots', since many of the assumptions require being in the region of a pivot. So we can't use this to answer the other part of the question.

**Bumbble Comm** · Answer 2 · 2022-07-29 04:11:04

For large $r, b$ with $\frac rb$ close to $1$, we should expect the game to be approximately unchanged after replacing each $+1$ card with four cards $+\frac12, +\frac12, +\frac12, -\frac12$, and each $-1$ card with four cards $+\frac12, -\frac12, -\frac12, -\frac12$, since the replacements have the same sum and variance over a given fraction of the deck. In other words, doubling $b - r$ while quadrupling $r + b$ should approximately double the game value. So the optimal stopping criterion should have $b - r$ approximately proportional to $\sqrt{r + b}$.

Optimal stopping in red vs black card game deck of 52 cards, quick way to reasonably estimate the answer

There are 2 best solutions below

Look for the 'pivot points' and make continuous approximations

First handwave: in the region of $0$, less $r$ is 'bad', less $b$ is 'good'

Second handwave: small changes in $r$, and $b$ have roughly linear effects

Putting it together and 'solving'

Finding $q(26, 26)$

Related Questions in PROBABILITY

Related Questions in COMBINATORICS

Related Questions in EXPECTED-VALUE

Related Questions in ESTIMATION

Related Questions in CARD-GAMES

Trending Questions

Popular # Hahtags

Popular Questions