You have a shuffled deck of 26 red and 26 black cards. You play a game by repeatedly looking at the top card and either discarding it or ending the game. At the end if the color of the next card matches that of the top card you win, otherwise you lose. What is the optimal strategy?
Stopping Problem Question
2.8k Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail AtThere are 2 best solutions below
On
Without loss of generality, we can assume the initial top card is red.
Label the choices for the player as "hit" or "stop".
If the player chooses to hit, the old top card is discarded, and a new top card is dealt face up from the deck.
At any time when a choice of hit or stop is about to be made, and assuming the current top card is red, define the state of the game to be the pair $(r,b)$, where $r$ and $b$ are nonnegative integers, not both zero, representing the respective number of red cards and black cards remaining in the deck.
To simplify the analysis, when a player chooses to hit, if a black card is dealt as the new top card, change the color of the top card to red, and simultaneously toggle the colors of all the remaining cards in the deck. By symmetry, assuming the player uses an optimal strategy, this conversion doesn't affect the player's chance of winning.
Thus, for the analysis, force all top cards to be red.
Let $p(r,b)$ be the probability that the player wins the game, assuming
- The current state is $(r,b)$.$\\[4pt]$
- The current top card is red.$\\[4pt]$
- The player uses an optimal strategy.
Then we have the following recursion . . . $$p(r,b)= \begin{cases} \text{if $r=0$ and $b=1$ then}\\[4pt] \qquad 0\\[4pt] \text{else if $r=0$ or $b=0$ then}\\[5pt] \qquad 1\\[2pt] \text{else}\\ \qquad \max\left( {\displaystyle{\frac{r}{r+b}}}, \; \left({\displaystyle{\frac{r}{r+b}}}\right)p(r-1,b) + \left({\displaystyle{\frac{b}{r+b}}}\right)p(b-1,r) \right) \\ \end{cases} $$ Implementing the recursion in Maple, we discover a very simple optimal strategy:
- Stop only if $(r,b)=(1,1)$, or if one of $r,b$ is zero, and the other matches the color of the top card, otherwise hit.
Using this strategy, we get $$p(25,26)=\frac{38}{51}\approx .7450980392$$
Conjecture:$\;$The strategy specified above is optimal for any state $(r,b)$.
Here is a proof that quasi's strategy is optimal for any situation.
I'm going to slightly modify that strategy to simplify the analysis. In the new strategy (which we'll call the Two-Card Strategy), you should keep playing no matter what until only two face-down cards are left. Based on the cards you've seen so far, you should be able to determine if they're the same color or different colors. Based on that:
The only difference from quasi's strategy is that the Two-Card Strategy potentially stalls for a while if it gets into a situation where there are many cards left, all of one color. (In that case, we can win with probability $1$ before getting down to two face-down cards.) But this doesn't change the winning probability, since we win anyway in either case.
As a result, the two bottom cards of the deck are ultimately what determines the fate of this strategy.
Now consider any other strategy. If the other strategy also always waits until two face-down cards are left, it's easy to see that it can't beat the Two-Card Strategy. So suppose that the other strategy decides to stop at some point when more than two cards are left.
We'll consider each of those early stopping points one by one. If, when the other strategy decides to stop, the top card is red (without loss of generality) and there are $r$ red and $b$ black cards left, the other strategy wins with probability $\frac{r}{r+b}$. We may assume $b\ge 1$; if $b=0$, then the other strategy definitely wins in this case, but so does the Two-Card strategy, because then the bottom two cards are definitely red.
If we decided to keep going and use the Two-Card Strategy instead, then:
So the Two-Card Strategy wins with probability \begin{align} \frac{r(r-1) + b(b-1) + \frac12 \cdot 2rb}{(r+b)(r+b-1)} &= 1 - \frac{b}{r+b} \cdot \frac{r}{r+b-1} \\ &\ge 1 - \frac{b}{r+b} & \text{(since $b\ge 1$, $\tfrac{r}{r+b-1} \le 1$)}\\ &= \frac{r}{r+b}. \end{align} That is, whenever the other strategy decides to stop early, we would have done at least as well with the Two-Card Strategy, and therefore the Two-Card Strategy is optimal.
Also, the inequality is tight if the other strategy never stops when $\frac{r}{r+b-1} < 1$, an inequality equivalent to $b>1$. This gives us a general characterization of optimal strategies. A strategy is optimal if and only if it does both of the following: