Probability: Optimal strategy in die game

344 Views Asked by At

You are given a single die. When you roll the die, you have two choices:

  1. End the game here, in which case the number you just rolled is your final score.
  2. Continue the game and roll once again, after which you will face these same two choices.

You can roll the die a maximum of 4 times. After the fourth roll, you will not have a choice and will have to accept whatever you rolled as your final score.

There are two questions to answer here:

  1. What is the optimal strategy to maximize your final score?
  2. What is the expected value of the final score?

I was asked this question in an interview and struggled to work it out. I ended up getting the job anyway, but am still interested to figure out the approach to this question. When I was unable to answer this question, my interviewer said that he didn't expect me to answer this question as it needs some intuition from Markov chains, which I had no prior exposure to.

2

There are 2 best solutions below

0
On

You continue if the expected value of continuing is higher than your current role.

On the third roll, the expected value of continuing equals $$C_{3}=\frac{1}{6}(1+2+3+4+5+6)=\frac{7}{2}=4-\frac{1}{2},$$ so you continue if you roll lower than $4$.

On the second roll, the expected value of continuing equals $$C_{2}=\frac{1}{6}(4+5+6)+\frac{1}{2}C_{3}=\frac{17}{4}=5-\frac{3}{4},$$ so you continue if you roll lower than $5$.

On the first role, the expected value of continuing equals $$C_{1}=\frac{1}{6}(5+6)+\frac{2}{3}C_{2}=\frac{22}{12}+\frac{51}{12}=\frac{71}{12}=6-\frac{1}{12},$$ so you continue if you roll lower than $6$.

0
On

Discussion, notations and a first solution:

We will use the following (general) stopping strategy, depending on the sets $A_1,A_2,A_3\subset\{1,2,3,4,5,6\}=:\Omega_0$. The whole modeling space is $\Omega=\Omega_0^{\times 4}$, and we have the usual filtration on it. Let $n=(n_1,n_2,n_3,n_4)$ be an element in $\Omega$. (After the player stops, we can still go on, possibly without telling him the one, two, or three last results.)

  • If the player sees $n_1\in A_1$ he stops. For $n_1\not \in A_1$ he continues.

  • If the player sees $n_2\in A_2$ he stops. For $n_2\not \in A_2$ he continues.

  • If the player sees $n_3\in A_3$ he stops. For $n_3\not \in A_3$ he continues.

  • And the player has to accept $n_4$ as a final result.

Let $p_1,p_2,p_3$ be the probabilities to land in $A_1$, $A_2$, $A_3$, so $p_k=|A_k|/6$, for $k \in \{1,2,3\}$, and let $m_k$ be the mean of the set $A_k$, for $k \in \{1,2,3\}$. We have to maximize $$ p_1m_1+(1-p_1)p_2m_2+(1-p_1)(1-p_2)p_3m_3+\frac 72\ . $$ It is clear now, that $A_1$ can be taken optimal if it contains the highest values, given its cardinality as fixed. The same applies also for $A_2,A_3$. There are thus only $6^3$ stopping strategies, depending on $a_k:=|A_k|$, $k\in\{1,2,3\}$. It makes no sense to take some $a_k=0$ (instead of using $a_k=1$, $A_k=\{6\}$), or $a_k=6$ (instead using $a_k=5$, $A_k=\{2,3,4,5,6\}$, and forcing one more turn, where we get at least one as minimal result). so $a_1,a_2.a_3\in\{1,2,3,4,5\}$.

Then $p_k=a_k/6$, the elements of $A_k$ are $6-0,\dots,6-(a_k-1)$,with a mean of $6-\frac 12(a_k-1)$, so we have to maximize: $$ \frac {a_1}6\left( 6-\frac 12(a_1-1)\right)\\ + \frac {6-a_1}6\cdot\frac {a_2}6\left( 6-\frac 12 (a_2-1)\right)\\ + \frac {6-a_1}6\cdot\frac {6-a_2}6\cdot\frac {a_3}6\left( 6-\frac 12 (a_3-1)\right)\\ + \frac 72\ . $$ The computer gives the best five strategies as follows:

sage: strategies = []    # and we will append
sage: R = [1, 2, 3, 4, 5]
sage: for a1, a2, a3 in cartesian_product( [R, R, R] ):
....:     val = f(a1, a2, a3)
....:     strategies.append( [val, [a1, a2, a3]] )
....:     
sage: strategies.sort()
sage: for val, str in strategies[-5:]:
....:     print "Expected final score %7s ~ %.6f for the strategy %s" % (val, val, str)
....:     
Expected final score    44/9 ~ 4.888889 for the strategy [2, 3, 4]
Expected final score  265/54 ~ 4.907407 for the strategy [2, 2, 2]
Expected final score  265/54 ~ 4.907407 for the strategy [2, 2, 4]
Expected final score   59/12 ~ 4.916667 for the strategy [2, 3, 3]
Expected final score   89/18 ~ 4.944444 for the strategy [2, 2, 3]
sage: 

Second solution:

We can also think recursively. In the present game there are $4$ steps. (After the $4$.th step we must stop.) Let us consider the general game, where there are $N$ steps. Let $W_N\in[1,6]$ be the expected final score in a strategy (similar to those discusted above) for the game with $N$ steps. We can now argue inductively.

  • For $N=1$ we have the expectation $W_1=(1+2+3+4+5+6)/6=7/2=3.5$. That's it.

  • Now in the $N=2$ game, we have a similar strategy as described in the first solution, but "it is clear" (by maximizing $a\to \frac a6\left(6-\frac 12(a-1)\right) +\frac {6-a}6\cdot W_1$) that we should stop in the first step if we see a number $>W_1$, else continue. The numbers bigger $W_1$ are $4,5,6$. The expected final score is now $$ W_2 =\frac 12\cdot 5+\frac 12\cdot W_1 =\frac{17}4=4.25\ . $$

  • In the $N=3$ game, same thoughts tell us to stop in the first step if we see a number $>W_2$, so we stop when getting $5$ or $6$. The expected final score is now $$ W_3 =\frac 13\cdot \frac{11}2+\frac 23\cdot W_2 =\frac{14}3=4.66666\dots\ . $$

  • In the $N=4$ game, same thoughts tell us to stop in the first step if we see a number $>W_3$, so we stop again when getting $5$ or $6$. The expected final score is now $$ W_4 =\frac 13\cdot \frac{11}2+\frac 23\cdot W_3 =\frac{89}{18}=4.944444\dots\ . $$

  • In the game with $N=5$ we also stop in the first step, when we see either $5$ or $6$, the corresponding expected final score is $$ W_5 =\frac 13\cdot \frac{11}2+\frac 23\cdot W_4 =\frac{277}{54}=5\ .\ 1\;296\;296\;296\;296\dots\ . $$ We jump thus over $5$, so "next strategies" will stop after the first step only when getting the six.