20-sided dice 100 rounds game

1.6k Views Asked by At

In a game there is a 20-sided die. At the start of the game it is on the table and the 1 face is facing upright. In each of the 100 rounds you have 2 options: you can either roll the dice or you can take a number of dollars equal to the current face. How would you play this game to maximize the expected profit and how much money could you win on average?

First, let's notice that it does not make sense to reroll after taking money, because if you admit it's profitable to take x dollars on turn i, it is also profitable to take them on turn i+1. So the best strategy would be to roll up to something "good enough" and then take it for the rest of turns.

I have thought about 3 things that could be useful (though I am not sure if they are):

  1. Let $(U_1,U_2,...,U_{100})$ be i.i.d. r.v. ~ Unif{1,..,20}, representing the randomness for rolls.

Let $(Y_1,Y_2,...,Y_{100}) \in {0,1}$ : $Y_i = 0 <=> $ we decide to roll on turn $i$, $Y_i = 1 <=> $ we decide to take money on turn $i$.

Let $(X_0,X_1,X_2,...,X_{100})$ be r.v. representing the current face of the dice. $X_i$ - the face of the dice on the turn $i$ ($X_0=1)$: $$X_n = Y_nX_{n-1} + (1-Y_n)U_n = \prod_{i=1}^{n}Y_i +\sum_{i=1}^{n}[(\prod_{j=i+1}^{n}U_j )U_i(1-Y_i)] $$

Finally, let $(S_0,S_1,S_2,...S_{100})$ be r.v. such that $S_i$ - the money you have after turn $i$. $S_0 = 0$, $\forall n \geq 1:$ $$S_n = S_{n-1}+Y_nX_{n-1}=\sum_{i=1}^{n}Y_iX_{i-1} = \sum_{i=1}^{n}Y_i(\prod_{k=1}^{i-1}Y_k +\sum_{k=1}^{i-1}[(\prod_{j=k+1}^{i-1}U_j )U_k(1-Y_k)])$$

Now we want to maximize $\mathbb{E}[S_{100}]$, and that is the moment I am stuck on. Should we somehow use that $X_n$ is a Markov chain? Or should we somehow find a suitable Martingale, which could help us to calculate this expectation be applying Doob's optional sampling theorem?

  1. For each turn $i$ $ \exists K_i \in$ {1,..,20} such that:

If $X_{i-1} \geq K_i$, then for all turns $j : 100 \geq j \geq i$ we take $X_i$ and do not roll again ($Y_j = 1$) ;

If $X_{i-1} < K_i$, then we reroll on turn i ($Y_i = 0$).

Let's notice that $K_i$ is a fixed number, not a random variable. So if we get to calculate $K_i$ for each turn $i$, we will solve the task and find the best strategy.

Let $T := inf\{i \in (1,...,100) | X_{i-1} \geq K_i \}$ be a r.v. representing the turn on which we start taking money (T is a stopping time).

Then $Y_1=Y_2=...=Y_{T-1}=0$, $Y_T=...=Y_{100}=1$.

And $X_1 = U_1, ..., X_{T-1} = U_{T-1}$, $X_T=...=X_{100}=X_{T-1}=U_{T-1} \geq K_T$.

And $S_1=...=S_{T-1}=0$, $S_T = X_{T-1}, S_{T+1} = 2X_{T-1},...,S_{100}=(101-T)X_{T-1}$.

Again, if only we could build a Martingale from $S_n$...

  1. $K_i = min \{k_i: (101-i)k_i \geq \mathbb{E}[S_{100} | T \geq i+1] \} $ : by definition, $K_i$ is the smallest roll that would still bring more money $(101-i)k_i$ than rerolling .

$$\mathbb{E}[S_{100} | T \geq i+1] = \mathbb{E}[(101-T)X_{T-1}| T \geq i+1] = \sum_{j=i+1}^{100} \{\mathbb{E}[(101-T)X_{T-1}| T \geq i+1, T=j] \mathbb{P}[T=j]\} = \sum_{j=i+1}^{100} \{\mathbb{E}[(101-j)X_{j-1}] \mathbb{P}[U_{j-1} \geq K_j, U_{j-2} < K_{j-1},...,U_1<K_2]\} = \sum_{j=i+1}^{100} \{\mathbb{E}[(101-j)X_{j-1}] \frac{K_2-1}{20}...\frac{K_{j-1}-1}{20}\frac{21-K_j}{20}\} = \sum_{j=i+1}^{100} \{ \sum_{n=1}^{20}\{\mathbb{E}[(101-j)X_{j-1} | X_{j-1}=n]\mathbb{P}[X_{j-1}=n]\} \frac{K_2-1}{20}...\frac{K_{j-1}-1}{20}\frac{21-K_j}{20}\} = \sum_{j=i+1}^{100} \{(101-j) \sum_{n=1}^{20}\{\frac{n}{20}\} \frac{K_2-1}{20}...\frac{K_{j-1}-1}{20}\frac{21-K_j}{20}\} = $$$$ \frac{21}{2} \sum_{j=i+1}^{100} \{(101-j) \frac{K_2-1}{20}...\frac{K_{j-1}-1}{20}\frac{21-K_j}{20}\}$$

3

There are 3 best solutions below

3
On BEST ANSWER

The player of the game is right now in the situation he sees $x$ (between $1$ and $20$) and has $k$ more rounds to go, denote by $$ W(x,k) $$ the expected win with the optimal strategy. He can either take $x$ dollars, or else take a new face, and in the last case the expected win is the mean of the $W(y,k-1)$ for all $20$ possibilities of $y$. Thus we have to compute recursively (from the end to the start) the numbers $W(x,k)$ determined recursively by: $$ \begin{aligned} W(x,1) &=x\ ,&&\text{one more round, take the money, }\\ W(x,k+1) &=\max\left(x+W(x,k),\ \underbrace{\frac 1{20}\sum_yW(y,k)}_{w(k)}\right)\ ,&& \text{ponder both decisions, take max .}\\ \end{aligned} $$ Let us cover some first steps manually.

  • For the last decision we have: $$W(x,1)=x\ .$$
  • So the mean is $w(1)=\frac 1{20}\sum_y y=\frac{21}2=10.5$ .
  • $W(x,2)=\max(x+W(x,1),\ w(1))=\max(x+x,w(1))$, and this is $2x$ for $x\ge 6$, respectively the "constant" $w(1)$ for $x\le 5$. Let us display this: $$ W(x,2)=\left\{ \begin{aligned} 10.5 &\text{ for }1\le x\le 5\ ,\\ 2x &\text{ for }6\le x\le 20\ . \end{aligned} \right. $$
  • So the mean is $w(2)=\frac 1{20}\sum_y W(y,2)=\frac1{20}(5\cdot 10.5 + 2(6+7+\dots+20))=22.125$ .
  • $W(x,3)=\max(x+W(x,2),\ w(2))$ is computed for $1\le x \le 5$, when we get $x+10.5$ or $22.125$, ok, take the last number, for $6\le x\le 7$, when we compare $x+2x=3x$ and $22.125$, ok, take the last again, and for $8\le x\le 20$, when we compare the same last two number, but pick the better win $3x$. Let us display this: $$ W(x,3)=\left\{ \begin{aligned} 22.125 &\text{ for }1\le x\le 7\ ,\\ 3x &\text{ for }8\le x\le 20\ . \end{aligned} \right. $$
  • So the mean $w(3)=\frac1{20}(7\cdot 22.125 + 3(8+9+\dots+20))=35.043$ .

This goes on in the same manner till all $100$ steps are computed. Since the problem is a very special bet-and-win story, no number theory, i decided to implement, instead of searching for an exact formula. Note that during the backwards recursion we have at each step some cases when the expected win at best decision is $kx$ (when facing $x$ and having $k$ further steps to go), so let us put a terminology on this collect the money-set, it is the "support" at step $k$.

Code:

R = [1..20]
k = 1
W = dict([(x, x) for x in R])
w = sum(W)/20

while k < 100:
    k += 1
    V = dict([(x, max(x + W[x], w)) for x in R])
    W, w = V, sum(V.values())/20
    print(f"k = {k} w ~ {RR(w)} with support {[x for x in R if W[x] == k*x]}")

This delivers:

k = 2 w ~ 22.1250000000000 with support [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
k = 3 w ~ 35.0437500000000 with support [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
k = 4 w ~ 48.8175000000000 with support [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
k = 5 w ~ 63.2178750000000 with support [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
k = 6 w ~ 78.1089375000000 with support [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
k = 7 w ~ 93.3599156250000 with support [12, 13, 14, 15, 16, 17, 18, 19, 20]
k = 8 w ~ 108.947953593750 with support [12, 13, 14, 15, 16, 17, 18, 19, 20]
k = 9 w ~ 124.768772156250 with support [13, 14, 15, 16, 17, 18, 19, 20]
k = 10 w ~ 140.861263293750 with support [13, 14, 15, 16, 17, 18, 19, 20]
k = 11 w ~ 157.116757976250 with support [13, 14, 15, 16, 17, 18, 19, 20]
k = 12 w ~ 173.525892684563 with support [14, 15, 16, 17, 18, 19, 20]
k = 13 w ~ 190.141830244966 with support [14, 15, 16, 17, 18, 19, 20]
k = 14 w ~ 206.892189659228 with support [14, 15, 16, 17, 18, 19, 20]
k = 15 w ~ 223.729923278498 with support [14, 15, 16, 17, 18, 19, 20]
k = 16 w ~ 240.624450131024 with support [14, 15, 16, 17, 18, 19, 20]
k = 17 w ~ 257.687115091717 with support [15, 16, 17, 18, 19, 20]
k = 18 w ~ 274.880980564202 with support [15, 16, 17, 18, 19, 20]
k = 19 w ~ 292.166686394941 with support [15, 16, 17, 18, 19, 20]
k = 20 w ~ 309.516680476459 with support [15, 16, 17, 18, 19, 20]
k = 21 w ~ 326.911676333521 with support [15, 16, 17, 18, 19, 20]
k = 22 w ~ 344.338173433465 with support [15, 16, 17, 18, 19, 20]
k = 23 w ~ 361.786721403425 with support [15, 16, 17, 18, 19, 20]
k = 24 w ~ 379.340041052569 with support [16, 17, 18, 19, 20]
k = 25 w ~ 397.005030789427 with support [16, 17, 18, 19, 20]
k = 26 w ~ 414.753773092070 with support [16, 17, 18, 19, 20]
k = 27 w ~ 432.565329819053 with support [16, 17, 18, 19, 20]
k = 28 w ~ 450.423997364289 with support [16, 17, 18, 19, 20]
k = 29 w ~ 468.317998023217 with support [16, 17, 18, 19, 20]
k = 30 w ~ 486.238498517413 with support [16, 17, 18, 19, 20]
k = 31 w ~ 504.178873888060 with support [16, 17, 18, 19, 20]
k = 32 w ~ 522.134155416045 with support [16, 17, 18, 19, 20]
k = 33 w ~ 540.100616562034 with support [16, 17, 18, 19, 20]
k = 34 w ~ 558.075462421525 with support [16, 17, 18, 19, 20]
k = 35 w ~ 576.056596816144 with support [16, 17, 18, 19, 20]
k = 36 w ~ 594.045277452915 with support [17, 18, 19, 20]
k = 37 w ~ 612.136221962332 with support [17, 18, 19, 20]
k = 38 w ~ 630.308977569866 with support [17, 18, 19, 20]
k = 39 w ~ 648.547182055893 with support [17, 18, 19, 20]
k = 40 w ~ 666.837745644714 with support [17, 18, 19, 20]
k = 41 w ~ 685.170196515771 with support [17, 18, 19, 20]
k = 42 w ~ 703.536157212617 with support [17, 18, 19, 20]
k = 43 w ~ 721.928925770094 with support [17, 18, 19, 20]
k = 44 w ~ 740.343140616075 with support [17, 18, 19, 20]
k = 45 w ~ 758.774512492860 with support [17, 18, 19, 20]
k = 46 w ~ 777.219609994288 with support [17, 18, 19, 20]
k = 47 w ~ 795.675687995430 with support [17, 18, 19, 20]
k = 48 w ~ 814.140550396344 with support [17, 18, 19, 20]
k = 49 w ~ 832.612440317075 with support [17, 18, 19, 20]
k = 50 w ~ 851.089952253660 with support [17, 18, 19, 20]
k = 51 w ~ 869.571961802928 with support [17, 18, 19, 20]
k = 52 w ~ 888.057569442343 with support [17, 18, 19, 20]
k = 53 w ~ 906.546055553874 with support [17, 18, 19, 20]
k = 54 w ~ 925.036844443099 with support [17, 18, 19, 20]
k = 55 w ~ 943.529475554479 with support [17, 18, 19, 20]
k = 56 w ~ 962.023580443584 with support [17, 18, 19, 20]
k = 57 w ~ 980.518864354867 with support [17, 18, 19, 20]
k = 58 w ~ 999.015091483893 with support [17, 18, 19, 20]
k = 59 w ~ 1017.51207318711 with support [17, 18, 19, 20]
k = 60 w ~ 1036.00965854969 with support [17, 18, 19, 20]
k = 61 w ~ 1054.50772683975 with support [17, 18, 19, 20]
k = 62 w ~ 1073.03156781379 with support [18, 19, 20]
k = 63 w ~ 1091.62683264172 with support [18, 19, 20]
k = 64 w ~ 1110.28280774546 with support [18, 19, 20]
k = 65 w ~ 1128.99038658364 with support [18, 19, 20]
k = 66 w ~ 1147.74182859610 with support [18, 19, 20]
k = 67 w ~ 1166.53055430668 with support [18, 19, 20]
k = 68 w ~ 1185.35097116068 with support [18, 19, 20]
k = 69 w ~ 1204.19832548658 with support [18, 19, 20]
k = 70 w ~ 1223.06857666359 with support [18, 19, 20]
k = 71 w ~ 1241.95829016405 with support [18, 19, 20]
k = 72 w ~ 1260.86454663944 with support [18, 19, 20]
k = 73 w ~ 1279.78486464353 with support [18, 19, 20]
k = 74 w ~ 1298.71713494700 with support [18, 19, 20]
k = 75 w ~ 1317.65956470495 with support [18, 19, 20]
k = 76 w ~ 1336.61062999921 with support [18, 19, 20]
k = 77 w ~ 1355.56903549933 with support [18, 19, 20]
k = 78 w ~ 1374.53368017443 with support [18, 19, 20]
k = 79 w ~ 1393.50362814826 with support [18, 19, 20]
k = 80 w ~ 1412.47808392602 with support [18, 19, 20]
k = 81 w ~ 1431.45637133712 with support [18, 19, 20]
k = 82 w ~ 1450.43791563655 with support [18, 19, 20]
k = 83 w ~ 1469.42222829107 with support [18, 19, 20]
k = 84 w ~ 1488.40889404741 with support [18, 19, 20]
k = 85 w ~ 1507.39755994030 with support [18, 19, 20]
k = 86 w ~ 1526.38792594925 with support [18, 19, 20]
k = 87 w ~ 1545.37973705686 with support [18, 19, 20]
k = 88 w ~ 1564.37277649834 with support [18, 19, 20]
k = 89 w ~ 1583.36686002358 with support [18, 19, 20]
k = 90 w ~ 1602.36183102005 with support [18, 19, 20]
k = 91 w ~ 1621.35755636704 with support [18, 19, 20]
k = 92 w ~ 1640.35392291198 with support [18, 19, 20]
k = 93 w ~ 1659.35083447519 with support [18, 19, 20]
k = 94 w ~ 1678.34820930391 with support [18, 19, 20]
k = 95 w ~ 1697.34597790832 with support [18, 19, 20]
k = 96 w ~ 1716.34408122207 with support [18, 19, 20]
k = 97 w ~ 1735.34246903876 with support [18, 19, 20]
k = 98 w ~ 1754.34109868295 with support [18, 19, 20]
k = 99 w ~ 1773.33993388051 with support [18, 19, 20]
k = 100 w ~ 1792.33894379843 with support [18, 19, 20]

And at this point, W[1] contains the needed expected win:

sage: W[1]
    13854218233441454103931343576754421232251510833114994904398479127392207121624996084247
  / 7812500000000000000000000000000000000000000000000000000000000000000000000000000000

sage: W[1].n(digits=40)
1773.339933880506125303211977824565917728

sage: W[1].factor()
2^-77 * 5^-84 * 7 * 28643 * 2140894207217543 * 32275299004764806582422742296560469181163442881255432601514814029

(Code was manually rearranged.)

0
On

Let $f(a,n)$ be the expected number of money (in dollars) that you earn, if you use an optimal strategy, the current number on the dice is $a$, and you have $n$ turns left.

Then $f(a,n) = \max\{ a+f(a,n-1), \frac{f(1,n-1)+\cdots+f(20,n-1)}{20}\}$

Where the first number corresponds to taking $a$ dollars, and the second number corresponds to re-rolling the dice.

If using a computer program is allowed, (let $k=20$ be the number of sides of the dice) we can use dynamic programming to see it takes $O(k)$ to compute $f(1,n),\cdots,f(k,n-1)$ given $f(1,n-1),\cdots,f(k,n-1)$. Thus, the program takes $O(nk)$ time. Substituting $n=100,k=20$ yields a good solution.

Also, I am not sure if this observation is correct:

notice that it does not make sense to reroll after taking money, because if you admit it's profitable to take x dollars on turn i, it is also profitable to take them on turn i+1. So the best strategy would be to roll up to something "good enough" and then take it for the rest of turns.

But if it is correct, it can greatly simplify calculations.

0
On

I took a bit of a different approach. I assumed that, for each round, there is an optimal value $x_0$ above which the expected gain is maximized. Assuming $N$ throws and an $n$ sided die, we obtain

$P(x \ge x_0) = \frac{n-x_0 +1}{n}$

The expected number of throws until we obtain this value would be

$k = \frac{1}{P(x \ge x_0)} = \frac{n}{n-x_0+1}$

The expected gain for this strategy with $N_0$ throws left is

$E = (N_0 - k)x = (N_0 - \frac{n}{n-x_0+1})x$

By deriving $E$ with respect to $x$, we obtain that, at step $N_0$

$x_0= \frac{N_0n-n+1}{N_0}$

Since we're dealing with a discrete problem, we should pick a value at least equal to $round(x)$ each turn. I'm aware that it's not a perfect strategy, since it solves a discrete optimization problem using continuous optimization techniques, but it comes pretty close to the actual answer(~1751 compared to the theoretical 1773) and it's a strategy any human can use.

Later edit: i simulated this strategy, it seems that $floor(x)$ is much more performant than $round(x)$, which makes perfect sense(waiting for 20 is not as good as taking 19)