Asymptotics of the optimal stopping time of a paying die game

150 Views Asked by At

This may be a question with really obvious analytical answers which I do not see.

You play a multistep game with a die with equal probability of generating any of $1,2,\cdots, n$ at each throw. At the end of each throw of the die, you may choose to stop the game and walk away with the same dollar amount as the number generated by the last roll of the die, or choose to pay \$1 to continue the game. The game can last indefinitely. What is the amount of money you expect to collect from this game? Is there an easy asymptotic formula of the expectation as $n\to\infty$?

Note that only at the end of roll $n$, the expected future income is $\frac{n+1}2$ should you decide to continue the game, since the $n+1$'st roll is the absolutely latest you have to stop. One can easily set up the recursion for the expectation of the game value at any step. But is there an asymptotic formula for it as $n\to\infty$?

2

There are 2 best solutions below

21
On BEST ANSWER

For a given $n$ we have to set a threshold $k$ that we will accept. If we keep rolling the expected value of the roll we accept is $\frac 12(n+k+1)$. On average it takes $\frac n{n-k}$ rolls to get one, so we pay that much. The breakeven point comes when $$\frac 12(n+k+1)-\frac n{n-k}=k\\ (n-k)(n+k+1)-2n=2k(n-k)\\ n^2-k^2-k-n=2kn-2k^2\\ k^2-(2n+1)k+n^2-n=0\\ k=\frac12\left(2n+1\pm \sqrt{(2n+1)^2-4n^2+4n}\right)\\ k=n+\frac 12-\frac 12\sqrt{8n+1}\\ k \approx n-\sqrt {2n}+\frac 12$$ The expected value of the roll we accept is then $n-\frac {\sqrt {2n}}2+\frac 14$ and it takes $\frac {n}{\sqrt {2n}}$ tries to get one, so the payoff is about $n-\sqrt {2n}$

7
On

Let $v(t,i)$ be the optimal expected worth (the sum of current wealth and the expected future income) at the end of $t$'th roll having observed number $i$. It is worth emphasizing that $i$ is in this setting given and the subsequent decision is conditioned upon it. The rolls starting from $t=0$. We have \begin{equation} v(t,i)=\max\begin{cases} -t+i, \\ \mathbf E[v(t+1,j)|\mathcal F_t]-1 \end{cases}. \end{equation} Sett $u(t,i):=v(t,i)+t$ to exclude the sunk cost. Better yet, at each time $t$ it is more transparent to look at the expected future income $u(t,i)$ with a given $t$'th roll outcome of $i$. \begin{equation} u(t,i)=\max\begin{cases} i, \\ -1+\mathbf E[u(t+1,j)|\mathcal F_t] \end{cases}. \end{equation} Let $x(t):=-1+E[u(t+1,j)|\mathcal F_t]$. It is important to note that $x(t)$ is a deterministic function dependent on $t$ only rather than a random variable. We stop and take $i$ if $i\ge x(t)$ and continue if $i<x(t)$. Therefore \begin{align} x(t-1)+1 &= \frac1n\Big(\sum_{i=\lfloor x\rfloor+1}^n i+\sum_{i=1}^{\lfloor x\rfloor}x(t)\Big)\\ &=\frac1n\Big(\frac12\big(n+\lfloor x(t)\rfloor+1)(n-\lfloor x(t)\rfloor\big)+\lfloor x(t)\rfloor x(t)\Big) \tag0 \end{align} This is the same as @RossMillikan's equation of $k$ after setting $k:=x(t-1)=x(t)$. Dividing Equation $(0)$ by $n$, $$\frac{x(t-1)+1}n =\frac12\Big[1-\Big(\frac{\lfloor x(t)\rfloor}n\Big)^2+\frac1n\Big(1-\frac{\lfloor x(t)\rfloor}n\Big)\Big]+\frac{\lfloor x(t)\rfloor}n\frac{x(t)}n. \tag1 $$ Note this is a difference equation for $x(t)$ as $x(t)$ has been shown to be deterministic rather than stochastic.

If the die generates a continuum amount of dollars with uniform distribution, with $\frac{x(t)}n\to y(t)$ as $n\to\infty$, the above iteration becomes $$y(t-1)=\frac12\big(1+y(t)^2\big)\tag2.$$ Both recursions lead to the results that $x(t)$ and $y(t)$ respectively increase as $t$ decreases and converge as $t\to -\infty$.


The decision threshold of $i$ in the above formulation Equation $(1)$ decreases with $t$. $u(t=T,i)=i$ so $\mathbf E[u(t=T,j)|\mathcal F_{T-1}]=\frac{1+n}2$. The threshold at $t=T-1$ is $\frac{n-1}2$. Now for large $n$ $$\mathbf E[u(t=T-1,j)|\mathcal F_{T-2}]\approx\frac1n\Big[\Big(\frac{n-1}2\Big)^2+\frac12\Big(n-\frac{n-1}2\Big)\Big(n+\frac{1+n}2\Big)\Big]=\frac58n+\frac3{8n}.$$ So the threshold at $t=T-2$ is $\mathbf E[u(t=T-1,j)|\mathcal F_{T-2}]-1\approx \frac58n>\frac n2$ approximately the threshold at $t=T-1$ for large $n$.

So the decision threshold is decreasing at least for $t\in \{T-2,T-1,T\}$ and is not a constant.