I'm working on dynamic programming problems this week. The following is a dice game. I'm trying to find the optimal strategy using dynamic programming.
The game's description :
The player rolls two dice.
If the sum of the numbers shown by the two dice are equal to the precedent sum (precedent round) then the player will loose all his reward.
Otherwise the player will add the sum given ($2+3$ for example) to the cumulative rewards he has.
I started modeling the problem. The state of the system is as follow: I chose one random number (400 for example) that may be the maximum reward for $N$ rounds game. The state is $V_k(S,D)$ where $S$ is the cumulative sum and $D$ is the sum of the two numbers shown on the two dice.
If we suppose that the numbers of rounds is finite, what will be the optimal strategy in a simple form?
In the case of infinite game what's the mean of the reward?
Thank you in advance guys :)
Here's a Sage session that calculates the expected value and the optimal strategy:
Thus the game's value is about $25.4$, and the reward at which to stop depends strongly on the probability of repeating the last roll, and is also slightly higher for the lower of two equiprobable sums (e.g. $2$ and $12$), as might be expected.