In a recent interview I received the following question (an optimisation/strategy game)...which left me a bit stumped. The rules of play, you start with 0 points, then:
Roll three fair six-sided dice;
Now you have the option:
Hold: i.e. accept the values shown on your dice as the score for your turn. There is a caveat, if two or more dice show the same values, then all of them are flipped upside down - e.g. 1 becomes 6
OR
reroll the dice: You may choose to hold any combination of the dice on the current value shown (so you can choose to keep 1 dice the same and then reroll the other two). Rerolling costs you 1 point – so during the game and perhaps even at the end your score may be negative. You can roll an infinite number of times...
My thoughts:
So clearly the best possible score is 18 and is achieved by rolling three 1s on the first roll The reroll penalty prevents rolling forever to get 18. If the value of the dice is greater than the expected value of rerolling them (accounting for the penalty), then you should stick... I guess what I am asking is how do I work out the expected value of rerolling them (accounting for the penalty) and how does this fit into the optimal strategy...
One important point is that, anytime in the game, your best action depends only on the dice on the table, and not on the number of penalties you've accumulated so far. Even if you have -20 in penalties, if your dice are bad, maybe you should reroll (and worry that the dice aren't fair after all).
Apart from that, I don't see much good properties on the problem, so to me it's a modeling problem and you have to compute it.
You start with an benefit function $f_0(\{x,y,z\})$ which is the value of your dice combination, and you compute successively the benefit if you also add the choice to reroll at most $i$ times : $f_{i+1}({x,y,z}) = \max f_i(\{x,y,z\}), (-1 + \mathbb E [f_i(\{X,y,z\})]), \dots$.
Since the $f_i$ are increasing and bounded, they converge (exponentially fast) to a $f_\infty$, and you can make your choice by comparing $f_\infty(\{x,y,z\})$ and $(-1 + \mathbb E [f_\infty(\{X,y,z\})])$, etc.
If you want the exact values of the expectation, you can see the transformation of $f_i$ to $f_{i+1}$ as a matrix multiplication (once you reached the point where the choice of the $\max$ will be stable), so you just have to find its stable point.