Solving the devil's penny puzzle

342 Views Asked by At

I was reading an article about the devil's penny puzzle. We are given $n$ boxes, one of which contains the devil's penny while the others contain an amount of money $a_1,\ldots,a_{n-1}$. These numbers are known to us but the boxes are shuffled. We open the boxes one by one and collect the cash. We can stop at any time and take the money home. However, if we open the box with the devil's penny we lose everything. The goal is to maximise the expected gain.

The article makes the intuitive claim that it is optimal to open boxes until you have collected at least half of the available money. The say this can be proofed using a Markov decision process. I would like to see the actual proof.

I assume we have to come up with a Bellman equation. How about:

$$F_s(x) = \max(x, \frac{s-1}s E(F_{s-1}(x'))$$

Where $F_s(x)$ means the expected gain under an optimal policy, having collected $x$ money already with $s$ boxes to go. We can either stop and take $x$ money home, or we open another box which is not the devil's penny with probability $\frac {s-1}s$ so that our money increases to some $x'$.

Is this correct? How can I get the optimal policy from here?

Bonus question: The article mentions we should keep playing until we have won a third of the available money if there are two devil's pennies. How come?

2

There are 2 best solutions below

0
On BEST ANSWER

Let the amount found so far be $a$, the amount yet to find be $b$, and there be $k$ boxes left to open. Finding the penny has a return $-a$. By the linearity of expectation, the expectation of opening one more box is $\frac 1k(b-a)$, which is positive any time $b \gt a$, which means we have not yet found half the money. The same argument applies if there are two pennies. Then the expectation is $\frac 1k(b-2a)$ as there are two boxes that can give us $-a$. We should stop when we have found a third of the total money.

0
On

Not a proof, just some thoughts, and perhaps a bit of insight.

If we were to open all of the boxes, we could expect (on average) to find half of the money before the penny, and half of the money after the penny.

If there are two pennies, we could expect (again, on average) to find one third of the money before the first penny, another third before the second penny, and the final third after the second penny.