Optimal strategy for the Generalized Monty Hall game

293 Views Asked by At

The Generalized Monty Hall game, H(n,p), proceeds in rounds. At the beginning, there are n doors, and one door has a prize behind it. In each round:

  1. Player picks a door.
  2. With probability p, that door is opened, the player wins the revealed prize, if any, and the game ends immediately.
  3. Otherwise, the host opens a non-prize door other than the selected one and the next round begins.

If the selected door is the only remaining closed door, then the player wins and the game ends.

The goal is obviously to come up with the optimal strategy to maximize the chance of getting the prize for all n and p.

During the discussion here, it was noted that, as the host cannot open the selected door, the assessed probability of the prize being behind that door cannot be increased when the host opens the door; the host gives no information at all about this door, so the assessed probability stays the same while the probabilities of all doors remaining closed goes up.

Based on this fact, it is fairly clear that:

  • If p is small (e.g. less than 1/n), the optimal strategy is to keep picking the same door until there are only two closed doors, then switch to the other door. (From this follows the well-known strategy for the standard Monty Hall Problem H(3,0).)
  • If p is large (how large?), the optimal strategy is to switch to a different closed door (the one that has the current highest assessed probability of having the prize) in every round, since the game is very likely to end immediately.

Between these two extremes, I know basically nothing.

Based on the extremes, my guess is that the best pure strategy looks like "Pick the same door for the first f(n,p) rounds, then switch to the highest probability door in all remaining rounds."

Is this correct?

What can we say about f? Is it possible for it to return a value other than 1 or n-2? If no, what's the threshold for p? If yes, what exactly is the relationship between p and the output of f?

1

There are 1 best solutions below

0
On

Optimal strategy with one door: Pick it, you win with probability $1$

Optimal strategy with two doors and a-priori probabilities $p_1\ge p_2$. Pick door 1, you win with probability $p_1$.

Optimal strategy with three doors and a-priori probabilities $p_1\ge p_2\ge p_3$: If we pick door $i$, we win immediately with $pp_i$. With $(1-p)$, the host reveals a door, which causes the third door to acquire probability $1-p_i$. And according to the optimal strategy for $n=2$, we win with $\max\{p_i,1-p_i\}$. So we want to maximize $pp_i+(1-p)\max\{p_i,1-p_i\}$. If all $p_i$ are $\le \frac12$, we maximize $pp_i+(1-p)(1-p_i)$, i.e., if $p\le \frac12$ we minimize $p_i$ and if $p\ge \frac12$ we maximize $p_i$. And if one $p_i$ should be $\ge\frac12$, we compare $p_i$ with the best $pp_j+(1-p)(1-p_j)$ of the others.

It seems that things are getting complicated from here, hence I'll treat $n=4$ only for the case $p_1=p_2=p_3=p_4=\frac14$. If there is a second round, we are faced with $n=3$ and $p_1=p_2=\frac38>p_3=\frac14$. Hence from here on, we have winning probability $\frac14p+\frac34(1-p)=\frac34-\frac12p$ if $p\le \frac12$, and $\frac38p+\frac58(1-p)=\frac58-\frac14p$ if $p\ge\frac12$. Thus the overall winning probability is $\frac14p+(1-p)(\frac34-\frac12p)=\frac34-p+\frac12p^2$ if $p\le \frac12$ and $\frac14p+(1-p)(\frac58-\frac14p)=\frac58-\frac58p+\frac14p^2$ if $p\ge \frac12$. This decreases (smoothly except for $p=\frac12$, where the change of strategy occurs) from $\frac34$ to $\frac14$ as $p$ goes from $0$ to $1$. To treat the $n=5$ case, we should first treat $n=4$ with $p_1=p_2=p_3=\frac4{15}>p_4=\frac15$ and accordingly two different first round moves, but this is beginning to look like a task for a computer ...