Which button to press?

222 Views Asked by At

I have two buttons and one gold ingot. Button A has probability $p_1$ of giving you the gold each time you press it. Button B has prob $p_2$.

It costs $c_1$ to press button A and $c_2$ to press button B. You want to find the gold with the minimum expected cost. To do this you would compute $p_1/c_1$ and compare it to $p_2/c_2$ and repeatedly press whichever button gives the bigger ratio.

But then you learn that each time you press a button, the cost of pressing that button again goes up by $d>0$.

What should you do?

The most obvious strategy is to recalculate the two probability/cost ratios and each time choose the button with the largest ratio. Is this greedy strategy optimal and if so, how can you prove it?

1

There are 1 best solutions below

1
On BEST ANSWER

The greedy strategy is optimal.


A state of the game can be described as a pair of integers $(i, j)$ where $i \geq 0$ and $j \geq 0$ — how many times button $A$ and button $B$ were pressed, respectively.

A strategy for the game corresponds to an infinite path on integer lattice, originating from $(0, 0)$, and always going either one square right, or one square up. Its vertices correspond to strategy states. Here's an example of a strategy that starts with button presses $ABBAABABAAABB...$

example strategy

Note, that there could be infinitely many optimal strategies, e.g. if $p_1 = p_2$ and $c_1 = c_2$.

Lemma 1. Take any state $(i, j)$ and consider a modified game in which this state is initial. Any strategy for this game has finite expected cost and there exists an optimal strategy. (Proof omitted)

Lemma 2. Optimal strategy does not end with pressing the same button indefinitely. (Proof omitted).

In terms of paths it means, that path corresponding to an optimal strategy does not end with a horizontal or vertical ray, in other words, it contains infinitely many number of "turns".

Lemma 3. Consider any L-shaped part of an optimal path, specifically: a part that starts with $n \geq 1$ edges going right and ends with a single edge going up. This corresponds to a segment of optimal strategy $...\underbrace{AAA...A}_\text{n times}B...$. Let the costs of buttons $A$ and $B$ in the first vertex of this L-shaped part be $\bar{c}_1$ and $\bar{c}_2$. Then $\bar{c}_1 / p_1 \leq \bar{c}_2 / p_2$. (Proof will be given in the end).

Observation. Lemma 3 is formulated for strategy substrings $AAA...AB$, but, if it is true, by symmetry the same should also apply for substrings $BBB...BA$, that is in the first vertex of such substrings $\bar{c}_1 / p_1 \geq \bar{c}_2 / p_2$.

If we combine lemmas 2, 3 and the observation, we see that any vertex of some optimal path is the first vertex of some "L shape". Which means in every vertex a greedy choice was made. The greedy strategy is ambiguous in the case when there's an equality, that is when $\bar{c}_1 / p_1 = \bar{c}_2 / p_2$. If we follow the proof of lemma 3 we should be able to see, that both choices in such vertices has the same expected cost, hence both lead to optimal strategies.


Proof of Lemma 3

See the picture below. The red path $...AAA...AB...$ is the optimal one. The green path $BAAA...A$ is alternative "detour". Because red path is optimal, changing part of it to this "detour" cannot improve it.

lemma 3 proof

Let $q_1 = 1 - p_1$ and $q_2 = 1- p_2$. Let expected cost of the end state $(i + n, j + 1)$ be $M$ (this is the cost of optimal strategy for a game that starts in that state).

Consider the first state of L-shape part, state $(i, j)$. If we start the game from this state and follow the red path, its expected cost is

$$ (\text{Red}) = \color{blue}{\bar{c}_1 + q_1 (\bar{c}_1 + d) + q_1^2 (\bar{c}_1 + 2d) + \ldots + q_1^{n-1} (\bar{c}_1 + (n-1)d)} + q_1^n \bar{c}_2 + q_1^n q_2 M, $$

if we follow the green path, then it's expected cost is

$$ (\text{Green}) = \bar{c}_2 + q_2 \left( \color{blue}{\bar{c}_1 + q_1 (\bar{c}_1 + d) + q_1^2 (\bar{c}_1 + 2d) + \ldots + q_1^{n-1} (\bar{c}_1 + (n-1)d)} + q_1^n M \right). $$

Red path is optimal, which means $(\text{Green}) \geq (\text{Red})$. In this inequality summand $q_1^n q_2 M$ cancels out. We separate $p_1, q_1, \bar{c}_1$ to the right and $p_2, q_2, \bar{c}_2$ to the left:

$$ \frac{\bar{c}_2}{p_2} \geq \frac{\color{blue}{\bar{c}_1 + q_1 (\bar{c}_1 + d) + q_1^2 (\bar{c}_1 + 2d) + \ldots + q_1^{n-1} (\bar{c}_1 + (n-1)d)}}{1 - q_1^n} = \frac{\bar{c}_1(1 + q_1 + q_1^2+...+q_1^{n-1}) + \overbrace{q_1d + q_1^2 \cdot 2d + ... + q_1^{n-1} \cdot (n-1) d }^{\geq 0} }{p_1(1+q_1+q_1^2 + ... + q_1^{n-1})} \geq \frac{\bar{c}_1}{p_1} $$

QED.

From proof we see, that equality is only possible when $n = 1$ or $q_1 = 0$ or $d = 0$.


Note. The proof actually can be extended to games, where cost increases as a monotonous function instead of as an arithmetic progression, as long as lemma 1 holds.