Suppose there are $n$ slots in a machine. You can put a small ball in the machine and the ball will fall into one of the $n$ slots with equal probability. You can put in as many balls as you want and you can stop the game whenever you want. After the game stops, your total payoff will be calculated: for every slot having only one ball, you will be paid 1 dollar; for every slot having $k$ balls $k\geq2$, you will be penalized $k$ dollars. For example, if there are 3 slots in total, and one slot has $0$ balls, one slot has $1$ balls, one slot has $2$ balls, the payoff of the game is $1 - 2 = -1$.
What is your optimal strategy when you play the game? What is the expected payoff of the game if you play optimally?
This is a question I bump into some time ago in Quora, maybe a job interview question for some company. I have thought about it. It is easy to decide at a certain state, given the number of balls in each slot, the expected pay-off of putting in an extra ball. And use this to decide wether to stop or not. This seems to be a greedy strategy. However, how can we prove this is globally optimal? Also, how to calculate the expected payoff if we play optimally from the initial stage?
I suppose the exact optimal strategy is hard to find. Can we find the expected pay-off of GREEDY-A and GREEDY-B strategy form the first answer?
Not an answer / too long for comments
I haven't figured out how to play optimally, but here is an example where the "Greedy algorithm" (depending on one detail of the definition) may be non-optimal.
The term "Greedy algorithm" usually means you decide based on what will happen in the next move only. In this context, it means look at the expected incremental payoff (shorthand: $EIP$) for the next ball. If $EIP <0$, then Stop. If $EIP >0$, then Continue. In my experience, when $EIP=0$ it is often left unspecified what Greedy will do.
For the $n=4$ case, what Greedy does when $EIP=0$ turns out to make a big difference.
After one ball, the game state is (without loss) $(0,0,0,1)$. At this point:
$$EIP = \frac34 \times (+1) + \frac14 \times (-3) = 0$$
The $-3$ term comes from the fact that, if your second ball drops into the occupied slot, you turned a $+1$ into a $-2$ for an incremental payoff of $-3$. So GREEDY-A, which stops when $EIP \le 0$, would stop and have a payoff of $1$.
However, GREEDY-B decides to stop only when $EIP<0$, so it continues. In fact, if the second ball drops into the occupied slot, the next $EIP$ would actually be $\frac12 > 0$! This is because going $(0,0,0,1)\to (0,0,0,2)$ is worth $-3$, but going further $(0,0,0,2)\to(0,0,0,3)$ is only worth $-1$. So it is easy to see that GREEDY-B can be equivalently described as: keep going if $1$ slot is occupied but stop as soon as $2$ slots are occupied. (When another slot is newly occupied, by one ball, $EIP < 0$.)
So starting from $(0,0,0,1)$, there will be $X$ balls dropped into the occupied slot before one of the other slots is occupied, resulting in $(0,0,1,X+1)$ at game end.
$X$ is geometrically distributed with $P(X=k) = \frac34 \times 4^{-k}$ for $k = 0, 1, 2, \dots$
If $X=0$ the payoff is $2$; if $X>0$ the payoff is $1 - (X+1) = -X$
So GREEDY-B's expected payoff is:
$$2 \times P(X=0) -E[X] = 2 \times \frac34- {1/4 \over 1 - 1/4} = \frac32 -\frac13 = \frac76 > 1$$
To summarize: for $n=4$ GREEDY-B has an expected payoff of $\frac76$ while GREEDY-A has a smaller expected payoff of $1$. But the more important point (to me) is that I no longer have confidence that for any $n$, one of GREEDY-A or GREEDY-B will be optimal.