Maximize expected sum of uniformly chosen numbers

274 Views Asked by At

Each day for the next ten days, you are presented with a number chosen uniformly at random on the interval [0,1] (each day the number chosen is independent of previous days). You are allowed to either "keep" or "reject" the number on each day, however, you may only keep a maximum of three numbers over all ten days. Your goal is to maximize the expected sum of the three numbers you keep. What is your strategy?

For instance, you may have the scenario: Day 1: Number = 0.82, You choose to keep

Day 2: Number = 0.26, You choose to reject

Day 3: Number = 0.15, You choose to reject

Day 4: Number = 0.15, You choose to reject

Day 5: Number = 0.93, You choose to keep

Day 6: Number = 0.91, You choose to keep

Day 7: Number = 0.99, You choose to reject

Day 8: Number = 0.99, You choose to reject

Day 9: Number = 0.99, You choose to reject

Day 10: Number = 0.99, You choose to reject

Here, you pick keep on days 1, 5 and 6. Note that on day 6, you had no idea that days 8, 9 and 10 would have 0.99. In general, on day t, you only have the knowledge of the number of the current day, and the history of numbers/actions on days <t. You can't foresee the future.

===

1

There are 1 best solutions below

0
On

You can solve the problem via dynamic programming. The states are $(d,k,u)$, where $d$ is the number of days remaining, $k$ is the number of "keeps" remaining, and $u$ is the current uniform number. Let $V(d,k,u)$ be the maximum expected sum in state $(d,k,u)$, and note that $$V(d,k,u)=\max\left\{u+\int_0^1 V(d-1,k-1,t) \mathrm{d}t, \int_0^1V(d-1,k,t) \mathrm{d}t\right\},$$ with boundary conditions \begin{align} V(d,0,u)&=0 \\ V(0,k,u)&=0 \end{align} The optimal value is $\int_0^1 V(10,3,u)\mathrm{d}u$. The optimal strategy is to keep if $$u+\int_0^1 V(d-1,k-1,t) \mathrm{d}t \ge \int_0^1V(d-1,k,t)\mathrm{d}t$$ and reject otherwise, and so the strategy is succinctly expressed by the expected values $W(d,k)=\int_0^1 V(d,k,t) \mathrm{d}t$: \begin{matrix} d\backslash k &0 &1 &2 &3 \\ \hline 0 &0.000 &0.000 &0.000 &0.000\\ 1 &0.000 &0.500 &0.500 &0.500\\ 2 &0.000 &0.625 &1.000 &1.000\\ 3 &0.000 &0.695 &1.195 &1.500\\ 4 &0.000 &0.742 &1.320 &1.742\\ 5 &0.000 &0.775 &1.409 &1.909\\ 6 &0.000 &0.800 &1.476 &2.034\\ 7 &0.000 &0.820 &1.529 &2.132\\ 8 &0.000 &0.836 &1.571 &2.211\\ 9 &0.000 &0.850 &1.606 &2.276\\ 10 &0.000 &0.861 &1.636 &2.330 \end{matrix} These expected values imply thresholds $W(d-1,k)-W(d-1,k-1)$: \begin{matrix} d\backslash k &1 &2 &3 \\ \hline 1 & 0.000 & 0.000 & 0.000 \\ 2 & 0.500 & 0.000 & 0.000 \\ 3 & 0.625 & 0.375 & 0.000 \\ 4 & 0.695 & 0.500 & 0.305 \\ 5 & 0.742 & 0.579 & 0.421 \\ 6 & 0.775 & 0.634 & 0.500 \\ 7 & 0.800 & 0.676 & 0.558 \\ 8 & 0.820 & 0.708 & 0.603 \\ 9 & 0.836 & 0.735 & 0.639 \\ 10 & 0.850 & 0.757 & 0.669 \end{matrix} The interpretation is that if with $d$ days and $k$ keeps remaining, you should keep any number at least as big as the threshold.