Say you have $B$ boxes, each of which contains a prize of value $v_b$. You have a $B$-sided die, and you win the value of box $b$ by rolling $b$. You get $R$ rolls, and so can collect multiple prizes, but you can’t win the same box more than once.
Given you know $R$, $B$, and all the box values $v_b$,how do you choose to weight your die as to optimize your expected total reward?
My attempts so far I've tried two approaches. Both share the definition of expected value:
$U = \sum_b \sum_r u_{b,r} $
Where
$u_{b,r} = v_b (1 - w_b)^{r-1}w_b$
The difference in my two (failed) methods is in the expression of the constraints. In one method, I let
$w_b = W_b/\sum_i W_i$
and find the critical points by differentiating with respect to every $W_b$ and setting to zero.
In the other method, I use Lagrange multiplier to express the constraint. But in both, I keep ending up with contradictions.
Let's try a reduced version at the problem: I'm even having a problem with the very simple case where R=1 and B=2 is. In this case, the expected value is
$U = v_1w_1 + v_2w_2$
and the constraint is
$w_1 + w_2 = 1$
The answer I'd expect is that $w_1=1$ if $v_1 > v_2$ and $w_1=0$ if $v_1 < v_2$ But, using the Lagrange multiplier method I get
$L = v_1w_1 + v_2w_2 - \lambda (1 - w_1 - w_2)$
and thus the useless set of constraints
$0=v_1 - \lambda$
$0=v_2 - \lambda$
$0=1 - w_1 - w_2$
It's been a very long time since I've had to think about multivariable calculus, so I suspect I'm doing something silly. Since this simplified function is just a plane, I'm not sure the differential method of finding maxima should work, even with the constraint. If I understand my problem with this simple case, maybe I can figure it out for the general problem.
Context I'm working on an analogous problem in the context of developing a resource distribution program.
The attempt at a solution above is flawed, due to the fact that it doesn't take into account that each box can be only earned once. This creates a natural notion of stopping time within the problem that changes it's structure completely.
Let's keep this to $B=2$ now and generalize later. The weights of the die being $p_1, p_2$, we would like to calculate the expectation value of the winnings after $R$ attempts. After $R$ attempts it is still possible that the game has not ended yet; the strings of rolls (11...1), (22...2) satisfy the condition. These are also the only strings in the game that earn you less than $v_1+v_2$ (all the others eventually stopped before the R-th attempt yielding the full amount). The winnings EV is
$$\langle W_2\rangle=\sum_i W_i P_{i}=v_1 p(11...1)+v_2p(22...2)+(v_1+v_2)(1-p(11..1)-p(22...2))$$
or more explicitly $\langle W\rangle=v_1(1-p_2^R)+(1-p_1^R)v_2$
We can now use Lagrange multipliers on this to yield that the values that maximize the average winnings are
$$p_1=\frac{v_1^{\frac{1}{R-1}}}{v_1^{\frac{1}{R-1}}+v_2^{\frac{1}{R-1}}}~~, ~~p_2=\frac{v_2^{\frac{1}{R-1}}}{v_1^{\frac{1}{R-1}}+v_2^{\frac{1}{R-1}}}$$
For $B=3$ we consider the same setup and find the strings that do not terminate the game after $R$ attempts. These are comprised of the strings that contain only one or two of the 3 potential outcomes. Counting the probability of those strings is relatively simple and the expected winnings are
$$\langle W_3 \rangle=\sum_{i=1}^3 v_ip_i^R+(v_1+v_2)(p_1+p_2)^R+(v_2+v_3)(p_2+p_3)^R+(v_3+v_1)(p_3+p_1)^R+(v_1+v_2+v_3)(1-\sum_i p_i^R-(p_1+p_2)^R-(p_2+p_3)^R-(p_3+p_1)^R)$$
This already has become too difficult to maximize analytically. This can be generalized to a die of arbitrary number of sides $B$- just consider $S=$ all the $2^B-2$ subsets of the sets $\{1,2,..., B\}$ that are not empty and not the set itself and the winnings expected are
$$\langle W_B\rangle=\sum_{s\in S}\left[\left(\sum_{i\in s} v_i\right)\left(\sum_{j\in s}p_j\right)^R\right]+\left(1-\sum_{s\in S}\left(\sum_{i\in s}p_i\right)^R\right)\sum_{j=1}^{B}v_j$$
Still, other than the obvious situation when $v_1=v_2=...v_B$ where the maximization is achieved by $p_1=p_2=...=p_B=\frac{1}{B}$ for other winnings per box, the solution is not to my knowledge available analytically, and one has to resort to computational methods.