Problem statement: Here's a single-player probabilistic game. In front of you are $L$ urns, each containing bills of various denominations. You get $N$ chances to draw a bill from any urn you like, check its value, and put it back. You may draw from the same urn multiple times. On the $(N+1)$st draw, you keep the bill you draw as your winnings from the game. How can you maximize your expected winnings?
If it makes it easier, you can assume $N > L$.
This might be too difficult to solve without making some assumptions about the distribution of bills in each urn. A perfectly general solution wouldn't assume anything about the bill denominations or the number of bills in each urn, but I'd appreciate even a less general solution.
My thoughts so far: there is probably something related to this in the reinforcement learning literature (it resembles the pure-exploration, or best-arm identification, variant of the multi-armed bandit problem), but I haven't quite found it. The explore-exploit tradeoff appears here in an unusual form. Since bad draws in the first $N$ rounds cost you nothing, from one point of view you should be fully focused on exploring. On the other hand, you should still explore promising urns more thoroughly: maximizing raw information gain won't be the best strategy, because you care less about further information from urns that already seem to have generally lower bill values.
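To make the setup concrete, here is a minimal simulation sketch of one naive baseline, not a proposed optimal strategy: spend all $N$ practice draws cycling round-robin through the urns, then keep a bill from the urn with the highest observed sample mean. The urn distributions here (exponential with given means) are purely a hypothetical stand-in, since the problem doesn't fix them; the function name and parameters are likewise made up for illustration.

```python
import random

def simulate_round_robin(urn_means, N, trials=2000, seed=0):
    """Naive baseline for the urn game: explore round-robin for N draws,
    then take the final (N+1)th draw from the urn whose empirical mean
    is highest.

    Hypothetical model: urn i yields bill values ~ Exponential with
    mean urn_means[i]. Returns the average final-draw value over trials.
    """
    rng = random.Random(seed)
    L = len(urn_means)
    total = 0.0
    for _ in range(trials):
        sums = [0.0] * L
        counts = [0] * L
        for t in range(N):
            i = t % L                     # cycle through urns in order
            sums[i] += rng.expovariate(1.0 / urn_means[i])
            counts[i] += 1
        # Pick the urn with the best observed sample mean
        # (urns never sampled, possible when N < L, default to 0).
        means = [sums[i] / counts[i] if counts[i] else 0.0
                 for i in range(L)]
        best = max(range(L), key=lambda i: means[i])
        # The (N+1)th draw is the winnings.
        total += rng.expovariate(1.0 / urn_means[best])
    return total / trials
```

Because the final draw is itself random, the expected winnings under any strategy equal the mean of the urn chosen for the last draw, so this game reduces to identifying the urn with the highest mean; the question is whether an adaptive allocation of the $N$ draws beats the uniform one above.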