Maximizing payout given two piles, followup

99 Views Asked by At

Here's a followup to my question here: Strategy for game with two piles, maximize the payout

Say I'm playing a game with two piles, both of which are initially empty. On each turn I can perform one of the following moves:

  1. Add a penny to one of the two piles at random.
  2. Pick a pile at random, and then take all of its contents for myself.

Assume a random number generator selects ahead of time, unbeknownst to me, the number of turns the game will last... it will be an integer between $100$ and $200$ inclusive, let's call it $X$. I play this game for $X$ turns. What's the expected value for this game, and what should the strategy be I take to maximize my payout?

Update: Henry asked:

Are the rules that (a) if you pick a pile under rule 2 then the game stops

No, the game still continues.

and (b) if arrives and you have never applied rule 2 then you get nothing?

Yes, that's right.

1

There are 1 best solutions below

0
On BEST ANSWER

Building a recurrence

Define a game state in terms of variables as follows:

  • $p$ = num coins currently in first pile
  • $q$ = num coins currently in second pile
  • $s$ = min possible number of remaining turns
  • $t$ = max possible number of remaining turns
  • $f(p,q,s,t)$ = best possible $E[payout]$ for the rest of this game given the state $(p,q,s,t)$.

Now we can write a recurrence based on what will happen on the very next turn. If $s > 0$ we have: $$f(p,q,s,t) = \max \Big\{\frac{p+f(0,q,s-1,t-1) + q+f(p,0,s-1,t-1)}{2}, \frac{f(p+1,q,s-1,t-1) + f(p,q+1,s-1,t-1)}{2} \Big\}$$ where the first option inside the $\max$ corresponds to taking a pile, and the second option corresponds to increasing a pile size.

If $s = 0$ we have $$f(p,q,0,t) = \frac{t}{t+1}\max \Big\{\frac{p+f(0,q,0,t-1) + q+f(p,0,0,t-1)}{2}, \frac{f(p+1,q,0,t-1) + f(p,q+1,0,t-1)}{2} \Big\}$$ which is mostly similar; the $\frac{t}{t+1}$ factor in front accounts for the possibility that the game might actually just end instead of letting us play this turn.

We also need some base cases. $f(p,q,0,0) = 0$ regardless of $p,q$: no more turns will be played, so it doesn't matter what's in the piles.

Getting an answer using Python

We want to find $f(0,0,100,200$. The recurrence built above looks quite messy, and it may be difficult/impossible to find a nice closed form solution. Fortunately, we can instead just plug into the recurrence over and over using a computer. This is called dynamic programming.

The total number of game states that could ever arise is pretty limited. We'll always have $p+q \le 200$ and $t \le 200$, and $s$ doesn't increase the state space because we'll always have $s = \max\{t-100,0\}$. That means we have around $200^3 = 8$ million states total, which is no problem for a computer.

I wrote a quick Python script to print the optimal expected score; you can see it here. Try editing the game length (via parameters MIN_END and MAX_END at the top) and then click Run to see how the optimal score changes. For your stated version with MIN_END = 100 and MAX_END = 200, the answer is 133.76.

Strategy insights

It seems clear that the best strategy should involve adding to piles for awhile, and then trying to reap the benefits near the end of the game. The Python program lets me see what strategy was optimal in different game situations. In this section I'll write some patterns I looked into for purposes of getting rough intuition about how to play optimally. (The numbers below are specific to the case MIN_END = 100, MAX_END = 200.)

If $p = q$, then we should Take if the max number of remaining turns is small, and Add if the max number of remaining turns is large. The cutoff point depends on $p$. In particular, the cutoff points for small $p$ are $\{ 2, 5, 11, 20, 31, 45, 62, 81, 102, 102, 102, \cdots \}$. Here's how to interpret that list: If $p=q=0$ then we should Add if there MAY be at least $2$ turns left in the game (we've taken $\le 198$ turns so far); otherwise we should Take. If $p=q=3$ and there MAY be at least 20 turns left in the game then we should Add, otherwise we should Take. And so on. Note the list still increases after those 102s, but very slowly.

If $q = 0$ then once again we should Take if the game's about to end and Add if there's more time left, so we can make another list of cutoff values. We expect these cutoffs to be smaller than the corresponding value in the list above, because Taking is less appealing when only one pile has coins. It turns out the new list of cutoffs is $\{2, 3, 5, 8, 11, 15, 20, 25, 31, 38, 45, 53, 62, 71, 81, 92, 102, 102, 102, \cdots\}$. Interpretation example: If $p=3,q=0$ then we should Add if there MAY be at least 8 turns left in the game (i.e. if we've taken $\le 192$ turns so far) and Take otherwise.

Full source code

In the interest of making this answer self-contained, I'm also pasting the full source code of my script here.

MIN_END = 100
MAX_END = 200

# 0 <= t <= MAX_END
# 0 <= p, q <= MAX_END+1 (we'll only use up to MAX_END, but having one extra slot makes indexing simpler)
f = [[[0 for t in range(MAX_END+1)] for q in range(MAX_END+2)] for p in range(MAX_END+2)]
strategy = [[[None for t in range(MAX_END+1)] for q in range(MAX_END+2)] for p in range(MAX_END+2)]
for t in range(1, MAX_END+1):
    for p in range(MAX_END+1):
        for q in range(MAX_END - p + 1):
            # compute and store f(p, q, s, t).
            # note we don't need to track s because it's fully determined by t.
            option_1 = .5 * (p + f[0][q][t-1] + q + f[p][0][t-1])  # represents taking a pile
            option_2 = .5 * (f[p+1][q][t-1] + f[p][q+1][t-1])  # represents adding to a pile
            if t <= MAX_END - MIN_END: scalar = t / (t+1)
            else: scalar = 1

            # store the results.
            f[p][q][t] = scalar * max(option_1, option_2)
            if option_1 >= option_2: strategy[p][q][t] = 't'  # t for "take"
            else: strategy[p][q][t] = 'a'  # a for "add"

print(f[0][0][MAX_END])