Optimal strategy for a game: two players on a 1D board, determine the fraction of troops to deploy


There is a 1-dimensional board with 7 tiles. You and the enemy both start on tile 4.

You win if you reach tile 7 and you lose if the enemy reaches tile 1.

In a single turn, each of you decides what fraction of your own army to deploy. Whoever deploys more troops wins the exchange, advances 1 square, and loses all the troops that were deployed. The losing army retreats a square and loses no troops. If both of you deploy the same number of soldiers, you lose (ties go to the enemy). What percentage of troops should you deploy on the first turn?

I have no idea on how to approach this type of question as I've never seen anything like it.

On BEST ANSWER

I tried to solve the problem using Dynamic Programming. However, the formulation is a bit ambiguous (maybe this is what user26 was referring to in the comments when he asked about poker theory). The ambiguity arises because we don't know which strategy the opponent will use, so I tried playing against different strategies.

Mathematical formulation

I tried to write the rules in a mathematical way. Each situation of the game is identified by three parameters:

- $x$ is the tile both players are on (an integer from 1 to 7);

- $\bar{u}$ is the amount of army that player A still has (a real number in $[0,1]$);

- $\bar{v}$ is the amount of army that player B still has (a real number in $[0,1]$).

The initial situation is $(4,1,1)$.

The "dynamics" of the game can be formulated as follows. Starting from a situation $(x,\bar{u},\bar{v})$, the players choose two moves $u$ and $v$ respectively. Then:

- if $u≥v$, player A advances, so the next state is $(x-1,\bar{u}-u,\bar{v}-v)$;

- if $u<v$, player B advances, so the next state is $(x+1,\bar{u}-u,\bar{v}-v)$.

The game ends at the first turn $t \in \mathbb{N}$ such that either $x=1$ or $x=7$.
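For concreteness, the dynamics above can be sketched as a small transition function (an illustrative sketch; the function names are my own):

```python
# One turn of the game described above: starting from state (x, u_bar, v_bar),
# the players deploy u and v out of their remaining armies. Ties (u >= v) go
# to player A, who pushes towards tile 1; player B pushes towards tile 7.
def step(x, u_bar, v_bar, u, v):
    assert 0 <= u <= u_bar and 0 <= v <= v_bar
    x_next = x - 1 if u >= v else x + 1
    return x_next, u_bar - u, v_bar - v

def terminal(x):
    """The game ends as soon as either player reaches his target tile."""
    return x == 1 or x == 7
```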

Case 1: Player A moves after Player B, and optimally

A characteristic of this game that makes it harder for me is that the moves are taken simultaneously. In other, even more complex games like chess, players move one at a time.

I tried to reformulate the game so that player A (who is pushing towards 1) always plays after player B (who is pushing towards 7). This allows a differential-game formulation of the problem and the use of dynamic programming with two players, where each tries to maximize his own chance of victory.

In this case, one can define the value function, which player B wants to maximize. The value function quantifies how good a situation $(x,\bar{u},\bar{v})$ is for player B. Starting from a situation $(x,\bar{u},\bar{v})$, we define the best (optimal) move as the one which maximizes the worst possible case.

From dynamic programming, the value function solves this set of equations (it should be a particular case of the so-called Hamilton-Jacobi-Isaacs equation of differential games):

$$ \left\{ \begin{aligned} V(x,\bar{u},\bar{v}) &= \max_{v \in [0,\bar{v}]} \min_{u \in [0,\bar{u}]} F_{x,\bar{u},\bar{v}}(u,v) \qquad && x = 2, \dots, 6, \ \bar{u} \in [0,1], \ \bar{v} \in [0,1] \\ V(1,\bar{u},\bar{v}) &= -1 \qquad && \bar{u} \in [0,1], \ \bar{v} \in [0,1] \\ V(7,\bar{u},\bar{v}) &= +1 \qquad && \bar{u} \in [0,1], \ \bar{v} \in [0,1] \end{aligned} \right. $$

where $$ F_{x,\bar{u},\bar{v}}(u,v) = \begin{cases} V(x+1,\bar{u}-u,\bar{v}-v), \qquad \textrm{if $u<v$} \\ V(x-1,\bar{u}-u,\bar{v}-v), \qquad \textrm{if $u≥v$} . \end{cases} $$

I will try to explain why this formula makes sense:

I am assuming that player A moves after player B, so for each move $v \in [0,\bar{v}]$ that player B can make, we consider all possible replies $u \in [0,\bar{u}]$ of player A and choose the best one from player A's point of view. Then, among all possible moves $v$, player B picks the one which maximizes this worst case. The argument $F$ of the max-min operator is the value function evaluated at the next state of the system. The "boundary conditions" on $V(1,\bar{u},\bar{v})$ and $V(7,\bar{u},\bar{v})$ are natural and make explicit the fact that player B earns a positive payoff when the game ends at tile 7 and a negative payoff when it ends at tile 1.

The formula above is implicit, since the RHS depends on the value function $V$ itself. However, such equations are usually contraction operators (I did not check that this one actually is a contraction) and then admit a unique fixed point by the Banach-Caccioppoli theorem.

At this point, I discretized the move space $[0,1]$ (for instance, with 11 nodes you get $U = \{0,0.1,\dots,0.9,1\}$, but you can just as well use 101 or 1001 nodes) and solved the fixed-point equation in Matlab, using a classical fixed-point iteration.
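The answer's code was in Matlab; as an illustration, here is a minimal Python sketch of the same discretized max-min fixed-point iteration (grid size, iteration count, and all names are my own choices):

```python
N = 10   # grid resolution: deployments are multiples of 1/N

# V[x][i][j] approximates V(x, i/N, j/N) for the interior tiles x = 2..6.
V = {x: [[0.0] * (N + 1) for _ in range(N + 1)] for x in range(2, 7)}

def F(x, i, j, u, v):
    """Value of the next state after A deploys u/N and B deploys v/N."""
    x_next = x - 1 if u >= v else x + 1   # A wins ties and pushes towards tile 1
    if x_next == 1:
        return -1.0                        # player A wins
    if x_next == 7:
        return +1.0                        # player B wins
    return V[x_next][i - u][j - v]

for _ in range(50):                        # classical fixed-point iteration
    for x in range(2, 7):
        for i in range(N + 1):             # A's remaining army = i/N
            for j in range(N + 1):         # B's remaining army = j/N
                # B picks v first; A then answers with the u that is worst for B
                V[x][i][j] = max(
                    min(F(x, i, j, u, v) for u in range(i + 1))
                    for v in range(j + 1)
                )

print(V[4][N][N])   # value of the initial state (4, 1, 1)
```

Since A can always match B's deployment when his army is at least as large, the value at $(4,1,1)$ comes out as $-1$, in line with the result below.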

The result is that if both players start with an army of 1, player A always wins, since we are assuming that he moves second. However, when player A's army starts out smaller than player B's (for instance, 0.5 vs 1), there exists a winning strategy for player B (and he always wins using it, because he is trained against the worst-case scenario!).

Case 2: Player A moves randomly (and simultaneously)

If we assume that player A picks a random move at each turn, we can view the game as a one-player optimal control problem, where player B tries to maximize the average payoff, averaging over the possible moves of player A.

The dynamic programming equation will become $$ \left\{ \begin{aligned} V(x,\bar{u},\bar{v}) &= \max_{v \in [0,\bar{v}]} \frac{1}{\bar{u}} \int_0^\bar{u} F_{x,\bar{u},\bar{v}}(u,v)du \qquad && x = 2, \dots, 6, \ \bar{u} \in [0,1], \ \bar{v} \in [0,1] \\ V(1,\bar{u},\bar{v}) &= -1 \qquad && \bar{u} \in [0,1], \ \bar{v} \in [0,1] \\ V(7,\bar{u},\bar{v}) &= +1 \qquad && \bar{u} \in [0,1], \ \bar{v} \in [0,1] \end{aligned} \right. $$

Notice that if we discretize the move space, the integral becomes a finite sum and the functional to maximize becomes an arithmetic average.
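Under the same kind of discretization, this averaged equation can be sketched in Python as follows (again, grid size, iteration count, and names are illustrative choices of mine, so the numbers will differ from those obtained with a finer grid):

```python
N = 10   # grid resolution: deployments are multiples of 1/N

# V[x][i][j] approximates V(x, i/N, j/N) for the interior tiles x = 2..6.
V = {x: [[0.0] * (N + 1) for _ in range(N + 1)] for x in range(2, 7)}

def F(x, i, j, u, v):
    x_next = x - 1 if u >= v else x + 1    # A still wins ties
    if x_next == 1:
        return -1.0
    if x_next == 7:
        return +1.0
    return V[x_next][i - u][j - v]

def avg_over_A(x, i, j, v):
    """Average payoff of B's move v/N when A's move is uniform on {0,...,i}/N."""
    return sum(F(x, i, j, u, v) for u in range(i + 1)) / (i + 1)

for _ in range(100):                       # fixed-point iteration
    for x in range(2, 7):
        for i in range(N + 1):
            for j in range(N + 1):
                # the max-min becomes a max of arithmetic averages
                V[x][i][j] = max(avg_over_A(x, i, j, v) for v in range(j + 1))

best_v = max(range(N + 1), key=lambda v: avg_over_A(4, N, N, v))
print(V[4][N][N], best_v / N)   # value at (4, 1, 1) and B's best first deployment
```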

I simulated this problem as well and obtained more interesting results. The value function at $(4,1,1)$ equals 0.95. This means that starting from $(4,1,1)$ and following the optimal strategy for the whole game, player B wins 95% of the games, assuming that player A moves randomly. The best first move for B is again given by dynamic programming, maximizing that average: player B should start by deploying 0.17 of his army.

Case 3: Players A and B play optimally and simultaneously

This seems to be the correct way to interpret the text, or at least it is what I understand from it. However, if A and B both start with an army of 1 and play the same optimal strategy, the game certainly ends with A winning: one of the rules is that when the two players deploy the same number of soldiers, player A advances.

This little imbalance may also produce different optimal strategies, but player A wins anyway (if both play optimally).

Finally, if we allowed player B to start with a larger army, the problem would become more interesting and less trivial. But honestly, I have not thought about this variant.