Let's assume A and B play a game in which A wants a high outcome while B wants a low outcome. A can choose top or bottom and B can choose left or right. Depending on their choices the outcome is $$\begin{bmatrix}0 & 3 \\\ 2 & -2\end{bmatrix}$$
For A the top choice is better on average (3 vs 0). For B the right choice is better on average (1 vs 2). However:
- If B assumes that A picks top it would pick left.
- If A assumes that B picks left it would pick bottom.
- If B assumes that A would pick bottom it picks right.
- If A assumes that B picks right it would pick top.
I would like to implement an algorithm which computes the best choice for A and B based on an outcome matrix like above. In my actual use case the matrix is much bigger with many more choices for A and B. I can weed out the "inferior" strategies, but how do I deal with these "logical cycles" when there is no optimal strategy? If I would make a choice at random how would I assign a probability?
For a $2 \times 2$ game the principle is that each player should choose the strategies in a proportion that means their expected return is independent of what the other player does. If they choose any other mix the other player can take advantage of it. This is a Nash equilibrium.
In your example, let $A$ choose top with probability $p$. If $B$ chooses left with probability $q$, $A's$ expectation is $0pq+3p(1-q)+2(1-p)q-2(1-p)(1-q)=-7 p q + 5 p + 4 q - 2$ If we don't want to care about $q$ we can say the coefficient of $q$ must be $0$, so $-7p+4=0, p=\frac 47$. This makes $A$'s expectation $5 \cdot \frac 47-2=\frac 67$
For larger games it is more complicated and I recommend a text in game theory.