What does it mean for the Monte Carlo Tree Search result to converge to the minimax algorithm result?

136 Views Asked by At

Consider a perfect information board game, where an outcome can be either a win or a loss.

Suppose we try to find the best move using the Monte Carlo Tree Search and after many iterations the following scores are given for the game tree:

enter image description here

Here $P$ is the starting position, $A, B, C, D$ are the positions resulting from the 4 possible moves. Next to each position two statistics $w(n)$ and $s(n)$ are shown:

  • $w(n)$ denotes the number of times a win has occurred starting from the position $n$
  • $s(n)$ denotes the number of times a position $n$ was chosen during the simulation process.

Question: what does it mean that in the limit the statistics of the MCTS converge to the output of the minimax algorithm?

Suppose minimax gives $+1$ for a winning position and $0$ for a loosing position. Does it mean that the small $w(A) / s(A)$ value indicates the position $A$ is most likely loosing and the high value $w(D) / s(D)$ indicates the position $D$ is most likely winning? Why when choosing the best move we choose the move $C$ even if $w(C) / s(C) \lt w(D) / S(D)$?