Consider a perfect information board game, where an outcome can be either a win or a loss.
Suppose we try to find the best move using the Monte Carlo Tree Search and after many iterations the following scores are given for the game tree:
Here $P$ is the starting position, $A, B, C, D$ are the positions resulting from the 4 possible moves. Next to each position two statistics $w(n)$ and $s(n)$ are shown:
- $w(n)$ denotes the number of times a win has occurred starting from the position $n$
- $s(n)$ denotes the number of times a position $n$ was chosen during the simulation process.
Question: what does it mean that in the limit the statistics of the MCTS converge to the output of the minimax algorithm?
Suppose minimax gives $+1$ for a winning position and $0$ for a loosing position. Does it mean that the small $w(A) / s(A)$ value indicates the position $A$ is most likely loosing and the high value $w(D) / s(D)$ indicates the position $D$ is most likely winning? Why when choosing the best move we choose the move $C$ even if $w(C) / s(C) \lt w(D) / S(D)$?
