Demonstration of the UCB formula used in Monte-Carlo Tree search

891 Views Asked by At

I am really interested in Monte-Carlo Tree Search, and its "UCB1" formula : $$S_i =x_i + C\sqrt{\frac{\ln(t)}{n_i}}$$ $S_i$ is the "Score" of a node $i$, $x_i$ its empirical mean, $C$ a constant, $t$ the total number of simulations, and $n_i$ is the number of simulations of the node $i$ (if I understood correctly).
The term $C\sqrt{\frac{\ln(t)}{n_i}}$ is supposed to represent the "minimization of the regret", or how confident you are in $x_i$ : confidence interval, for example in the case of a multi-armed bandit
However, I searched and I can't find a mathematical demonstration for the formula of this term.

Can anyone help me ?

edit : I found the paper of the people who introduced it : https ://link.springer.com/content/pdf/10.1023%2FA%3A1013689704352.pdf, but I fail to understand the logic behind what THM 1 is saying and the formula above.