What techniques are there for solving extensive form games where we discover the tree on-line?


To my knowledge, all counterfactual regret minimization (CFR) algorithms require that the game tree be fully specified in advance, even if it is only an abstraction of the true tree.

In other computational strategy settings there are algorithms that compute best responses by exploring the tree in an incrementally deeper fashion, e.g. Monte Carlo Tree Search (MCTS).
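To make concrete what "incrementally deeper" means here, below is a minimal UCT-style sketch on a hypothetical toy game (two binary moves, payoffs known only at the leaves — all names and the game itself are my own illustration, not from any particular library). The key point is that nodes are created lazily on first visit, so the tree is discovered during search rather than built up front:

```python
import math
import random

random.seed(1)

# Toy sequential game: two moves from {0, 1}; payoff revealed only at leaves.
LEAF_PAYOFF = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.9, (1, 1): 0.1}

class Node:
    def __init__(self, history):
        self.history = history
        self.children = {}   # action -> Node, created lazily on first visit
        self.visits = 0
        self.value = 0.0

def uct_child(node, c=1.4):
    # UCB1: exploit high mean value, explore rarely tried actions
    return max(node.children.values(),
               key=lambda n: n.value / n.visits
                             + c * math.sqrt(math.log(node.visits) / n.visits))

def simulate(node):
    if len(node.history) == 2:                  # terminal: look up payoff
        return LEAF_PAYOFF[node.history]
    untried = [a for a in (0, 1) if a not in node.children]
    if untried:                                 # expand one new child
        a = random.choice(untried)
        child = Node(node.history + (a,))
        node.children[a] = child
    else:                                       # all children exist: select
        child = uct_child(node)
    reward = simulate(child)
    child.visits += 1
    child.value += reward
    return reward

root = Node(())
for _ in range(2000):
    root.visits += 1
    simulate(root)

# Recommend the most-visited root action
best = max(root.children, key=lambda a: root.children[a].visits)
print(best)
```

Search concentrates on action 1 (it leads to the 0.9 leaf), and only visited parts of the tree ever get allocated — which is the behaviour I would like combined with regret minimization.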

Are there any algorithms for extensive-form games that operate similarly? The game I wish to study (and solve in small cases) is a 2-person simultaneous-move game, so I don't believe MCTS applies directly. I did try an exploration-vs-exploitation approach in which both players select actions simultaneously without knowledge of the other's choice, but it fails to converge to the analytically derived mixed equilibrium even in extremely simple cases, and even when the tree is explored entirely.
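For comparison, here is a minimal regret-matching sketch (in the style of Hart & Mas-Colell) on matching pennies, the simplest simultaneous-move game with a mixed equilibrium. One detail that tripped me up: it is the *average* strategy that converges to the mixed NE, not the current one, which may explain the non-convergence I observed. All names here are my own; this is a sketch, not a reference implementation:

```python
import random

random.seed(0)

ACTIONS = 2
# Row player's payoffs for matching pennies: row wins if the coins match.
PAYOFF = [[1, -1], [-1, 1]]

def current_strategy(regrets):
    # Play actions proportionally to positive cumulative regret
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def sample(strategy):
    r, acc = random.random(), 0.0
    for a, p in enumerate(strategy):
        acc += p
        if r < acc:
            return a
    return ACTIONS - 1

regrets = [[0.0] * ACTIONS for _ in range(2)]
strategy_sum = [[0.0] * ACTIONS for _ in range(2)]

for _ in range(20000):
    strats = [current_strategy(regrets[i]) for i in range(2)]
    acts = [sample(s) for s in strats]       # simultaneous, no peeking
    for i in range(2):
        sign = 1 if i == 0 else -1           # zero-sum: column player negates
        realized = sign * PAYOFF[acts[0]][acts[1]]
        for a in range(ACTIONS):
            counterfactual = sign * (PAYOFF[a][acts[1]] if i == 0
                                     else PAYOFF[acts[0]][a])
            regrets[i][a] += counterfactual - realized
        for a in range(ACTIONS):
            strategy_sum[i][a] += strats[i][a]

# The time-averaged strategy is what approaches the mixed NE (0.5, 0.5)
avg = [[s / sum(strategy_sum[i]) for s in strategy_sum[i]] for i in range(2)]
print(avg)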

From my recent survey of techniques, regret minimization seems to be the best approach for computing Nash equilibria (NEs). 'Monte Carlo Counterfactual Regret Minimization' (MCCFR) nominally seems apt, but from my reading the game tree must still be expanded, since we are always sampling the true payoffs from terminal nodes. Please correct me if I'm wrong about this.
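To illustrate what I mean: even vanilla (chance-sampling) CFR only needs terminal payoffs along sampled deals, and information sets can be created lazily in a dict, so storage grows with what is actually visited. Below is a sketch on Kuhn poker in the style of the well-known Neller & Lanctot CFR tutorial (my own variable names; a sketch, not authoritative):

```python
import random

PASS, BET = 0, 1
NUM_ACTIONS = 2

class InfoSet:
    def __init__(self):
        self.regret_sum = [0.0] * NUM_ACTIONS
        self.strategy_sum = [0.0] * NUM_ACTIONS
    def strategy(self, realization_weight):
        pos = [max(r, 0.0) for r in self.regret_sum]
        norm = sum(pos)
        strat = [p / norm if norm > 0 else 1.0 / NUM_ACTIONS for p in pos]
        for a in range(NUM_ACTIONS):
            self.strategy_sum[a] += realization_weight * strat[a]
        return strat
    def average(self):
        norm = sum(self.strategy_sum)
        return [s / norm if norm > 0 else 1.0 / NUM_ACTIONS
                for s in self.strategy_sum]

nodes = {}  # info sets created lazily, only when first reached

def cfr(cards, history, p0, p1):
    player = len(history) % 2
    if len(history) >= 2:                       # terminal payoff lookups
        higher = cards[player] > cards[1 - player]
        if history[-1] == 'p':
            if history == 'pp':                 # check-check: showdown for 1
                return 1 if higher else -1
            return 1                            # opponent folded
        if history[-2:] == 'bb':                # bet-call: showdown for 2
            return 2 if higher else -2
    node = nodes.setdefault(str(cards[player]) + history, InfoSet())
    strat = node.strategy(p0 if player == 0 else p1)
    util, node_util = [0.0] * NUM_ACTIONS, 0.0
    for a in range(NUM_ACTIONS):
        nxt = history + ('p' if a == PASS else 'b')
        if player == 0:
            util[a] = -cfr(cards, nxt, p0 * strat[a], p1)
        else:
            util[a] = -cfr(cards, nxt, p0, p1 * strat[a])
        node_util += strat[a] * util[a]
    for a in range(NUM_ACTIONS):
        node.regret_sum[a] += (p1 if player == 0 else p0) * (util[a] - node_util)
    return node_util

random.seed(0)
cards, value, iters = [1, 2, 3], 0.0, 20000
for _ in range(iters):
    random.shuffle(cards)                       # sample only the chance node
    value += cfr(cards, '', 1.0, 1.0)
print(value / iters)  # approaches the known game value of -1/18
```

So the `nodes` dict does discover the tree "on-line" in a storage sense, but every iteration still recurses all the way to terminal nodes, which is the limitation I am asking about.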

So ultimately I'm looking for an algorithm that works like regret matching, so that it actually has the potential to compute mixed NEs, but that, like MCTS, is able to expand the tree incrementally and efficiently.

EDIT: I have since found this paper, https://papers.nips.cc/paper/2013/file/1579779b98ce9edb98dd85606f2c119d-Paper.pdf, which seems to be exactly what I was looking for.