Understanding Sampling-Based Adversarial Search Methods
Until 2007, the best computer programs for playing the board game Go performed at the level of a weak amateur, while employing the same Minimax algorithm that had proven so successful in other games such as Chess and Checkers. Thanks to a revolutionary new sampling-based planning approach named Upper Confidence bounds applied to Trees (UCT), today's best Go programs play at a master level on full-sized 19 x 19 boards. Intriguingly, UCT's spectacular success in Go has not been replicated in domains that have been the traditional stronghold of Minimax-style approaches. The focus of this thesis is on understanding this phenomenon. We begin with a thorough examination of the various facets of UCT in the games of Chess and Mancala, where we can contrast the behavior of UCT to that of the better understood Minimax approach. We then introduce the notion of shallow search traps - positions in games where short winning strategies for the opposing player exist - and demonstrate that these are distributed very differently in different games, and that this has a significant impact on the performance of UCT. Finally, we study UCT and Minimax in two novel synthetic game settings that permit mathematical analysis. We show that UCT is relatively robust to misleading heuristic feedback if the noise samples are independently drawn, whereas systematic biases in a heuristic can cause UCT to prematurely "freeze" onto sub-optimal lines of play and thus perform poorly. We conclude with a discussion of the potential avenues for future work.
Minimax; Game Playing; Monte Carlo Tree Search
Crawford, Barbara A; Hopcroft, John E
Ph.D. of Computer Science
Doctor of Philosophy
dissertation or thesis