Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search

📰 ArXiv cs.AI

arXiv:2512.21648v2 Announce Type: replace-cross Abstract: Monte Carlo Tree Search (MCTS) has profoundly influenced reinforcement learning (RL) by integrating planning and learning in tasks requiring long-horizon reasoning, exemplified by the AlphaZero family of algorithms. Central to MCTS is the search strategy, governed by a tree policy based on an upper confidence bound (UCB) applied to trees (UCT). A key factor in the success of AlphaZero is the introduction of a prior term in the UCB1-based

Published 14 Apr 2026
Read full paper → ← Back to Reads