SMART: When is it Actually Worth Expanding a Speculative Tree?

📰 ArXiv cs.AI

arXiv:2604.09731v1 Announce Type: cross Abstract: Tree-based speculative decoding accelerates autoregressive generation by verifying a branching tree of draft tokens in a single target-model forward pass. However, existing methods prioritize maximizing token-level likelihood or the number of accepted tokens while ignoring a critical ``efficiency paradox'': the computational overhead of drafting and verifying big trees can grow super-linearly, particularly at scale. This often leads to negative w

Published 14 Apr 2026

Read full paper → ← Back to Reads