SMART: When is it Actually Worth Expanding a Speculative Tree?
📰 ArXiv cs.AI
arXiv:2604.09731v1 Announce Type: cross Abstract: Tree-based speculative decoding accelerates autoregressive generation by verifying a branching tree of draft tokens in a single target-model forward pass. However, existing methods prioritize maximizing token-level likelihood or the number of accepted tokens while ignoring a critical ``efficiency paradox'': the computational overhead of drafting and verifying big trees can grow super-linearly, particularly at scale. This often leads to negative w
DeepCamp AI