Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference

📰 ArXiv cs.AI

arXiv:2602.19509v3. Abstract: We observe that LLM cascading and routing implicitly solve an anytime computation problem -- a class of algorithms, well-studied in classical AI, that improve solutions as additional computation is allocated. We formalize this connection and propose Pyramid MoA, a hierarchical Mixture-of-Agents architecture governed by a decision-theoretic router that escalates queries only when necessary. We establish a Probabilistic Anytime Property wi
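To make the idea of escalating "only when necessary" concrete, here is a minimal sketch of a cost-aware cascade. It is not the paper's router: the `Tier` class, the `cascade` function, the self-reported confidence score, and the stopping rule (stop when an optimistic bound on the expected gain no longer covers the next tier's cost) are all hypothetical choices made for illustration. The anytime flavor shows up in the fact that a best-so-far answer is available after every tier.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One level of a hypothetical model hierarchy (illustrative, not from the paper)."""
    name: str
    cost: float  # assumed per-query cost of calling this tier
    answer: Callable[[str], Tuple[str, float]]  # returns (answer, estimated P(correct))


def cascade(query: str, tiers: List[Tier], value_of_correct: float) -> Tuple[str, float, List[str]]:
    """Walk up the tiers, keeping the best answer seen so far.

    Assumed stopping rule: escalate only while an optimistic bound on the
    expected gain, (1 - confidence) * value_of_correct, exceeds the cost of
    the next tier.
    """
    spent = 0.0
    visited: List[str] = []
    best_answer, best_conf = "", 0.0
    for i, tier in enumerate(tiers):
        answer, conf = tier.answer(query)
        spent += tier.cost
        visited.append(tier.name)
        if conf >= best_conf:
            # Anytime behavior: a usable answer exists after every tier.
            best_answer, best_conf = answer, conf
        if i + 1 == len(tiers):
            break  # no tier left to escalate to
        expected_gain = (1.0 - best_conf) * value_of_correct
        if expected_gain <= tiers[i + 1].cost:
            break  # escalation is not worth its cost under this rule
    return best_answer, spent, visited


# Toy tiers with hard-coded confidences, purely for demonstration.
small = Tier("small", 1.0, lambda q: ("draft", 0.95 if "easy" in q else 0.5))
large = Tier("large", 10.0, lambda q: ("refined", 0.99))

easy_result = cascade("easy question", [small, large], value_of_correct=100.0)
hard_result = cascade("hard question", [small, large], value_of_correct=100.0)
```

With these toy numbers the easy query stops at the small tier, while the hard query is escalated to the large one. In practice the confidence estimate would need to be calibrated for a rule like this to be meaningful.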

Published 14 Apr 2026