Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

📰 ArXiv cs.AI

arXiv:2605.15077v1 Announce Type: cross Abstract: Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, resulting in increased end-to-end latency. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function
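The general idea of overlapping decoding with function execution can be illustrated with a small sketch. This is not AsyncFC's implementation, which the excerpt above does not describe. It is a generic future-based pattern using Python's `asyncio`, and `slow_tool`, `decode_tokens`, `run_blocking`, and `run_overlapped` are hypothetical stand-ins, with timed sleeps simulating tool latency and decoding time.

```python
import asyncio
import time


async def slow_tool(query: str) -> str:
    """Hypothetical tool call; the sleep simulates I/O latency."""
    await asyncio.sleep(0.2)
    return f"result for {query!r}"


async def decode_tokens(n: int) -> list[str]:
    """Hypothetical stand-in for LLM decoding: n steps of 0.05 s each."""
    tokens = []
    for i in range(n):
        await asyncio.sleep(0.05)
        tokens.append(f"tok{i}")
    return tokens


async def run_blocking() -> tuple[list[str], str, float]:
    """Synchronous semantics: decoding waits for the tool to finish."""
    start = time.perf_counter()
    result = await slow_tool("weather")  # decoding is blocked here
    tokens = await decode_tokens(4)
    return tokens, result, time.perf_counter() - start


async def run_overlapped() -> tuple[list[str], str, float]:
    """Future-based semantics: the call returns a handle immediately."""
    start = time.perf_counter()
    future = asyncio.create_task(slow_tool("weather"))  # starts in background
    tokens = await decode_tokens(4)  # decoding overlaps with the tool call
    result = await future  # resolve the future only when the value is needed
    return tokens, result, time.perf_counter() - start


if __name__ == "__main__":
    _, _, blocking_time = asyncio.run(run_blocking())
    _, _, overlapped_time = asyncio.run(run_overlapped())
    print(f"blocking:   {blocking_time:.2f} s")
    print(f"overlapped: {overlapped_time:.2f} s")
```

In this toy setup both variants produce the same tokens and tool result; only the wall-clock time differs, because the overlapped variant waits for the tool and the decoder concurrently rather than one after the other.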

Published 16 May 2026