Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines
📰 ArXiv cs.AI
arXiv:2604.15186v1 Announce Type: cross
Abstract: Agentic workflows carry out complex tasks by orchestrating multiple large language models (LLMs) and tools. Serving such workflows at a target throughput with low latency is challenging for two reasons: they can be defined using arbitrary agentic frameworks, and their execution times are unpredictable because execution may branch, fan out, or recur in data-dependent ways. Since LLMs in workflows often outnumber available GPUs, their execution also leads to GPU over