Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows
📰 ArXiv cs.AI
arXiv:2604.09611v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used in applications that form multi-request workflows, such as document summarization, search-based copilots, and multi-agent programming. While these workflows unlock richer functionality, they also amplify latency and energy demand during inference. Existing measurement and benchmarking efforts either focus on assessing LLM inference systems or consider single-request evaluations, overlooking workflow de