TokenButler: Token Importance is Predictable

📰 ArXiv cs.AI

arXiv:2503.07518v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) rely on the Key-Value (KV) Cache to store token history, enabling efficient decoding of tokens. As the KV Cache grows, it becomes a major memory and computation bottleneck. However, there is an opportunity to alleviate this bottleneck: prior research has shown that only a small subset of tokens contribute meaningfully to each decoding step. A key challenge in finding these critical tokens is that they are dyna

Published 18 May 2026
