TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

📰 ArXiv cs.AI

arXiv:2604.19769v1 Announce Type: cross

Abstract: Key-value (KV) caching is critical for efficient inference in large language models (LLMs), yet its memory footprint scales linearly with context length, resulting in a severe scalability bottleneck. Existing approaches largely treat KV states as equally important across time, implicitly assuming uniform precision and accessibility. However, this assumption contrasts with human memory systems, where memories vary in clarity, recall frequency, and …
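To make the linear-scaling bottleneck concrete, here is a back-of-the-envelope sketch (not from the paper) of KV cache size for a hypothetical 7B-class transformer; the configuration numbers (32 layers, 32 heads, head dimension 128, fp16) are illustrative assumptions:

```python
def kv_cache_bytes(layers: int, heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Total KV cache size in bytes for one sequence.

    The factor of 2 accounts for storing both a key and a value
    vector per token, per head, per layer.
    """
    return 2 * layers * heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 7B-class config: 32 layers, 32 heads, head_dim 128, fp16.
per_token = kv_cache_bytes(32, 32, 128, seq_len=1)
print(per_token // 1024, "KiB per token")        # 512 KiB per token

# At a 128K-token context the cache alone reaches 64 GiB:
total = kv_cache_bytes(32, 32, 128, seq_len=128 * 1024)
print(total // 2**30, "GiB")                     # 64 GiB
```

The per-token cost is fixed by the model architecture, so total cache size grows strictly linearly with context length, which is the scaling the abstract identifies as the bottleneck.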

Published 23 Apr 2026