FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression

📰 arXiv cs.AI

arXiv:2605.11478v1 (Announce Type: new)

Abstract: Long-context inference is increasingly a memory-traffic problem. The culprit is the key-value (KV) cache: it grows with context length, batch size, layers, and heads, and it is read at every decoding step. Rotation-based scalar codecs meet this systems constraint by storing a norm, applying a shared random rotation, and quantizing one coordinate at a time. They are universal and random-access, but they discard the geometry created by the normaliza…
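The abstract only outlines the rotation-based scalar baseline: store a norm, apply a shared random rotation, then quantize each coordinate independently. Below is a minimal pure-Python sketch of that baseline under stated assumptions. It is not FibQuant and not the authors' code. The rotation is drawn by Gram-Schmidt on Gaussian vectors, and each rotated coordinate is quantized on a uniform grid clipped to ±`clip_sigmas`/sqrt(d). The names `RotatedScalarCodec`, `kv_cache_bytes`, and `clip_sigmas` are hypothetical. The sketch also includes the standard KV-cache size count behind the memory-traffic claim.

```python
import math
import random


def kv_cache_bytes(layers, heads, head_dim, seq_len, batch, bytes_per_elem):
    """Uncompressed KV-cache size: one key and one value vector per
    (layer, head, token, sequence), hence the leading factor of 2."""
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem


def random_rotation(d, seed=0):
    """Random d x d orthogonal matrix: modified Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        n = math.sqrt(sum(a * a for a in v))
        if n > 1e-8:  # reject a (vanishingly unlikely) degenerate draw
            rows.append([a / n for a in v])
    return rows


def mat_vec(m, v):
    """Computes m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_t_vec(m, v):
    """Computes m^T @ v, the inverse of m when m is orthogonal."""
    out = [0.0] * len(m[0])
    for row, vi in zip(m, v):
        for j, a in enumerate(row):
            out[j] += a * vi
    return out


class RotatedScalarCodec:
    """Hypothetical baseline: store ||x||, rotate x/||x|| by a shared matrix,
    then quantize each rotated coordinate independently with `bits` bits."""

    def __init__(self, d, bits=4, clip_sigmas=3.0, seed=0):
        self.rot = random_rotation(d, seed)  # shared by every vector
        self.levels = 2 ** bits
        # Over the random rotation, each coordinate of a rotated unit vector
        # has variance 1/d, so the uniform grid spans +/- clip_sigmas/sqrt(d)
        # (an assumed, untuned choice).
        self.a = clip_sigmas / math.sqrt(d)
        self.step = 2 * self.a / (self.levels - 1)

    def encode(self, x):
        norm = math.sqrt(sum(t * t for t in x))
        u = [t / norm for t in x] if norm > 0 else [0.0] * len(x)
        y = mat_vec(self.rot, u)
        codes = [min(self.levels - 1, max(0, round((t + self.a) / self.step)))
                 for t in y]
        return norm, codes

    def _dequantize(self, codes):
        return [-self.a + c * self.step for c in codes]

    def decode(self, norm, codes):
        return [norm * t for t in mat_t_vec(self.rot, self._dequantize(codes))]

    def score(self, q, norm, codes):
        # <q, R^T y> = <R q, y> for orthogonal R: rotate the query once and
        # score keys directly in the rotated domain, with no inverse rotation.
        rq = mat_vec(self.rot, q)
        return norm * sum(a * b for a, b in zip(rq, self._dequantize(codes)))


if __name__ == "__main__":
    # Hypothetical config: 32 layers, 32 heads, head_dim 128, 32k tokens, fp16.
    print(kv_cache_bytes(32, 32, 128, 32768, 1, 2) / 2**30, "GiB")  # 16.0 GiB
    rng = random.Random(1)
    d = 16
    codec = RotatedScalarCodec(d, bits=4)
    keys = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(8)]
    stored = [codec.encode(k) for k in keys]  # each token coded independently
    # Random access: decode token 5 alone, without touching its neighbours.
    norm5, codes5 = stored[5]
    k5 = codec.decode(norm5, codes5)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(k5, keys[5])))
    print(f"token 5 relative error: {err / norm5:.3f}")
```

Each vector is stored as its own (norm, codes) pair and the rotation is shared, so any single token can be decoded or scored on its own. This is the random-access property the abstract refers to. The `score` method shows why per-key inverse rotations are unnecessary: rotating the query once suffices.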

Published 13 May 2026
