TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization
📰 ArXiv cs.AI
TurboAngle compresses KV cache entries using uniform angle quantization in the Fast Walsh-Hadamard domain
Action Steps
- Quantize angles in the Fast Walsh-Hadamard domain to compress KV cache entries
- Apply random diagonal rotation to make consecutive element pairs approximately uniformly distributed on the unit circle
- Extend the angular quantizer with a per-layer early-boost scheme that sets the K and V codebook sizes independently at each layer
- Allocate higher precision to critical layers using model-specific subset selection
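The core pipeline in the steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`compress`, `decompress`) are illustrative, and pair magnitudes are kept in full precision here, whereas the paper would also quantize them. A random ±1 diagonal followed by the orthonormal Fast Walsh-Hadamard transform makes consecutive element pairs approximately isotropic, so their angles are roughly uniform on the unit circle and a uniform angular codebook wastes little precision:

```python
import numpy as np

def fwht(v):
    """Orthonormal Fast Walsh-Hadamard transform (length must be a power of two).

    The orthonormal FWHT is its own inverse, which makes decompression simple.
    """
    v = v.astype(np.float64).copy()
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v / np.sqrt(n)

def compress(x, signs, bits=6):
    """Random diagonal rotation + FWHT, then uniform angle quantization of pairs."""
    z = fwht(signs * x)                            # randomized Hadamard rotation
    pairs = z.reshape(-1, 2)                       # consecutive element pairs
    r = np.linalg.norm(pairs, axis=1)              # magnitudes (unquantized in this sketch)
    theta = np.arctan2(pairs[:, 1], pairs[:, 0])   # angles in [-pi, pi)
    levels = 2 ** bits
    q = np.round((theta + np.pi) / (2 * np.pi) * levels).astype(np.int64) % levels
    return q, r                                    # integer angle codes + magnitudes

def decompress(q, r, signs, bits=6):
    levels = 2 ** bits
    theta = q / levels * 2 * np.pi - np.pi         # reconstruct angle bin centers
    z = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1).reshape(-1)
    return signs * fwht(z)                         # FWHT is self-inverse; undo the signs

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                        # stand-in for a KV cache vector
signs = rng.choice([-1.0, 1.0], size=64)           # random diagonal rotation
q, r = compress(x, signs)
x_hat = decompress(q, r, signs)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

With 6-bit angle codes the worst-case angular error per pair is pi/64 radians, so the reconstruction error stays small; per-layer precision allocation would simply vary `bits` (and the K/V codebook sizes) across layers.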
Who Needs to Know This
This research benefits AI engineers and ML researchers working on model compression and inference optimization: it offers a new approach to shrinking KV cache memory usage while preserving model quality.
Key Insight
💡 Uniform angle quantization in the Fast Walsh-Hadamard domain can achieve near-lossless compression of KV cache entries
DeepCamp AI