ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation
📰 ArXiv cs.AI
arXiv:2604.11080v1 Announce Type: cross

Abstract: Rotation-based Post-Training Quantization (PTQ) has emerged as a promising solution for mitigating activation outliers in the quantization of Large Language Models (LLMs). Global rotation methods achieve inference efficiency by fusing activation rotations into attention and FFN blocks, but suffer from limited expressivity as they are constrained to use a single learnable rotation matrix across all layers. To tackle this, layer-wise transformation
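To make the rotation-fusion idea the abstract refers to concrete, below is a minimal NumPy sketch of the general technique (not the paper's ReSpinQuant method): an orthogonal matrix R is folded into a linear layer's weights offline, so the rotated activations can be quantized with their outlier energy spread across channels while the full-precision output is unchanged. All names (`random_rotation`, `quantize_int8`, the dimensions, and the injected outlier) are illustrative assumptions.

```python
# Illustrative sketch of rotation-based PTQ fusion, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal matrix via QR decomposition; real methods use a
    # learned or Hadamard-style rotation instead (assumption for this sketch).
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def quantize_int8(x):
    # Symmetric per-tensor int8 fake-quantization.
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

d_in, d_out = 64, 32
x = rng.standard_normal((8, d_in))
x[:, 3] *= 50.0                         # inject an activation outlier channel
W = rng.standard_normal((d_in, d_out))

R = random_rotation(d_in)
x_rot = x @ R                           # rotated activations: outliers spread out
W_rot = R.T @ W                         # rotation fused into the weights offline
                                        # (x @ R) @ (R.T @ W) == x @ W exactly

y_ref   = x @ W                         # full-precision reference
y_plain = quantize_int8(x) @ W          # quantize raw, outlier-heavy activations
y_rot   = quantize_int8(x_rot) @ W_rot  # quantize rotated activations

print("error without rotation:", np.abs(y_ref - y_plain).mean())
print("error with rotation:   ", np.abs(y_ref - y_rot).mean())
```

Because R is orthogonal, fusing R into the weights costs nothing at inference time; the expressivity limitation the abstract points to is that global methods reuse one such R for every layer.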