Saliency-Aware Regularized Quantization Calibration for Large Language Models

📰 ArXiv cs.AI

arXiv:2605.05693v1

Abstract: Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a predetermined calibration dataset, usually optimized via either scale search or Gram-based methods. However, from the perspective of generalization risk, existing calibration objectives of PTQ based o
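As a rough illustration of the baseline the abstract describes (not the paper's saliency-aware method, which the excerpt does not specify), the following sketch picks a quantization scale by grid search so as to minimize a layer-wise reconstruction error on calibration inputs. Everything here is an assumption made for illustration: the single linear unit, the symmetric 4-bit grid, the random calibration data, and the helper names `quantize`, `reconstruction_error`, and `scale_search`.

```python
import random

def quantize(w, scale, bits=4):
    """Symmetric uniform quantization: round to the integer grid, clip, rescale."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(w / scale)))
    return q * scale

def layer_output(X, w):
    """Output of one linear unit for each calibration row x: sum_i x_i * w_i."""
    return [sum(xi * wi for xi, wi in zip(x, w)) for x in X]

def reconstruction_error(X, w, w_q):
    """Squared error between full-precision and quantized layer outputs."""
    y = layer_output(X, w)
    y_q = layer_output(X, w_q)
    return sum((a - b) ** 2 for a, b in zip(y, y_q))

def scale_search(X, w, bits=4, steps=50):
    """Grid-search a clipping ratio and keep the scale with the lowest error."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in w)
    best_scale, best_err = None, float("inf")
    for k in range(1, steps + 1):
        # k == steps corresponds to no clipping (scale = max|w| / qmax).
        scale = (k / steps) * max_abs / qmax
        w_q = [quantize(v, scale, bits) for v in w]
        err = reconstruction_error(X, w, w_q)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

# Hypothetical calibration set: 32 samples with 16 features each.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]
w = [random.gauss(0, 1) for _ in range(16)]

scale, err = scale_search(X, w)

# Baseline for comparison: the unclipped max-abs scale.
baseline_scale = max(abs(v) for v in w) / 7
baseline_err = reconstruction_error(X, w, [quantize(v, baseline_scale) for v in w])
```

Because the unclipped scale is one of the candidates in the grid, the searched error can never exceed the baseline error on the calibration data. That guarantee only holds on the calibration set itself, which is the generalization concern the abstract goes on to raise.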

Published 9 May 2026
