How Quantization Changes Interpretable Features: A Sparse Autoencoder Analysis of Language Models
📰 ArXiv cs.AI
arXiv:2606.03002v1 Announce Type: cross
Abstract: Quantization is a standard path to deploying large language models, and a quantized model is typically judged acceptable when its perplexity or downstream accuracy stays close to the full-precision original. Whether the model still computes in the same way, or whether the interpretable features identified in the full-precision model survive weight rounding, is rarely tested, even as safety audits and steering interventions increasingly rely on th