How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
📰 ArXiv cs.AI
Researchers analyze how weight pruning reshapes language models' internal representations, using Sparse Autoencoders as interpretability probes
Action Steps
- Apply weight pruning to language models using magnitude and Wanda methods
- Use Sparse Autoencoders as interpretability probes to analyze the reshaped feature geometry
- Evaluate the effects of pruning on three model families: Gemma 3 1B, Gemma 2 2B, and Llama 3.2 1B
- Analyze the results to understand how pruning reshapes the internal representations of language models
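The two pruning methods named above can be sketched in a few lines. This is a minimal, illustrative NumPy version, not the paper's implementation: magnitude pruning zeroes the smallest-|w| weights globally, while Wanda scores each weight by |W_ij| · ||X_j||₂ (weight magnitude times the L2 norm of the corresponding input activation) and prunes per output row. The function names, toy matrix shapes, and global-vs-per-row granularity are assumptions for the sketch.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Unstructured magnitude pruning: zero the smallest-|w| fraction of weights."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    # Threshold at the k-th smallest absolute value (flattened view)
    thresh = np.partition(np.abs(W), k, axis=None)[k]
    mask = np.abs(W) >= thresh
    return W * mask

def wanda_prune(W, X, sparsity):
    """Wanda-style pruning sketch: score = |W_ij| * ||X_j||_2, pruned per output row.

    W: (out_dim, in_dim) weight matrix; X: (n_samples, in_dim) calibration activations.
    """
    act_norm = np.linalg.norm(X, axis=0)        # per-input-feature L2 norm, shape (in_dim,)
    score = np.abs(W) * act_norm                # broadcasts across output rows
    k = int(W.shape[1] * sparsity)
    pruned = W.copy()
    rows = np.arange(W.shape[0])[:, None]
    cols = np.argsort(score, axis=1)[:, :k]     # lowest-scoring k weights in each row
    pruned[rows, cols] = 0.0
    return pruned

# Toy usage on random matrices (real use: transformer linear-layer weights)
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(32, 16))
W_mag = magnitude_prune(W, 0.5)
W_wanda = wanda_prune(W, X, 0.5)
```

Both calls above yield ~50% zeroed weights; the key difference the paper probes is that Wanda's activation-aware scores keep different weights alive than raw magnitude does, which is what reshapes the downstream feature geometry.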
Who Needs to Know This
AI engineers and ML researchers: the study shows how pruning alters a model's internal representations, which can inform model compression and interpretability strategies
Key Insight
💡 Weight pruning significantly alters the feature geometry of language models, affecting both their performance and their interpretability
Share This
🤖 Pruning reshapes language models' features! Researchers use Sparse Autoencoders to analyze the effects of weight pruning on internal representations
DeepCamp AI