Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
📰 ArXiv cs.AI
arXiv:2502.06809v3. Abstract: Pervasive polysemanticity in large language models (LLMs) undermines discrete neuron-concept attribution, posing a significant challenge for model interpretation and control. We systematically analyze both encoder- and decoder-based LLMs across diverse datasets and observe that even highly salient neurons for specific semantic concepts consistently exhibit polysemantic behavior. Importantly, we uncover a consistent pattern: concept-condit