Whether, Not Which: Mechanistic Interpretability Reveals Dissociable Affect Reception and Emotion Categorization in LLMs

📰 arXiv cs.AI

Mechanistic interpretability reveals that affect reception and emotion categorization are dissociable mechanisms in LLMs.

Advanced | Published 25 Mar 2026

Action Steps

Use mechanistic interpretability to analyze LLMs' internal representations of emotion
Examine the difference between affect reception and emotion categorization in LLMs
Investigate how LLMs respond to emotional stimuli without explicit emotion keywords
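
The summary does not describe the paper's actual method, so the following is only a minimal sketch of what the first step could look like: a difference-of-means linear probe that tests whether a single direction in activation space separates affective from neutral inputs. Everything here is an assumption made for illustration. The activations are synthetic stand-ins for hidden states that would in practice be extracted from an LLM layer, and the names `make_synthetic_activations`, `fit_mean_difference_probe`, and `probe_accuracy` are hypothetical.

```python
import numpy as np


def make_synthetic_activations(n=400, dim=64, shift=3.0, seed=0):
    """Synthetic stand-in for LLM hidden states (assumption, not real data).

    Label 1 ("affective") rows are shifted along one planted axis;
    label 0 ("neutral") rows are pure Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    planted = np.zeros(dim)
    planted[0] = 1.0
    acts = rng.normal(size=(n, dim)) + shift * labels[:, None] * planted
    return acts, labels


def fit_mean_difference_probe(acts, labels):
    """Fit a unit direction and threshold from the two class means."""
    pos_mean = acts[labels == 1].mean(axis=0)
    neg_mean = acts[labels == 0].mean(axis=0)
    direction = pos_mean - neg_mean
    direction = direction / np.linalg.norm(direction)
    # Threshold halfway between the projected class means.
    threshold = 0.5 * (pos_mean @ direction + neg_mean @ direction)
    return direction, threshold


def probe_accuracy(acts, labels, direction, threshold):
    """Fraction of rows whose projection falls on the correct side."""
    preds = (acts @ direction > threshold).astype(int)
    return float((preds == labels).mean())


if __name__ == "__main__":
    acts, labels = make_synthetic_activations()
    train, test = slice(0, 300), slice(300, 400)
    direction, threshold = fit_mean_difference_probe(acts[train], labels[train])
    acc = probe_accuracy(acts[test], labels[test], direction, threshold)
    print(f"held-out probe accuracy: {acc:.2f}")
```

With real model activations, fitting one probe on affective-versus-neutral labels and a separate one on emotion-category labels, each on stimuli with and without explicit emotion keywords, would be one way to approach the second and third steps. High probe accuracy shows only that the information is linearly decodable, not that the model uses it.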

Who Needs to Know This

AI researchers and engineers working with LLMs, who benefit from understanding how these models process emotion. Product managers, who can apply that understanding to improve AI-powered products.

Key Insight

💡 An LLM's emotion-detection mechanisms can be dissociated from its reliance on explicit emotion keywords.