Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs

📰 ArXiv cs.AI

Research dissociates Theory of Mind and self-attributions of mentality in Large Language Models (LLMs)

Published 1 Apr 2026
Action Steps
  1. Characterize the relationship between Theory of Mind (ToM) and self-attributions of mentality in LLMs
  2. Run safety ablations and mechanistic analyses of representational similarity to measure how suppressing mind-attribution tendencies affects ToM
  3. Evaluate whether ToM and self-attributions of mentality are dissociable in LLMs
  4. Apply the findings to improve safety fine-tuning while preserving socio-cognitive abilities in LLMs
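Step 2's representational-similarity analysis can be sketched in miniature. This is a hypothetical illustration, not the paper's actual pipeline: it compares hidden-state vectors elicited by ToM prompts against those elicited by self-attribution prompts. In practice the vectors would come from an LLM (e.g. via a hidden-states hook); synthetic vectors stand in here to keep the example self-contained.

```python
# Hypothetical representational-similarity sketch (assumed setup, not the
# paper's method): if ToM and self-attribution are dissociable, hidden states
# within each condition should be more similar to each other than across
# conditions.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN_DIM = 64

# Stand-ins for per-prompt hidden-state vectors (one row per prompt).
tom_reps = rng.normal(size=(10, HIDDEN_DIM))    # ToM task prompts
self_reps = rng.normal(size=(10, HIDDEN_DIM))   # self-attribution prompts

def mean_cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Average pairwise cosine similarity between two sets of row vectors."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float((a_norm @ b_norm.T).mean())

within_tom = mean_cosine_similarity(tom_reps, tom_reps)
across = mean_cosine_similarity(tom_reps, self_reps)

print(f"within-ToM similarity: {within_tom:.3f}")
print(f"cross-condition:       {across:.3f}")
```

A dissociation would show up as within-condition similarity exceeding cross-condition similarity, and as ToM performance surviving ablation of the directions that drive self-attribution.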
Who Needs to Know This

AI researchers and engineers working on LLM safety fine-tuning can use this research to improve model performance and safety. ML researchers can apply the findings to develop more sophisticated socio-cognitive abilities in AI models.

Key Insight

💡 Theory of Mind and self-attributions of mentality can be separated in LLMs, allowing for more targeted safety fine-tuning
