CoSToM: Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models

📰 ArXiv cs.AI

arXiv:2604.10031v1 Announce Type: cross Abstract: Theory of Mind (ToM), the ability to attribute mental states to others, is a hallmark of social intelligence. While large language models (LLMs) demonstrate promising performance on standard ToM benchmarks, we observe that they often fail to generalize to complex task-specific scenarios, relying heavily on prompt scaffolding to mimic reasoning. This critical misalignment between internal knowledge and external behavior raises a fundamental question… [abstract truncated]

Published 14 Apr 2026