CoSToM: Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models
📰 ArXiv cs.AI
arXiv:2604.10031v1 Announce Type: cross Abstract: Theory of Mind (ToM), the ability to attribute mental states to others, is a hallmark of social intelligence. While large language models (LLMs) demonstrate promising performance on standard ToM benchmarks, we observe that they often fail to generalize to complex task-specific scenarios, relying heavily on prompt scaffolding to mimic reasoning. This critical misalignment between internal knowledge and external behavior raises a fundamental question […]