Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

📰 ArXiv cs.AI

arXiv:2603.18893v2 Announce Type: replace

Abstract: Tracking the internal states of large language models across conversations is important for safety, interpretability, and model welfare, yet current methods are limited. Linear probes and other white-box methods compress high-dimensional representations imperfectly and become harder to apply as model size increases. Taking inspiration from human psychology, where numeric self-report is a widely used tool for tracking internal states, we ask whether…

Published 14 Apr 2026