Ask don't tell: Reducing sycophancy in large language models
ArXiv cs.AI
arXiv:2602.23971v3 Announce Type: replace-cross Abstract: Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and social contexts. While prior work has documented conversational features correlated with sycophancy, we lack a systematic understanding of what provokes or prevents AI sycophancy. Here, we present a set of controlled experimental studies where w