Ask don't tell: Reducing sycophancy in large language models
ArXiv cs.AI
arXiv:2602.23971v3 Announce Type: replace-cross Abstract: Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and social contexts. While prior work has documented conversational features correlated with sycophancy, we lack a systematic understanding of what provokes or prevents AI sycophancy. Here, we present a set of controlled experimental studies where w