How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
📰 ArXiv cs.AI
arXiv:2603.02578v2. Abstract: Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2