How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

📰 ArXiv cs.AI

arXiv:2603.02578v2

Abstract: Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2

Published 14 Apr 2026
