Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models

📰 ArXiv cs.AI

arXiv:2508.10599v4. Abstract: Activation steering offers a promising approach to controlling the behavior of Large Language Models by directly manipulating their internal activations. However, most existing methods struggle to jointly steer multiple attributes, often resulting in interference and undesirable trade-offs. To address this challenge, we propose Multi-Subspace Representation Steering (MSRS), a novel framework for effective multi-attribute steering via subspace r

Published 28 Apr 2026