From Attribution to Action: A Human-Centered Application of Activation Steering

📰 ArXiv cs.AI

arXiv:2604.11467v1

Abstract: Explainable AI (XAI) methods reveal which features influence model predictions, yet provide limited means for practitioners to act on these explanations. Activation steering of components identified via XAI offers a path toward actionable explanations, although its practical utility remains understudied. We introduce an interactive workflow combining SAE-based attribution with activation steering for instance-level analysis of concept usage in visi

Published 14 Apr 2026