Language Models Can Explain Visual Features via Steering

📰 ArXiv cs.AI

Steering, a causal intervention applied inside a Vision-Language Model, lets the model's language component describe what an individual visual feature encodes.

Advanced | Published 25 Mar 2026

Action Steps

Use the structure of Vision-Language Models to isolate individual visual features
Steer sparse autoencoder (SAE) features so that the language model generates an explanation for each one
Use causal interventions to analyze how language outputs depend on visual features
Evaluate how well the steering-based explanations describe the visual features
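The steering step above can be illustrated with a minimal sketch. This is not the paper's implementation: the SAE decoder here is random, and the names `W_dec`, `steer`, `feature_idx`, and `alpha` are hypothetical. It assumes steering means adding a scaled SAE decoder direction to the visual token activations before they reach the language model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: activation width and number of SAE features.
d_model, n_features = 16, 64

# Stand-in SAE decoder; a real one would be trained on vision-encoder activations.
W_dec = rng.normal(size=(n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm feature directions


def steer(activations, feature_idx, alpha):
    """Add alpha times one feature's decoder direction to every token activation."""
    return activations + alpha * W_dec[feature_idx]


# Stand-in activations for 4 visual tokens of a neutral input image.
acts = rng.normal(size=(4, d_model)) * 0.1

# Causal intervention: push the activations along feature 7.
steered = steer(acts, feature_idx=7, alpha=5.0)

# Because the direction has unit norm, each token's projection onto it grows by alpha.
shift = steered @ W_dec[7] - acts @ W_dec[7]
```

In a real Vision-Language Model the steered activations would be passed on to the language model, which is then prompted to describe what it "sees"; the resulting text serves as the candidate explanation for that feature.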

Who Needs to Know This

ML researchers and AI engineers working on computer vision and natural language processing, who can use this approach to interpret visual features and to build more transparent, explainable models.

Key Insight

💡 Via steering, language models can explain visual features without human intervention.