Language Models Can Explain Visual Features via Steering
📰 ArXiv cs.AI
Language models can explain visual features via steering: a method based on causal interventions in Vision-Language Models that steers individual Sparse Autoencoder (SAE) features to generate explanations.
Action Steps
- Leverage the structure of Vision-Language Models to identify individual features
- Apply steering to SAE features to generate explanations
- Use causal interventions to analyze the relationship between language and visual features
- Evaluate the effectiveness of the steering method in explaining visual features
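The steering step above can be illustrated with a toy sketch. It is not the paper's implementation: the decoder here is random, and the names `make_toy_sae`, `steer`, and `project` are hypothetical helpers introduced only for illustration. It assumes the common form of activation steering, in which a scaled copy of one SAE feature's decoder direction is added to a hidden activation. In an actual Vision-Language Model the steered activation would then be passed on to the language model to produce a textual explanation.

```python
import random


def make_toy_sae(d_model, n_features, seed=0):
    """Build a hypothetical SAE decoder: one unit-norm direction per feature."""
    rng = random.Random(seed)
    decoder = []
    for _ in range(n_features):
        row = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
        norm = sum(x * x for x in row) ** 0.5
        decoder.append([x / norm for x in row])
    return decoder


def steer(activation, decoder, feature_idx, strength):
    """Add `strength` times the chosen feature's decoder direction (assumed steering rule)."""
    direction = decoder[feature_idx]
    return [a + strength * d for a, d in zip(activation, direction)]


def project(activation, direction):
    """Dot product: how strongly the activation points along a feature direction."""
    return sum(a * d for a, d in zip(activation, direction))


# Toy stand-in for a vision-encoder activation (all zeros for clarity).
decoder = make_toy_sae(d_model=8, n_features=4)
activation = [0.0] * 8

# Causal intervention: push the activation along feature 2.
steered = steer(activation, decoder, feature_idx=2, strength=5.0)
```

Because each decoder direction has unit norm, the projection of `steered` onto the direction of feature 2 equals the chosen strength, which gives a simple check that the intervention changed only what was intended.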
Who Needs to Know This
AI researchers and engineers working on computer vision and natural language processing can use this approach to understand and interpret visual features, and to develop more transparent and explainable AI models.
Key Insight
💡 Through steering, language models can explain visual features without requiring human intervention
Full Article
Title: Language Models Can Explain Visual Features via Steering
Abstract:
arXiv:2603.22593v1 (cross-listed). Sparse Autoencoders uncover thousands of features in vision models, yet explaining these features without requiring human intervention remains an open challenge. While previous work has proposed generating correlation-based explanations based on top activating input examples, we present a fundamentally different alternative based on causal interventions. We leverage the structure of Vision-Language Models and steer individual SAE features in the