Multimodal Function Vectors for Visual Relations

📰 ArXiv cs.AI

arXiv:2510.02528v2 Announce Type: replace Abstract: Large Multimodal Models (LMMs) demonstrate impressive in-context learning abilities from few multimodal demonstrations, yet the internal mechanisms supporting such task learning remain opaque. Building on prior work of Large Language Models, we show that a small subset of attention heads in Large Multimodal Models is responsible for transmitting representations of visual relations. The activations of these attention heads, termed function vecto

Published 2 Jun 2026

Read full paper → ← Back to Reads