Selective Aggregation of Attention Maps Improves Diffusion-Based Visual Interpretation

📰 ArXiv cs.AI

arXiv:2604.05906v1 (Announce Type: cross)

Abstract: Numerous studies on text-to-image (T2I) generative models have utilized cross-attention maps to boost application performance and interpret model behavior. However, the distinct characteristics of attention maps from different attention heads remain relatively underexplored. In this study, we show that selectively aggregating cross-attention maps from heads most relevant to a target concept can improve visual interpretability. Compared to the dif

Published 8 Apr 2026
