Diffusion-CAM: Faithful Visual Explanations for dMLLMs

📰 arXiv cs.AI

arXiv:2604.11005v1. Abstract: While diffusion Multimodal Large Language Models (dMLLMs) have recently made remarkable strides in multimodal generation, the development of interpretability mechanisms has lagged behind their architectural evolution. Unlike traditional autoregressive models, which produce sequential activations, diffusion-based architectures generate tokens via parallel denoising, resulting in smooth, distributed activation patterns across the entire sequence. C

Published 14 Apr 2026
