Diffusion-CAM: Faithful Visual Explanations for dMLLMs
📰 ArXiv cs.AI
arXiv:2604.11005v1 Announce Type: new Abstract: While diffusion Multimodal Large Language Models (dMLLMs) have recently made remarkable strides in multimodal generation, the development of interpretability mechanisms has lagged behind their architectural evolution. Unlike traditional autoregressive models, which produce sequential activations, diffusion-based architectures generate tokens via parallel denoising, resulting in smooth, distributed activation patterns across the entire sequence. C