MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models
📰 ArXiv cs.AI
arXiv:2606.04027v1 (cross-listed)
Abstract: Diffusion large language models (dLLMs) generate text by iteratively denoising partially masked sequences under bidirectional context, exposing a safety surface distinct from autoregressive LLMs. Because mask tokens are native inputs and tokens are committed by confidence rather than position, harmful content can be induced through infilling and outside the monitored prefix. Existing jailbreaks either miss this native infill capability or rely on