Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

📰 ArXiv cs.AI

arXiv:2604.08557v1 Announce Type: cross

Abstract: Diffusion-based language models (dLLMs) generate text by iteratively denoising masked token sequences. We show that their safety alignment rests on a single fragile assumption: that the denoising schedule is monotonic and committed tokens are never re-evaluated. Safety-aligned dLLMs commit refusal tokens within the first 8-16 of 64 denoising steps, and the schedule treats these commitments as permanent. A trivial two-step intervention - re-masking […]
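The irreversibility the abstract describes can be illustrated with a toy sketch. This is a hypothetical simulation, not the paper's method or a real dLLM: the "model" simply commits placeholder tokens one per step, standing in for a confidence-ordered denoising schedule that never revisits committed positions.

```python
# Toy sketch (hypothetical): a simulated monotonic denoising schedule and a
# re-mask intervention. No real diffusion LM is involved; the fill function
# stands in for the model's per-position token prediction.
MASK = "<mask>"

def denoise(seq, steps, fill):
    """Commit one masked position per step, left to right. Committed tokens
    are never re-evaluated -- the fragile assumption the abstract names."""
    seq = list(seq)
    for _ in range(steps):
        for i, tok in enumerate(seq):
            if tok == MASK:
                seq[i] = fill(i)
                break
    return seq

# The aligned "model" commits refusal tokens in the earliest steps.
refusal = ["I", "cannot", "help", "with", "that"]
seq = denoise([MASK] * 5, steps=5, fill=lambda i: refusal[i])
assert seq == refusal

# Re-mask and redirect: reset an early-committed refusal token to MASK and
# resume denoising. Because the schedule treats commitments as permanent,
# nothing re-checks the surrounding context, and an attacker-chosen fill
# silently replaces the refusal.
remasked = [MASK if i == 1 else tok for i, tok in enumerate(seq)]
redirect = {1: "will"}
attacked = denoise(remasked, steps=1, fill=lambda i: redirect.get(i, MASK))
print(attacked)
```

The key point the sketch makes concrete: the denoising loop has no notion of "re-checking" a committed token, so once the refusal prefix is overwritten, the rest of the sequence is accepted as-is.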

Published 13 Apr 2026