MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

📰 ArXiv cs.AI

arXiv:2605.03952v1 Announce Type: cross Abstract: Coding agents often pass per-prompt safety review yet ship exploitable code when their tasks are decomposed into routine engineering tickets. The challenge is structural: existing safety alignment evaluates overt requests in isolation, leaving models blind to malicious end-states that emerge from sequenced compliance with innocuous-looking requests. We introduce MOSAIC-Bench (Malicious Objectives Sequenced As Innocuous Compliance), a benchmark of

Published 6 May 2026

Read full paper → ← Back to Reads