Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

📰 ArXiv cs.AI

arXiv:2604.21700v1 (cross-listed)

Abstract: The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent studies have demonstrated the feasibility of backdoor attacks against LLMs. However, existing methods suffer from three key shortcomings: explicit trigger patterns that compromise naturalness, unreliable injection of attacker-specified payloads in long-form generation, and incompletely specified threat models.

Published 25 Apr 2026