LLM-Agnostic Semantic Representation Attack

📰 ArXiv cs.AI

arXiv:2605.08898v1 Announce Type: cross Abstract: Large Language Models (LLMs) increasingly employ alignment techniques to prevent harmful outputs. Despite these safeguards, attackers can circumvent them by crafting adversarial prompts. Predominant token-level optimization methods primarily rely on optimizing for exact affirmative templates (e.g., "Sure, here is..."). However, these paradigms frequently encounter bottlenecks such as suboptimal convergence, compromised prompt naturalne

Published 12 May 2026