Adaptive Prompt Embedding Optimization for LLM Jailbreaking
📰 ArXiv cs.AI
arXiv:2604.24983v1
Abstract: Existing white-box jailbreak attacks against aligned LLMs typically append discrete adversarial suffixes to the user prompt, which visibly alters the prompt and operates in a combinatorial token space. Prior work has avoided directly optimizing the embeddings of the original prompt tokens, presumably because perturbing them risks destroying the prompt's semantic content. We propose Prompt Embedding Optimization (PEO), a multi-round white-box jailbr