Cross-Lingual Jailbreak Detection via Semantic Codebooks

📰 ArXiv cs.AI

arXiv:2604.25716v1 (Announce Type: cross)

Abstract: Safety mechanisms for large language models (LLMs) remain predominantly English-centric, creating systematic vulnerabilities in multilingual deployment. Prior work shows that translating malicious prompts into other languages can substantially increase jailbreak success rates, exposing a structural cross-lingual security gap. We investigate whether such attacks can be mitigated through language-agnostic semantic similarity without retraining or l
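The abstract's central idea is to flag a prompt by its semantic similarity to known malicious content instead of by its surface language. A minimal sketch of that idea follows. The abstract does not describe the paper's codebook construction, encoder, or decision rule, so every specific below is an assumption. `codebook` is a hypothetical list of embedding vectors for known jailbreak prompts. These vectors are assumed to come from some multilingual sentence encoder that places translations of the same request near each other. `threshold=0.8` is an arbitrary illustrative value, and toy 3-dimensional vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_prompt(
    prompt_vec: Vector,
    codebook: List[Vector],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> Tuple[bool, float, int]:
    """Nearest-neighbour check against a codebook of known-malicious embeddings.

    Returns (flagged, best_score, best_index). Because only the embedding is
    compared, the check itself never looks at which language the prompt is in;
    any cross-lingual robustness depends entirely on the (assumed) encoder.
    """
    best_score, best_index = -1.0, -1
    for i, entry in enumerate(codebook):
        score = cosine(prompt_vec, entry)
        if score > best_score:
            best_score, best_index = score, i
    return best_score >= threshold, best_score, best_index


if __name__ == "__main__":
    # Toy embeddings standing in for encoder outputs (hypothetical data).
    codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(flag_prompt([0.9, 0.1, 0.0], codebook))  # flagged: True, nearest entry 0
    print(flag_prompt([0.0, 0.0, 1.0], codebook))  # flagged: False
```

This design needs no retraining of the protected model. Extending coverage means adding entries to the codebook, and the threshold trades missed attacks against false alarms on benign prompts.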

Published 29 Apr 2026