Cross-Lingual Jailbreak Detection via Semantic Codebooks
arXiv cs.AI
arXiv:2604.25716v1
Announce Type: cross
Abstract: Safety mechanisms for large language models (LLMs) remain predominantly English-centric, creating systematic vulnerabilities in multilingual deployment. Prior work shows that translating malicious prompts into other languages can substantially increase jailbreak success rates, exposing a structural cross-lingual security gap. We investigate whether such attacks can be mitigated through language-agnostic semantic similarity without retraining or l