Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility
📰 ArXiv cs.AI
arXiv:2602.03402v3 Announce Type: replace

Abstract: Vision-language models (VLMs) extend the reasoning capabilities of large language models (LLMs) to cross-modal settings, yet they remain highly vulnerable to multimodal jailbreak attacks. Existing defenses predominantly rely on safety fine-tuning or aggressive token manipulations, incurring substantial training costs or significantly degrading utility. Recent research shows that LLMs inherently recognize unsafe content in text, and the incorporation