Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility
📰 ArXiv cs.AI
arXiv:2602.03402v3 Announce Type: replace

Abstract: Vision-language models (VLMs) extend the reasoning capabilities of large language models (LLMs) to cross-modal settings, yet they remain highly vulnerable to multimodal jailbreak attacks. Existing defenses predominantly rely on safety fine-tuning or aggressive token manipulations, incurring substantial training costs or significantly degrading utility. Recent research shows that LLMs inherently recognize unsafe content in text, and the incorporation