Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations

📰 ArXiv cs.AI

arXiv:2506.09067v2

Abstract: Generative medical vision-language models (Med-VLMs) are primarily designed to generate complex textual information (e.g., diagnostic reports) from multimodal inputs including vision modality (e.g., medical images) and language modality (e.g., clinical queries). However, their security vulnerabilities remain underexplored. Med-VLMs should be capable of rejecting harmful queries, such as "Provide detailed instructions for using this…"

Published 13 Apr 2026
