Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise

📰 ArXiv cs.AI

arXiv:2604.09532v1 Announce Type: cross Abstract: Prompt learning is a parameter-efficient approach for adapting vision-language models, yet its robustness under label noise remains underexplored. Visual content carries richer and more reliable semantic information and stays comparatively robust under label noise, whereas the prompt itself is highly susceptible to it. Motivated by this intuition, we propose VisPrompt, a lightweight and robust vision-guided prompt learning framework for noisy-label

Published 13 Apr 2026
