Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption
📰 arXiv cs.AI
Study quantifies failure modes of compressed vision-language models under visual corruption, revealing differences in error patterns between compact and large models
Action Steps
- Identify the error taxonomy for vision-language models, including Object Blindness, Semantic Drift, and Prior Bias (see the taxonomy sketch after this list)
- Compare the performance of compact models (e.g., SmolVLM2-500M) with larger models (e.g., Qwen2.5-VL-7B) under visual corruption
- Analyze the failure modes of compressed models using a dataset of 4,000 samples drawn from VQAv2 and COCO Captions (see the corruption-evaluation sketch after this list)
- Develop strategies to mitigate the edge reliability gap in vision-language models, such as corruption-aware data augmentation and other robustness techniques (see the augmentation sketch after this list)
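To make the first step concrete, here is a minimal Python sketch of how the three error categories could be operationalized. The `FailureMode` enum and the `classify_failure` heuristic are illustrative assumptions, not the paper's method; this summary does not say how errors are actually assigned to categories.

```python
from enum import Enum

class FailureMode(Enum):
    """Hypothetical encoding of the paper's three error categories."""
    OBJECT_BLINDNESS = "object_blindness"  # model misses an object that is in the image
    SEMANTIC_DRIFT = "semantic_drift"      # answer is grounded but semantically off-target
    PRIOR_BIAS = "prior_bias"              # model answers from language priors, ignoring the image

def classify_failure(answer: str, gold: str, objects_in_image: set[str],
                     blind_answer: str) -> FailureMode | None:
    """Toy heuristic: compare the model's answer on a corrupted image against
    the gold answer, the (lowercased) objects known to be in the image, and
    `blind_answer`, the answer the model gives when shown no image at all."""
    if answer.lower() == gold.lower():
        return None  # correct answer, not a failure
    if answer.lower() == blind_answer.lower():
        return FailureMode.PRIOR_BIAS        # same output as text-only: image was ignored
    if not any(obj in answer.lower() for obj in objects_in_image):
        return FailureMode.OBJECT_BLINDNESS  # no grounded object mentioned at all
    return FailureMode.SEMANTIC_DRIFT        # grounded in the scene, but wrong
```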
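For the corruption comparison, a bare-bones evaluation loop might look like the sketch below. The three corruptions (Gaussian noise, blur, JPEG round-tripping) are common choices but assumptions here, as is the `model.answer(image, question)` call, which stands in for whatever inference API your VLM stack exposes.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def gaussian_noise(img: Image.Image, sigma: float = 25.0) -> Image.Image:
    """Add pixel-wise Gaussian noise with standard deviation `sigma`."""
    arr = np.asarray(img, dtype=np.float32)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

def gaussian_blur(img: Image.Image, radius: float = 3.0) -> Image.Image:
    """Blur with a Gaussian kernel of the given radius."""
    return img.filter(ImageFilter.GaussianBlur(radius))

def jpeg_compress(img: Image.Image, quality: int = 10) -> Image.Image:
    """Round-trip through aggressive JPEG compression."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

CORRUPTIONS = {"noise": gaussian_noise, "blur": gaussian_blur, "jpeg": jpeg_compress}

def evaluate(model, samples):
    """Tally exact-match accuracy per corruption over (image, question, gold)
    triples. `model.answer` is a hypothetical stand-in for your inference call."""
    hits = {name: 0 for name in CORRUPTIONS}
    for image, question, gold in samples:
        for name, corrupt in CORRUPTIONS.items():
            pred = model.answer(corrupt(image), question)
            hits[name] += int(pred.strip().lower() == gold.strip().lower())
    return {name: count / len(samples) for name, count in hits.items()}
```

Running the same loop over both a compact model and a larger one, then binning the wrong answers with a taxonomy like the one above, reproduces the shape of the comparison the study describes.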
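For the mitigation step, one standard robustness technique is corruption-aware augmentation during fine-tuning, so the model first encounters degraded inputs at training time rather than on-device. The torchvision pipeline below is a generic sketch of that idea, not a recipe from the paper:

```python
from torchvision import transforms

# Corruption-style augmentations for fine-tuning; all parameters are illustrative.
train_augment = transforms.Compose([
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=5, sigma=(0.5, 2.0))], p=0.3),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.2),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.2, scale=(0.02, 0.1)),  # crude occlusion stand-in
])
```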
Who Needs to Know This
AI engineers and researchers working on vision-language models, particularly those deploying compressed models on edge devices, can use these findings to improve the reliability of their systems
Key Insight
💡 Compact vision-language models exhibit distinct error patterns compared to larger models, highlighting the need for tailored robustness techniques
Share This
💡 Compressed vision-language models fail differently, not just more often, under visual corruption #AI #VLM
DeepCamp AI