Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption

📰 ArXiv cs.AI

Study quantifies failure modes of compressed vision-language models under visual corruption, revealing differences in error patterns between compact and large models

Advanced | Published 31 Mar 2026

Action Steps

Identify the error taxonomy for vision-language models, including Object Blindness, Semantic Drift, and Prior Bias
Compare the performance of the larger quantised model (Qwen2.5-VL-7B, 4-bit NF4) with the compact model (SmolVLM2-500M, FP16) under visual corruption
Analyze the failure modes of compressed models using a dataset of 4,000 samples from VQAv2 and COCO Captions
Develop strategies to mitigate the edge reliability gap in vision-language models, such as data augmentation and robustness techniques
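The comparison in the steps above turns on whether two models fail differently, not just more often. A minimal sketch of that distinction follows, assuming each sample has already been hand-labelled with one taxonomy category or marked correct. The labels, the helper names `failure_profile` and `profile_distance`, and the choice of total variation distance are illustrative assumptions, not the paper's method or data.

```python
from collections import Counter

# The three categories named in the study's taxonomy.
CATEGORIES = ("Object Blindness", "Semantic Drift", "Prior Bias")


def failure_profile(labels):
    """Return (error_rate, share of each category among the failures).

    `labels` holds one entry per sample: a category name for a failure,
    or None when the model answered correctly.
    """
    failures = [lab for lab in labels if lab is not None]
    counts = Counter(failures)
    error_rate = len(failures) / len(labels) if labels else 0.0
    shares = {
        c: (counts[c] / len(failures) if failures else 0.0) for c in CATEGORIES
    }
    return error_rate, shares


def profile_distance(shares_a, shares_b):
    """Total variation distance between two failure mixes (0 = same mix)."""
    return 0.5 * sum(abs(shares_a[c] - shares_b[c]) for c in CATEGORIES)


# Made-up labels for illustration only; these are NOT results from the paper.
model_a = ["Object Blindness", None, "Prior Bias", None,
           None, "Object Blindness", None, None]
model_b = ["Semantic Drift", "Semantic Drift", None, "Prior Bias",
           None, "Semantic Drift", None, "Object Blindness"]

rate_a, shares_a = failure_profile(model_a)
rate_b, shares_b = failure_profile(model_b)
distance = profile_distance(shares_a, shares_b)

print(f"error rates: {rate_a:.3f} vs {rate_b:.3f}")
print(f"failure-mix distance: {distance:.2f}")
```

The error rate answers "how often", while the distance between the category mixes answers "how differently". Two models with similar error rates can still show a large distance if one mostly misses objects and the other mostly drifts semantically.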

Who Needs to Know This

AI engineers and researchers working on vision-language models can use this study to improve model reliability, especially when deploying on edge devices

Key Insight

💡 Compact vision-language models exhibit distinct error patterns compared to larger models, highlighting the need for tailored robustness techniques

Full Article

arXiv:2603.26769v1 (announce type: cross)

Abstract: The rapid compression of large vision-language models (VLMs) for edge deployment raises an underexplored question: do compact models fail differently, not merely more often? This study compares a 7-billion-parameter quantised VLM (Qwen2.5-VL-7B, 4-bit NF4) against a 500-million-parameter FP16 model (SmolVLM2-500M) across 4,000 samples from VQAv2 and COCO Captions. A three-category error taxonomy (Object Blindness, Semantic Drift, Prior Bias) is a
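As a rough aid to reading the model pairing in the abstract, the sketch below estimates weight storage from the nominal parameter counts and bit widths quoted there. It is back-of-the-envelope arithmetic under stated assumptions: every weight is stored at the quoted precision, and quantisation constants, any layers kept at higher precision, activations, and the KV cache are ignored. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumes every parameter is stored at `bits_per_weight`; real checkpoints
    carry extra overhead that this estimate leaves out.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Nominal figures from the abstract: 7 billion parameters at 4 bits (NF4)
# and 500 million parameters at 16 bits (FP16).
qwen_gb = weight_memory_gb(7e9, 4)
smol_gb = weight_memory_gb(500e6, 16)

print(f"Qwen2.5-VL-7B, 4-bit NF4: ~{qwen_gb:.1f} GB of weights")
print(f"SmolVLM2-500M, FP16:      ~{smol_gb:.1f} GB of weights")
```

Under these assumptions the quantised 7B model needs about 3.5 GB for weights and the FP16 500M model about 1.0 GB, so the two represent different compression routes (quantising a large model versus training a small one) rather than a like-for-like size match.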
