SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring
📰 ArXiv cs.AI
arXiv:2604.25855v1 (Announce Type: cross)

Abstract: Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world out-of-distribution (OOD) scenarios. Specifically, selective prediction aims to improve coverage, i.e., the share of inputs the system answers, while adhering to a user-defined risk level. This is typ
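To make the coverage/risk trade-off concrete, here is a minimal sketch of generic selective prediction. It is not the SIEVES method, whose scoring the truncated abstract does not describe. It assumes each input already has a confidence score and a held-out correctness label. The system answers an input only when its score is at or above a threshold, and the sketch picks the threshold that maximizes coverage while keeping the empirical selective risk (error rate among answered inputs) at or below the user-defined level. The function name `select_threshold` and the example data are hypothetical, and this plain empirical rule carries no finite-sample guarantee.

```python
from typing import List, Optional, Tuple


def select_threshold(
    scores: List[float], correct: List[bool], max_risk: float
) -> Optional[Tuple[float, float, float]]:
    """Hypothetical helper: return (threshold, coverage, risk) that maximizes
    coverage subject to empirical selective risk <= max_risk, or None if no
    threshold satisfies the risk level. An input is answered iff score >= threshold."""
    if not scores or len(scores) != len(correct):
        raise ValueError("scores and correct must be non-empty and equal length")
    n = len(scores)
    # Visit inputs from most to least confident; answering the top-k is one threshold.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    best = None
    errors = 0
    for k, i in enumerate(order, start=1):
        errors += not correct[i]
        # A threshold cannot split tied scores, so only cut after the last tie.
        if k < n and scores[order[k]] == scores[i]:
            continue
        risk = errors / k       # error rate among the k answered inputs
        coverage = k / n        # share of inputs answered
        if risk <= max_risk:
            # Coverage grows with k, so the last feasible cut has maximal coverage.
            best = (scores[i], coverage, risk)
    return best


scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
correct = [True, True, True, False, True, False]
print(select_threshold(scores, correct, max_risk=0.2))  # (0.6, 0.833..., 0.2)
```

With a 20% risk level, the sketch answers five of six inputs; tightening `max_risk` to 0.0 drops coverage to the three most confident inputs. Better confidence scores, which is what a method like the one announced here targets, raise the coverage reachable at a fixed risk level.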