Mitigating Coordinate Prediction Bias from Positional Encoding Failures

📰 ArXiv cs.AI

arXiv:2510.22102v2 (Announce Type: replace-cross)

Abstract: While Multimodal Large Language Models (MLLMs) excel at general vision-language tasks, precise coordinate prediction remains a significant challenge, particularly as high-resolution inputs cause visual positional encodings (VPEs) to degrade. We demonstrate that these encoding failures do not result in random noise but instead trigger predictable, directional biases, suggesting that models default to internal spatial priors when grounding…
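The abstract's central distinction, directional bias versus random noise, can be checked on any set of predicted and ground-truth points. The sketch below is not the paper's method. The function name `error_bias_summary` and the "directionality" ratio are illustrative assumptions. The ratio is the length of the mean error vector divided by the mean error length. It is at most 1, reaches 1 only when every error points the same way, and approaches 0 when errors scatter symmetrically and cancel out.

```python
import math
from statistics import mean, pstdev


def error_bias_summary(preds, targets):
    """Summarize signed coordinate errors (hypothetical helper, not from the paper).

    preds, targets: equal-length lists of (x, y) points in the same frame,
    e.g. pixel coordinates.
    Returns (bias, spread, directionality):
      bias           -- mean signed error per axis (systematic offset)
      spread         -- population std. dev. of the error per axis
      directionality -- |mean error vector| / mean |error|, in [0, 1]
    """
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal length")

    # Signed per-axis errors: prediction minus ground truth.
    dx = [p[0] - t[0] for p, t in zip(preds, targets)]
    dy = [p[1] - t[1] for p, t in zip(preds, targets)]

    bias = (mean(dx), mean(dy))
    spread = (pstdev(dx), pstdev(dy))

    # Mean Euclidean length of the individual error vectors.
    mean_err = mean(math.hypot(a, b) for a, b in zip(dx, dy))

    # By the triangle inequality this ratio never exceeds 1:
    # ~1 means errors share one direction (bias), ~0 means they cancel (noise).
    directionality = math.hypot(*bias) / mean_err if mean_err > 0 else 0.0
    return bias, spread, directionality


if __name__ == "__main__":
    truth = [(10, 10), (20, 20)]
    shifted = [(12, 10), (22, 20)]  # every prediction pushed 2 px right
    jittered = [(12, 10), (18, 20)]  # errors of +2 and -2 px cancel
    print(error_bias_summary(shifted, truth))
    print(error_bias_summary(jittered, truth))
```

A uniformly shifted set of predictions yields a directionality of 1.0, while symmetric jitter of the same magnitude yields 0.0, even though both have the same mean error length. This is why a signed, per-axis summary can reveal a bias that an average-distance metric alone would hide.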

Published 29 Apr 2026