UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
📰 ArXiv cs.AI
arXiv:2604.14113v1 (announce type: cross). Abstract: GUI grounding, which localizes interface elements in screenshots given natural language queries, remains challenging for small icons and dense layouts. Test-time zoom-in methods improve localization by cropping and re-running inference at higher resolution, but they crop every instance uniformly with a fixed crop size, ignoring whether the model is actually uncertain in each case. We propose UI-Zoomer, a training-free adaptiv
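The abstract is cut off before it describes how UI-Zoomer works, so the following is only a minimal sketch of the general idea it motivates (zoom in only when the model looks uncertain, and size the crop accordingly), not the authors' method. Everything here is an assumption made for illustration: uncertainty is approximated by the spread of several sampled click predictions, and the names `spread`, `plan_crop`, `to_full_image`, `threshold`, `scale`, and `min_side` are hypothetical.

```python
from statistics import mean, pstdev


def spread(points):
    """Larger of the per-axis population standard deviations of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(pstdev(xs), pstdev(ys))


def plan_crop(points, image_size, threshold=10.0, scale=4.0, min_side=64):
    """Decide whether to zoom in, and where.

    Returns None when the sampled predictions agree (spread <= threshold).
    Otherwise returns a square crop (left, top, right, bottom) centred on the
    mean prediction, with a side length that grows with the spread.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    s = spread(points)
    if s <= threshold:
        return None
    width, height = image_size
    # Side grows with disagreement, but never exceeds the screenshot itself.
    side = min(max(min_side, int(scale * s)), width, height)
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    # Clamp so the crop stays fully inside the screenshot.
    left = int(min(max(cx - side / 2, 0), width - side))
    top = int(min(max(cy - side / 2, 0), height - side))
    return (left, top, left + side, top + side)


def to_full_image(point_in_crop, crop, zoom):
    """Map a point predicted on the upscaled crop back to screenshot pixels."""
    left, top, _, _ = crop
    return (left + point_in_crop[0] / zoom, top + point_in_crop[1] / zoom)
```

In use, a caller would sample several predictions from the grounding model, call `plan_crop`, and re-run inference on the upscaled crop only when it returns a box; `to_full_image` then converts the refined prediction back to the original coordinate system. A fixed-size, always-on zoom corresponds to ignoring `threshold` and `scale`, which is the uniform behaviour the abstract criticizes.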