Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning
ArXiv cs.AI
arXiv:2604.21346v1 Announce Type: new Abstract: Vision–language models (VLMs) often fail on abstract visual reasoning benchmarks such as Bongard problems, raising the question of whether the main bottleneck lies in reasoning or representation. We study this on Bongard-LOGO, a synthetic benchmark of abstract concept learning with ground-truth generative programs, by comparing end-to-end VLMs on raw images with large language models (LLMs) given symbolic inputs derived from those images. Using sy