Compositional Image Synthesis with Inference-Time Scaling

📰 ArXiv cs.AI

arXiv:2510.24133v2

Abstract: Despite their impressive realism, modern text-to-image models still struggle with compositionality, often failing to render accurate object counts, attributes, and spatial relations. To address this challenge, we present a training-free framework that combines an object-centric approach with self-refinement to improve layout faithfulness while preserving aesthetic quality. Specifically, we leverage large language models (LLMs) to synthesi

Published 30 Mar 2026