When Do Diffusion Models learn to Generate Multiple Objects?

📰 ArXiv cs.AI

arXiv:2605.00273v1 Announce Type: cross Abstract: Text-to-image diffusion models achieve impressive visual fidelity, yet they remain unreliable in multi-object generation. Despite extensive empirical evidence of these failures, the underlying causes remain unclear. We begin by asking how much of this limitation arises from the data itself. To disentangle data effects, we consider two regimes across different dataset sizes: (1) concept generalization, where each individual concept is observed dur

Published 5 May 2026

Read full paper → ← Back to Reads