Tiled Prompts: Overcoming Prompt Misguidance in Image and Video Super-Resolution

📰 arXiv cs.AI

arXiv:2602.03342v2 Announce Type: replace-cross

Abstract: Text-conditioned diffusion models have advanced image and video super-resolution by using prompts as semantic priors, and modern super-resolution pipelines typically rely on latent tiling to scale to high resolutions. In practice, a single global caption is used with the latent tiling, often causing prompt misguidance. Specifically, a coarse global prompt often misses localized details (errors of omission) and provides locally irrelevant …
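The setup the abstract describes can be illustrated with a small sketch: a latent is split into overlapping tiles, each tile is processed with a prompt, and the overlapping outputs are blended back together. This is a minimal sketch under stated assumptions, not the authors' method. The helper names (`tile_grid`, `prompts_for_tiles`, `run_tiled`), the single-channel list-of-lists latent, the uniform averaging in overlaps, and the `denoise_tile` callback (a stand-in for one prompt-conditioned diffusion call) are all hypothetical. The optional `local_captions` argument only marks where a per-tile prompt would plug in.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical names throughout; an illustration of latent tiling, not the paper's code.
Tile = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive
Grid = List[List[float]]          # single-channel latent, for simplicity


def tile_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis; the last tile is snapped to the far edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # make sure the border is covered
    return starts


def tile_grid(height: int, width: int, tile: int, overlap: int) -> List[Tile]:
    """Overlapping tiles that together cover a height x width latent."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile")
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in tile_starts(height, tile, stride)
        for x in tile_starts(width, tile, stride)
    ]


def prompts_for_tiles(
    tiles: List[Tile], global_caption: str, local_captions: Optional[Sequence[str]] = None
) -> List[str]:
    """Baseline: every tile reuses the single global caption.

    Passing `local_captions` (one string per tile) marks where a per-tile prompt
    would plug in; how such captions are produced is outside this sketch.
    """
    if local_captions is None:
        return [global_caption] * len(tiles)
    if len(local_captions) != len(tiles):
        raise ValueError("need exactly one caption per tile")
    return list(local_captions)


def run_tiled(
    latent: Grid,
    tile: int,
    overlap: int,
    denoise_tile: Callable[[Grid, str], Grid],
    global_caption: str,
    local_captions: Optional[Sequence[str]] = None,
) -> Grid:
    """Process each tile with its prompt, then average the overlapping outputs."""
    height, width = len(latent), len(latent[0])
    tiles = tile_grid(height, width, tile, overlap)
    prompts = prompts_for_tiles(tiles, global_caption, local_captions)
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for (top, left, bottom, right), prompt in zip(tiles, prompts):
        crop = [row[left:right] for row in latent[top:bottom]]
        out = denoise_tile(crop, prompt)  # stand-in for one prompt-conditioned model call
        for i, row in enumerate(out):
            for j, value in enumerate(row):
                acc[top + i][left + j] += value
                cnt[top + i][left + j] += 1
    # Uniform averaging in overlaps; some pipelines use Gaussian or linear blend weights instead.
    return [[a / c for a, c in zip(ra, rc)] for ra, rc in zip(acc, cnt)]
```

With `local_captions=None`, every tile is conditioned on the same caption, which is the configuration the abstract identifies as prone to prompt misguidance: a tile covering, say, only empty sky is still conditioned on a caption written for the whole scene.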

Published 13 Apr 2026