Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

📰 ArXiv cs.AI

arXiv:2605.00814v1 (cross-listed)

Abstract: While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to ensure sustained, on-
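
The dilution effect described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's analysis: it assumes a single attention query whose logits are identical for every visual token and identical for every text token, so the softmax denominator (the partition function) simply grows with the number of text tokens. The function name `visual_attention_mass` and the count of 576 visual tokens are hypothetical, chosen only for illustration.

```python
import math


def visual_attention_mass(num_visual, num_text, visual_logit=0.0, text_logit=0.0):
    """Share of softmax attention that lands on visual tokens for one query.

    Toy assumption: every visual token has the same logit and every text
    token has the same logit, so each group contributes count * exp(logit)
    to the partition function.
    """
    visual = num_visual * math.exp(visual_logit)
    text = num_text * math.exp(text_logit)
    return visual / (visual + text)


# Hypothetical setup: 576 visual tokens, growing generated text history.
for generated in (0, 576, 5184):
    share = visual_attention_mass(576, generated)
    print(generated, round(share, 3))
# prints:
# 0 1.0
# 576 0.5
# 5184 0.1
```

Under these assumptions the visual share is V / (V + T), which falls off roughly as 1/T once the text history T is much longer than the visual token count V. This matches the inverse decay with sequence length that the abstract describes, though real attention logits are of course not uniform.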

Published 5 May 2026