MAny: Merge Anything for Multimodal Continual Instruction Tuning

📰 ArXiv cs.AI

arXiv:2604.14016v1 Announce Type: cross Abstract: Multimodal Continual Instruction Tuning (MCIT) is essential for adapting Multimodal Large Language Models (MLLMs) to sequential tasks, but it is severely limited by catastrophic forgetting. While the existing literature focuses on the reasoning language backbone, in this work we expose a critical yet neglected dual-forgetting phenomenon: perception drift in the Cross-modal Projection Space and reasoning collapse in the Low-rank Parameter Space.
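The truncated abstract only names the idea of merging in the Low-rank Parameter Space; the MAny algorithm itself is not described here. As a generic illustration of that family of techniques (not the paper's method), a weighted merge of low-rank adapter deltas learned on sequential tasks might look like the following sketch, where `merge_lora_deltas` and all shapes are hypothetical:

```python
import numpy as np

def merge_lora_deltas(deltas, weights):
    """Weighted average of low-rank adapter updates (B @ A) from
    sequential tasks -- a generic merging baseline, not MAny itself.

    deltas  : list of (B, A) factor pairs, B is (d, r), A is (r, d)
    weights : per-task mixing coefficients (normalized internally)
    """
    assert len(deltas) == len(weights)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the coefficients sum to 1
    # Materialize each task's full-rank update and average them.
    return sum(wi * (B @ A) for wi, (B, A) in zip(w, deltas))

# Hypothetical example: two tasks, rank-4 adapters on a 16x16 layer.
rng = np.random.default_rng(0)
deltas = [(rng.normal(size=(16, 4)), rng.normal(size=(4, 16)))
          for _ in range(2)]
merged = merge_lora_deltas(deltas, [0.5, 0.5])

base = rng.normal(size=(16, 16))     # frozen backbone weight
adapted = base + merged              # apply the merged update
```

Averaging full-rank updates rather than the `B`/`A` factors directly avoids the factor-alignment problem (the same delta admits many factorizations), which is one reason merging is usually done in parameter space.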

Published 16 Apr 2026