On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

📰 arXiv cs.AI

Researchers propose a dynamic Mixture of Experts (MoE) approach with drift-aware token assignment to mitigate forgetting in continual learning of large vision language models.

Published 31 Mar 2026
Action Steps
  1. Identify the token's dilemma in MoE architectures: isolating experts alone is insufficient to prevent forgetting
  2. Develop a drift-aware token assignment strategy that dynamically routes tokens to experts
  3. Implement a dynamic MoE that incrementally adds new experts and expands the router while keeping existing ones frozen
  4. Evaluate the approach on multimodal continual instruction tuning benchmarks
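The steps above can be sketched in miniature. The class below is a hypothetical illustration, not the paper's implementation: experts are plain weight matrices, each expert gets a routing prototype (the mean of its task's tokens), old experts are frozen when a new one is added, and "drift" is approximated as a token's distance from all existing prototypes, with drifted tokens sent to the current trainable expert. The names (`DynamicMoE`, `drift_threshold`) and the prototype-distance heuristic are assumptions for the sketch.

```python
import numpy as np

class DynamicMoE:
    """Minimal sketch of dynamic MoE with drift-aware token assignment.
    All names and the distance-based drift heuristic are hypothetical."""

    def __init__(self, dim, drift_threshold=1.0):
        self.dim = dim
        self.drift_threshold = drift_threshold
        self.experts = []      # list of (weight_matrix, frozen_flag)
        self.prototypes = []   # one routing prototype per expert

    def add_expert(self, token_batch):
        # Freeze every existing expert before adding a new trainable one.
        self.experts = [(w, True) for w, _ in self.experts]
        rng = np.random.default_rng(len(self.experts))
        self.experts.append((rng.standard_normal((self.dim, self.dim)) * 0.01, False))
        # Router expansion: the new prototype is the mean of the new task's tokens.
        self.prototypes.append(token_batch.mean(axis=0))

    def route(self, token):
        # Drift-aware assignment: distance of the token to each expert's prototype.
        dists = [np.linalg.norm(token - p) for p in self.prototypes]
        k = int(np.argmin(dists))
        # A token far from all stored prototypes has "drifted": route it to
        # the newest expert, which is the only one still trainable.
        if dists[k] > self.drift_threshold and not self.experts[-1][1]:
            k = len(self.experts) - 1
        return k

    def forward(self, token):
        k = self.route(token)
        w, _ = self.experts[k]
        return token @ w, k
```

A typical use: call `add_expert` once per incoming task, then `forward` routes each token either to the frozen expert whose prototype it is closest to (preserving old behavior) or to the new expert (absorbing drifted tokens without overwriting old parameters).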
Who Needs to Know This

AI engineers and researchers working on large vision language models can use this approach to mitigate forgetting during continual learning; software engineers building continual learning systems can apply the same techniques to make them more efficient.

Key Insight

💡 Dynamic MoE with drift-aware token assignment can effectively mitigate forgetting in continual learning of large vision language models
