Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation
📰 ArXiv cs.AI
arXiv:2604.03592v1 Announce Type: cross

Abstract: Mixture-of-Experts (MoE) models exhibit striking performance disparities across languages, yet the internal mechanisms driving these gaps remain poorly understood. In this work, we conduct a systematic analysis of expert routing patterns in MoE models, revealing a phenomenon we term Language Routing Isolation, in which high- and low-resource languages tend to activate largely disjoint expert sets. Through layer-stratified analysis, we further show ...
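As a rough illustration of how the disjointness behind Language Routing Isolation could be quantified, the sketch below computes a per-layer Jaccard overlap between the top-k experts activated by two languages. This is not the paper's method; the function names, the `(num_layers, num_experts)` routing-count layout, and the toy data are assumptions for illustration only.

```python
import numpy as np

def top_expert_set(routing_counts: np.ndarray, k: int) -> set:
    """Indices of the k most frequently activated experts in one layer."""
    return set(np.argsort(routing_counts)[::-1][:k])

def routing_overlap(counts_lang_a: np.ndarray,
                    counts_lang_b: np.ndarray,
                    k: int = 8) -> list:
    """Per-layer Jaccard overlap between two languages' top-k expert sets.

    counts_lang_*: assumed shape (num_layers, num_experts), holding how often
    the router selected each expert for tokens of that language.
    A value near 0 indicates largely disjoint expert sets (routing isolation).
    """
    overlaps = []
    for layer_a, layer_b in zip(counts_lang_a, counts_lang_b):
        set_a = top_expert_set(layer_a, k)
        set_b = top_expert_set(layer_b, k)
        overlaps.append(len(set_a & set_b) / len(set_a | set_b))
    return overlaps

# Toy usage with random counts: 4 layers, 16 experts per layer.
rng = np.random.default_rng(0)
counts_high_resource = rng.integers(0, 1000, size=(4, 16))
counts_low_resource = rng.integers(0, 1000, size=(4, 16))
print(routing_overlap(counts_high_resource, counts_low_resource, k=4))
```

Reporting the overlap per layer, rather than aggregated over the whole model, mirrors the layer-stratified style of analysis the abstract describes.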