Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling
📰 ArXiv cs.AI
arXiv:2604.03647v1 Announce Type: cross Abstract: In the unsupervised self-evolution of Multimodal Large Language Models, the quality of feedback signals during post-training is pivotal for stable and effective learning. However, existing self-evolution methods predominantly rely on majority voting to select the most frequent output as the pseudo-golden answer; this output may reflect the model's intrinsic biases rather than the objective correctness of its reasoning paths. To counterac