TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training

📰 ArXiv cs.AI

arXiv:2604.10784v1 — Abstract: Recent advances in unified multimodal models (UMMs) have led to a proliferation of architectures capable of understanding, generating, and editing across visual and textual modalities. However, developing a unified framework for UMMs remains challenging due to the diversity of model architectures and the heterogeneity of training paradigms and implementation details. In this paper, we present TorchUMM, the first unified codebase for comprehensive evaluation […]

Published 14 Apr 2026