VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

📰 ArXiv cs.AI

arXiv:2604.19728v1 Announce Type: cross Abstract: We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize in the action-training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, from language pretraining to action-expert fine-tuning. VLA Foundry supports both from-scratch training and pretrained backbones f…
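
To make the "shared training stack" idea concrete, here is a minimal sketch of what a staged pipeline (language pretraining → vision-language alignment → action-expert fine-tuning) driven by one training loop might look like. All class, field, and checkpoint names below are assumptions for illustration, not VLA Foundry's actual API.

```python
# Hypothetical sketch only: names and fields are assumptions, not VLA Foundry's API.
from dataclasses import dataclass, field


@dataclass
class StageConfig:
    name: str                       # e.g. "llm_pretrain", "vlm_align", "vla_finetune"
    backbone: str                   # "from_scratch" or a pretrained checkpoint path
    modalities: list = field(default_factory=lambda: ["text"])
    freeze_backbone: bool = False   # e.g. freeze the VLM while training the action expert


def run_pipeline(stages):
    """Run every stage through the same (stubbed) training loop."""
    for stage in stages:
        print(f"[{stage.name}] backbone={stage.backbone} "
              f"modalities={stage.modalities} freeze={stage.freeze_backbone}")
        # ...a single shared optimizer / data / logging stack would live here...


if __name__ == "__main__":
    run_pipeline([
        StageConfig("llm_pretrain", backbone="from_scratch"),
        StageConfig("vlm_align", backbone="ckpts/llm",
                    modalities=["text", "image"]),
        StageConfig("vla_finetune", backbone="ckpts/vlm",
                    modalities=["text", "image", "action"], freeze_backbone=True),
    ])
```

The point of the sketch is the design choice the abstract describes: every stage, whether started from scratch or from a pretrained backbone, flows through one training stack rather than a separate pipeline per stage.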

Published 22 Apr 2026