Piper: A Programmable Distributed Training System

📰 ArXiv cs.AI

arXiv:2606.11169v1 Announce Type: cross Abstract: Large-scale model training increasingly relies on composing multiple parallelism strategies, such as data, pipeline, and expert parallelism, together with memory-saving optimizations like ZeRO. Deployed systems for foundation model pretraining often rely on human experts to manually design a high-level parallelism strategy and then implement the corresponding low-level execution strategy, making it difficult to adapt the system to new strategies. Mea

Published 10 Jun 2026
