Lightning Talk: Graph Based Pipeline Parallelism - Sanket Purandare, Meta & Simon Fan, Meta PyTorch

PyTorch · Intermediate ·📄 Research Papers Explained ·3w ago
Lightning Talk: Graph Based Pipeline Parallelism - Sanket Purandare, Meta & Simon Fan, Meta PyTorch Pipeline parallelism is vital for large models, but advanced schedules for SOTA LLMs are difficult to express in current PyTorch. MoE communication dominates the critical path, making latency hiding essential. Leading systems use fw-bw overlapping; fw-fw and bw-bw overlapping further boost throughput. Schedules like ZeroBubbleV and DualPipeV rely on dI-dW backward splitting for fine-grained overlap. However, eager-mode implementations require a patchwork of fragile integrations (multi-threading, custom autograd functions, activation checkpointing, etc.) that rely on implicit behavior and hand-written logic with poor torch.compile compatibility and upstream composability. We present Graph-Based PP: stages are compiled to reusable FX graphs executed via an explicit schedule language. Users write standard PyTorch code while specifying schedules at varying granularity; all manipulations run as graph passes, abstracting complexity away from user code and into the compiler/runtime, allowing for greater composability. We have integrated Graph-PP into TorchTitan and AutoParallel on real MoE workloads, targeting upstream inclusion in torch.distributed.
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Related AI Lessons

The ABCs of reading medical research and review papers these days
Learn to critically evaluate medical research papers by accepting nothing at face value, believing no one blindly, and checking everything
Medium · LLM
#1 DevLog Meta-research: I Got Tired of Tab Chaos While Reading Research Papers.
Learn to manage research paper tabs efficiently and apply meta-research techniques to improve productivity
Dev.to AI
How to Set Up a Karpathy-Style Wiki for Your Research Field
Learn to set up a Karpathy-style wiki for your research field to organize and share knowledge effectively
Medium · AI
The Non-Optimality of Scientific Knowledge: Path Dependence, Lock-In, and The Local Minimum Trap
Scientific knowledge may be stuck in a local minimum, hindering optimal progress, and understanding this concept is crucial for advancing research
ArXiv cs.AI
Up next
Microsoft Research Forum | Season 2, Episode 4
Microsoft Research
Watch →