Stanford CS25: Transformers United V6 I The Ultra-Scale Talk: Scaling Training to Thousands of GPUs

Stanford Online · Advanced · Large Language Models
For more information about Stanford's graduate programs, visit: https://online.stanford.edu/graduate-education

April 23, 2026

This seminar covers:
• The practicalities of ultra-scale training
• How 5D parallelism makes it possible to stretch a single run across massive GPU clusters
• How Mixture-of-Experts architectures introduce new scaling dimensions and stability challenges, and the performance tuning and communication patterns that drive throughput

Follow along with the seminar schedule. Visit: https://web.stanford.edu/class/cs25/

Guest Speaker: Nouamane Tazi (Hugging Face)

Instructors:
• Steven Feng, Stanford Computer Science PhD student and NSERC PGS-D scholar
• Karan P. Singh, Electrical Engineering PhD student and NSF Graduate Research Fellow in the Stanford Translational AI Lab
• Michael C. Frank, Benjamin Scott Crocker Professor of Human Biology; Director, Symbolic Systems Program
• Christopher Manning, Thomas M. Siebel Professor in Machine Learning, Professor of Linguistics and of Computer Science, Co-Founder and Senior Fellow of the Stanford Institute for Human-Centered Artificial Intelligence (HAI)
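To give a feel for what "5D parallelism" means in practice, here is a minimal sketch (not the talk's own code) of how five parallelism dimensions might compose. The dimension names and degree values are assumptions for illustration: data (dp), tensor (tp), pipeline (pp), context (cp), and expert (ep) parallelism, whose degrees multiply to the total GPU count, so each GPU owns exactly one coordinate in the 5D grid.

```python
# Sketch: composing five parallelism dimensions into one GPU grid.
# Degree values below are hypothetical, chosen only so the product
# matches a plausible cluster size.
DIMS = {"dp": 64, "tp": 8, "pp": 4, "cp": 2, "ep": 1}

# Every GPU holds one (dp, tp, pp, cp, ep) coordinate, so the
# product of the degrees must equal the number of GPUs in the run.
world_size = 1
for degree in DIMS.values():
    world_size *= degree
print(world_size)  # -> 4096 GPUs for these example degrees


def rank_to_coords(rank: int, dims: dict = DIMS) -> tuple:
    """Map a flat GPU rank to its coordinate in the 5D grid.

    Treats the rank as a mixed-radix number whose digits are the
    parallelism indices, with the last dimension varying fastest.
    """
    coords = []
    for degree in reversed(list(dims.values())):
        coords.append(rank % degree)
        rank //= degree
    return tuple(reversed(coords))


# Rank 0 sits at the origin of the grid; the last rank sits at the
# maximum index along every dimension with degree > 1.
print(rank_to_coords(0))     # (0, 0, 0, 0, 0)
print(rank_to_coords(4095))  # (63, 7, 3, 1, 0)
```

Frameworks expose the same idea as a device mesh: collectives for a given dimension (e.g. gradient all-reduce for dp, activation all-gather for tp) run only among GPUs that share the other four coordinates.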
