Intro to Mixture of Experts | Aritra Roy Gosthipaty | HF Podcast #2

Hugging Face · Beginner · 📄 Research Papers Explained · 2w ago
In this episode, Alejandro sits down with Aritra Roy Gosthipaty from the Hugging Face Transformers team to talk about mixture-of-experts (MoE) models, why dense models still matter, how synthetic data changes training, and what coding agents are changing for working engineers. They discuss Mixtral, DeepSeek-V2, Switch Transformers, vLLM, Inference Providers, TinyAya, data curation, local inference limits, and how practitioners should think about coding with agents without losing core engineering skill.

More from Aritra:
- Hugging Face profile — https://huggingface.co/ariG23498

Chapters
00:00 Meet Aritra Roy Gosthipaty
00:45 How Aritra joined Hugging Face
03:05 What a Developer Advocate on Transformers works on
04:00 What mixture-of-experts models are
08:26 Why MoEs matter now
11:36 Where dense models still win
15:00 Where to start learning and training MoEs
18:44 Synthetic data, data engines, and data quality
22:18 Why MoEs are still hard to run locally
23:11 How coding tools changed engineering work
25:49 Do agents weaken creativity and skill?
28:29 Should beginners rely on coding agents?
33:21 What coding will look like in a year
35:25 The biggest recent AI wow moments

If you enjoyed the episode, subscribe for more conversations about open models, infrastructure, and the future of AI.

---

Sources / References
• Aritra Roy Gosthipaty — https://huggingface.co/ariG23498
• Mixture of Experts Explained — https://huggingface.co/blog/moe
• Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer — https://arxiv.org/abs/1701.06538
• vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — https://arxiv.org/abs/2309.06180
• DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — https://arxiv.org/abs/2405.04434
• Mixtral of Experts — https://arxiv.org/abs/2401.04088
• Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — https://arxiv.org/abs/2101.03961
• Hugging Face Inference Providers

