Intro to Mixture of Experts | Aritra Roy Gosthipaty | HF Podcast #2
In this episode, Alejandro sits down with Aritra Roy Gosthipaty from the Hugging Face Transformers team to talk about mixture-of-experts models, why dense models still matter, how synthetic data changes training, and how coding agents are reshaping everyday engineering work.
They discuss Mixtral, DeepSeek-V2, Switch Transformers, vLLM, Inference Providers, TinyAya, data curation, the limits of local inference, and how practitioners can lean on coding agents without losing core engineering skills.
More from Aritra:
- Hugging Face profile — https://huggingface.co/ariG23498
Chapters
00:00 Meet Aritra Roy Gosthipaty
00:45 How Aritra joined Hugging Face
03:05 What a Developer Advocate on Transformers works on
04:00 What mixture-of-experts models are
08:26 Why MoEs matter now
11:36 Where dense models still win
15:00 Where to start learning and training MoEs
18:44 Synthetic data, data engines, and data quality
22:18 Why MoEs are still hard to run locally
23:11 How coding tools changed engineering work
25:49 Do agents weaken creativity and skill?
28:29 Should beginners rely on coding agents?
33:21 What coding will look like in a year
35:25 The biggest recent AI wow moments
If you enjoyed the episode, subscribe for more conversations about open models, infrastructure, and the future of AI.
---
Sources / References
• Aritra Roy Gosthipaty — https://huggingface.co/ariG23498
• Mixture of Experts Explained — https://huggingface.co/blog/moe
• Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer — https://arxiv.org/abs/1701.06538
• vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — https://arxiv.org/abs/2309.06180
• DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — https://arxiv.org/abs/2405.04434
• Mixtral of Experts — https://arxiv.org/abs/2401.04088
• Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — https://arxiv.org/abs/2101.03961
• Hugging Face Inference Providers