What are Mixture-of-Experts Models | ft. Aritra
In this clip, Aritra Roy Gosthipaty from the Hugging Face Transformers team breaks down one of the most important (and often misunderstood) architectures in modern AI: Mixture-of-Experts models.
The main MoE explainer: what these models are, why they became mainstream, and why the ecosystem shifted around them.
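At the core of the discussion is the sparsely-gated MoE layer: a router scores each token against a set of expert feed-forward networks, and only the top-k experts run for that token. Here is a minimal sketch of that idea, assuming a plain NumPy setup; all sizes, weights, and names are illustrative, not taken from the episode or any specific model.

```python
import numpy as np

# Sketch of a sparsely-gated Mixture-of-Experts layer with top-k routing.
# Sizes and weights below are illustrative placeholders.
rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1
           for _ in range(n_experts)]
# The router scores each token against every expert.
router_w = rng.standard_normal((d_model, n_experts)) * 0.1


def moe_layer(x):
    """x: (tokens, d_model) -> (tokens, d_model).

    Routes each token to its top_k experts and mixes their outputs
    with softmax weights over the selected experts only.
    """
    logits = x @ router_w                      # (tokens, n_experts)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-top_k:]   # indices of the k best experts
        w = np.exp(logits[t][top])
        w /= w.sum()                           # softmax over the selected k
        for weight, e in zip(w, top):
            out[t] += weight * (x[t] @ experts[e])
    return out


tokens = rng.standard_normal((5, d_model))
y = moe_layer(tokens)
print(y.shape)  # (5, 8)
```

The key property this illustrates is why MoE models are cheap to run relative to their parameter count: every token touches only `top_k` of the `n_experts` weight matrices, so compute scales with k while capacity scales with the total number of experts.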
Chapters:
- 00:00 Why Mixture-of-Experts Models Matter
- 00:14 Mixture-of-Experts Layers
- 01:07 vLLM and Serving Stacks
- 01:51 DeepSeek-V2
- 02:55 Mixtral 8x7B
- 03:20 Switch Transformers
- 04:25 Inference Providers
- 05:12 Unsloth Kernels
Sources mentioned:
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer — https://arxiv.org/abs/1701.06538
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — https://arxiv.org/abs/2309.06180
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — https://arxiv.org/abs/2405.04434
- Mixtral of Experts — https://arxiv.org/abs/2401.04088
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — https://arxiv.org/abs/2101.03961
- Inference Providers — https://huggingface.co/docs/inference-providers/index
- Unsloth Docs — https://unsloth.ai/docs
Listen to the full podcast on Spotify: https://open.spotify.com/show/2BWAr3zLa2xhUqoHlg8DAD?si=-nXiwfyyQfaowCqb58Ig-w
Watch the full conversation on YouTube: https://youtu.be/O3Ul6H20pLI