What are Mixture-of-Experts Models | ft. Aritra

Hugging Face · Beginner · 📄 Research Papers Explained · 2w ago
In this clip, Aritra Roy Gosthipaty from the Hugging Face Transformers team breaks down one of the most important (and often misunderstood) architectures in modern AI: Mixture-of-Experts (MoE) models. This is the main MoE explainer: what they are, why they became mainstream, and why the ecosystem shifted around them.

Chapters:
- 00:00 Why Mixture-of-Experts Models Matter
- 00:14 Mixture-of-Experts Layers
- 01:07 vLLM and Serving Stacks
- 01:51 DeepSeek-V2
- 02:55 Mixtral 8x7B
- 03:20 Switch Transformers
- 04:25 Inference Providers
- 05:12 Unsloth Kernels

Topics covered:
- Mixture-of-Experts Layers
- vLLM and Serving Stacks
- DeepSeek-V2
- Mixtral 8x7B
- Switch Transformers
- Inference Providers

Sources mentioned:
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer — https://arxiv.org/abs/1701.06538
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — https://arxiv.org/abs/2309.06180
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — https://arxiv.org/abs/2405.04434
- Mixtral of Experts — https://arxiv.org/abs/2401.04088
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — https://arxiv.org/abs/2101.03961
- Inference Providers — https://huggingface.co/docs/inference-providers/index
- Unsloth Docs — https://unsloth.ai/docs

Listen to the full podcast on Spotify: https://open.spotify.com/show/2BWAr3zLa2xhUqoHlg8DAD?si=-nXiwfyyQfaowCqb58Ig-w
Watch the full conversation on YouTube: https://youtu.be/O3Ul6H20pLI
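As a concrete companion to the "Mixture-of-Experts Layers" chapter, here is a minimal sketch of a sparsely-gated MoE layer with top-k routing, in the spirit of the Shazeer et al. and Mixtral papers listed above. All names and sizes (MoELayer, d_model, num_experts, top_k) are illustrative assumptions, not code from any of the referenced models.

```python
# Minimal sparsely-gated MoE layer sketch (top-k routing). Illustrative only:
# hyperparameters and class names are assumptions, not from a specific model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                         # x: (batch, seq, d_model)
        scores = self.router(x)                   # (batch, seq, num_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)          # normalize weights of the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so per-token compute stays
        # roughly constant even as total parameters grow with num_experts.
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e)                 # (batch, seq, top_k) bool
            if mask.any():
                token_mask = mask.any(dim=-1)     # tokens routed to expert e
                weight = (top_w * mask).sum(dim=-1)
                out[token_mask] += weight[token_mask].unsqueeze(-1) * expert(x[token_mask])
        return out

# Quick shape check
layer = MoELayer()
print(layer(torch.randn(2, 16, 512)).shape)       # torch.Size([2, 16, 512])
```

This dense-dispatch loop is the simplest way to show the idea; production stacks such as vLLM or the Switch Transformer implementation use fused or capacity-limited dispatch kernels instead.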

Related AI Lessons

#1 DevLog Meta-research: I Got Tired of Tab Chaos While Reading Research Papers.
Learn to manage research paper tabs efficiently and apply meta-research techniques to improve productivity
Dev.to AI
How to Set Up a Karpathy-Style Wiki for Your Research Field
Learn to set up a Karpathy-style wiki for your research field to organize and share knowledge effectively
Medium · AI
The Non-Optimality of Scientific Knowledge: Path Dependence, Lock-In, and The Local Minimum Trap
Scientific knowledge may be stuck in a local minimum, hindering optimal progress, and understanding this concept is crucial for advancing research
ArXiv cs.AI
How Archimedes Started: A Research Tool I Built for Myself
Learn how Archimedes started as a personal research tool to streamline the research process and reduce inefficiencies
Dev.to AI