What are Mixture-of-Experts Models | ft. Aritra

Hugging Face · Beginner · 📄 Research Papers Explained · 2w ago
In this clip, Aritra Roy Gosthipaty from the Hugging Face Transformers team breaks down one of the most important (and often misunderstood) architectures in modern AI: Mixture-of-Experts (MoE) models. This is the main MoE explainer: what they are, why they became mainstream, and why the ecosystem shifted around them.

Chapters:
- 00:00 Why Mixture-of-Experts Models Matter
- 00:14 Mixture-of-Experts Layers
- 01:07 vLLM and Serving Stacks
- 01:51 DeepSeek-V2
- 02:55 Mixtral 8x7B
- 03:20 Switch Transformers
- 04:25 Inference Providers
- 05:12 Unsloth Kernels

Topics covered:
- Mixture-of-Experts Layers
- vLLM and Serving Stacks
- DeepSeek-V2
- Mixtral 8x7B
- Switch Transformers
- Inference Providers

Sources mentioned:
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer — https://arxiv.org/abs/1701.06538
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — https://arxiv.org/abs/2309.06180
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — https://arxiv.org/abs/2405.04434
- Mixtral of Experts — https://arxiv.org/abs/2401.04088
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — https://arxiv.org/abs/2101.03961
- Inference Providers — https://huggingface.co/docs/inference-providers/index
- Unsloth Docs — https://unsloth.ai/docs

Listen to the full podcast on Spotify: https://open.spotify.com/show/2BWAr3zLa2xhUqoHlg8DAD?si=-nXiwfyyQfaowCqb58Ig-w
Watch the full conversation on YouTube: https://youtu.be/O3Ul6H20pLI
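As a concrete companion to the "Mixture-of-Experts Layers" chapter, here is a minimal sketch of a sparsely-gated MoE layer with top-k routing, in the spirit of the Shazeer et al. and Mixtral papers listed above. All names and sizes (MoELayer, d_model, num_experts, top_k) are illustrative assumptions, not code from any of the referenced models.

```python
# Minimal sparsely-gated MoE layer sketch (top-k routing). Illustrative only:
# hyperparameters and class names are assumptions, not from a specific model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                         # x: (batch, seq, d_model)
        scores = self.router(x)                   # (batch, seq, num_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)          # normalize weights of the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so per-token compute stays
        # roughly constant even as total parameters grow with num_experts.
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e)                 # (batch, seq, top_k) bool
            if mask.any():
                token_mask = mask.any(dim=-1)     # tokens routed to expert e
                weight = (top_w * mask).sum(dim=-1)
                out[token_mask] += weight[token_mask].unsqueeze(-1) * expert(x[token_mask])
        return out

# Quick shape check
layer = MoELayer()
print(layer(torch.randn(2, 16, 512)).shape)       # torch.Size([2, 16, 512])
```

This dense-dispatch loop is the simplest way to show the idea; production stacks such as vLLM or the Switch Transformer implementation use fused or capacity-limited dispatch kernels instead.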

Related AI Lessons

#1 DevLog Meta-research: I Got Tired of Tab Chaos While Reading Research Papers.
Learn to manage research paper tabs efficiently and apply meta-research techniques to improve productivity
Dev.to AI
How to Set Up a Karpathy-Style Wiki for Your Research Field
Learn to set up a Karpathy-style wiki for your research field to organize and share knowledge effectively
Medium · AI
The Non-Optimality of Scientific Knowledge: Path Dependence, Lock-In, and The Local Minimum Trap
Scientific knowledge may be stuck in a local minimum, hindering optimal progress, and understanding this concept is crucial for advancing research
ArXiv cs.AI
How Archimedes Started: A Research Tool I Built for Myself
Learn how Archimedes started as a personal research tool to streamline the research process and reduce inefficiencies
Dev.to AI