Intro to Mixture of Experts | Aritra Roy Gosthipaty | HF Podcast #2

Hugging Face · Beginner · 📄 Research Papers Explained · 2w ago
In this episode, Alejandro sits down with Aritra Roy Gosthipaty from the Hugging Face Transformers team to talk about mixture-of-experts (MoE) models, why dense models still matter, how synthetic data changes training, and what coding agents are changing for working engineers. They discuss Mixtral, DeepSeek-V2, Switch Transformers, vLLM, Inference Providers, TinyAya, data curation, local inference limits, and how practitioners should think about coding with agents without losing core engineering skill.

More from Aritra:
- Hugging Face profile — https://huggingface.co/ariG23498

Chapters
00:00 Meet Aritra Roy Gosthipaty
00:45 How Aritra joined Hugging Face
03:05 What a Developer Advocate on Transformers works on
04:00 What mixture-of-experts models are
08:26 Why MoEs matter now
11:36 Where dense models still win
15:00 Where to start learning and training MoEs
18:44 Synthetic data, data engines, and data quality
22:18 Why MoEs are still hard to run locally
23:11 How coding tools changed engineering work
25:49 Do agents weaken creativity and skill?
28:29 Should beginners rely on coding agents?
33:21 What coding will look like in a year
35:25 The biggest recent AI wow moments

If you enjoyed the episode, subscribe for more conversations about open models, infrastructure, and the future of AI.

---

Sources / References
• Aritra Roy Gosthipaty — https://huggingface.co/ariG23498
• Mixture of Experts Explained — https://huggingface.co/blog/moe
• Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer — https://arxiv.org/abs/1701.06538
• vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — https://arxiv.org/abs/2309.06180
• DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — https://arxiv.org/abs/2405.04434
• Mixtral of Experts — https://arxiv.org/abs/2401.04088
• Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — https://arxiv.org/abs/2101.03961
• Hugging Face Inference Providers

