Mixture of Experts (MoE) + Switch Transformers: Build MASSIVE LLMs with CONSTANT Complexity!
In this video, we present a quick tutorial on Switch Transformers, which let you scale a transformer-based model such as a Large Language Model (LLM) to trillions of parameters while keeping the per-token computational cost roughly constant, using a Mixture of Experts (MoE) framework at both training and inference time. The tutorial is also a visual guide to the original Transformer, the self-attention mechanism, multi-head self-attention, and Switch Transformers.
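To make the routing idea concrete, here is a minimal PyTorch sketch (not taken from the video) of a top-1 "Switch"-style MoE feed-forward layer. The class name `SwitchFFN`, its dimensions, and the simple loop over experts are illustrative assumptions; a production implementation would add load-balancing losses and capacity limits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Sketch of a top-1 (Switch) Mixture-of-Experts feed-forward layer.

    Each token is routed to exactly one expert, so per-token compute stays
    roughly constant no matter how many experts (parameters) are added.
    """
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.size(-1))
        probs = F.softmax(self.router(tokens), dim=-1)   # (num_tokens, num_experts)
        gate, expert_idx = probs.max(dim=-1)             # top-1 routing decision
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # scale each expert's output by its gate value (keeps routing differentiable)
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)

# Example: 8 experts, but each token only runs through one of them.
layer = SwitchFFN(d_model=512, d_ff=2048, num_experts=8)
y = layer(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```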
The original paper for Switch Transformers:
W. Fedus et al., "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity," JMLR, 2022.