Mixture of Experts (MoE) + Switch Transformers: Build MASSIVE LLMs with CONSTANT Complexity!

🚀 In this video, we present a quick tutorial on Switch Transformers, which let you scale a transformer-based deep learning model such as a Large Language Model (LLM) to a trillion parameters within a Mixture of Experts (MoE) framework while keeping the computational cost per token roughly constant at both training and inference time. The tutorial also serves as a visual guide to the original Transformer, the self-attention mechanism, multi-head self-attention, and Switch Transformers. 🚀 The original Switch Transformers paper is: W. Fedus et al., "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity," JMLR, 2022.
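To make the "more parameters, same compute per token" idea concrete, here is a minimal sketch of a Switch-style top-1 MoE feed-forward layer, assuming PyTorch. The names (SwitchFFN, n_experts, d_ff) are illustrative and not taken from the video or the paper; this is a simplified example, without the load-balancing loss or capacity limits used in the real Switch Transformer.

```python
# Minimal sketch of a Switch-style (top-1 routed) MoE feed-forward layer.
# Assumes PyTorch; class and argument names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Replaces one dense FFN with n_experts FFNs; each token is routed to exactly one."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # routing logits per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens so each one is routed independently
        tokens = x.reshape(-1, x.size(-1))
        probs = F.softmax(self.router(tokens), dim=-1)   # (n_tokens, n_experts)
        gate, expert_idx = probs.max(dim=-1)             # top-1 routing decision
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                # Scale each expert's output by its router probability (gate value)
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)

# Usage: parameter count grows with n_experts, but every token still
# passes through only one expert FFN, so per-token compute stays flat.
layer = SwitchFFN(d_model=64, d_ff=256, n_experts=8)
y = layer(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

The design choice to keep here is top-1 routing: unlike earlier MoE layers that mix the top-k experts, the Switch layer sends each token to a single expert, which is what keeps the per-token FLOPs essentially constant as the expert count (and total parameter count) grows.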