Mixture of Experts (MoE) + Switch Transformers: Build MASSIVE LLMs with CONSTANT Complexity!
In this video, we present a quick tutorial on Switch Transformers, which let you scale a transformer-based model such as a Large Language Model (LLM) to trillions of parameters while keeping the per-token computational cost roughly constant, using a Mixture of Experts (MoE) framework at both training and inference time. The tutorial is also a visual guide to the original Transformer, the self-attention mechanism, multi-head self-attention, and Switch Transformers.
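To make the routing idea concrete, here is a minimal PyTorch sketch (not taken from the video) of a top-1 "Switch"-style MoE feed-forward layer. The class name `SwitchFFN`, its dimensions, and the simple loop over experts are illustrative assumptions; a production implementation would add load-balancing losses and capacity limits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Sketch of a top-1 (Switch) Mixture-of-Experts feed-forward layer.

    Each token is routed to exactly one expert, so per-token compute stays
    roughly constant no matter how many experts (parameters) are added.
    """
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.size(-1))
        probs = F.softmax(self.router(tokens), dim=-1)   # (num_tokens, num_experts)
        gate, expert_idx = probs.max(dim=-1)             # top-1 routing decision
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # scale each expert's output by its gate value (keeps routing differentiable)
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)

# Example: 8 experts, but each token only runs through one of them.
layer = SwitchFFN(d_model=512, d_ff=2048, num_experts=8)
y = layer(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```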
The original paper for Switch Transformers:
W. Fedus et al., "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity," JMLR, 2022.