Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

Umar Jamil · Beginner · 📄 Research Papers Explained · 2y ago
Explanation of the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces". In this video I will be explaining Mamba, a new sequence modeling architecture that can compete with the Transformer. I will start by introducing the main sequence modeling architectures (RNN, CNN and Transformer) and then take a deep dive into State Space Models. Fully understanding State Space Models requires some background in differential equations, so I will provide a brief introduction to differential equations (in 5 minutes!) and then proceed to derive the recurrent formula and …

Chapters (15)

Introduction
1:46 Sequence modeling
7:12 Differential equations (basics)
11:38 State Space Models
13:53 Discretization
23:08 Recurrent computation
26:32 Convolutional computation
34:18 Skip connection term
35:21 Multidimensional SSM
37:44 The HiPPO theory
43:30 The motivation behind Mamba
46:56 Selective Scan algorithm
51:34 The Scan operation
54:24 Parallel Scan
57:20 Innovations in Selec