Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

Umar Jamil · Beginner · Research Papers Explained · 2y ago
Explanation of the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces". In this video I will be explaining Mamba, a new sequence modeling architecture that can compete with the Transformer.

I will start by introducing the various sequence modeling architectures (RNN, CNN and Transformer) and then deep dive into State Space Models. To fully understand State Space Models, we need some background in differential equations. That's why I will provide a brief introduction to differential equations (in 5 minutes!) and then derive the recurrent formula and the convolutional formula from first principles. I will also prove mathematically (with the help of visual diagrams) why State Space Models can be run as a convolution. I will explain what the HiPPO matrix is and how it can help the model "memorize" the input history in a finite state.

In the second part of the video, I will explore Mamba and in particular the Selective Scan algorithm. I will first explain what the scan operation is and how it can be parallelized, and then show how the authors further improved the algorithm with kernel fusion and activation recomputation. I will also provide a brief lesson on the memory hierarchy in the GPU and why some operations may be IO-bound.

In the last part of the video, we will explore the architecture of Mamba and some performance results comparing it with the Transformer.

Slides PDF and Parallel Scan (Excel file): https://github.com/hkproj/mamba-notes
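The description mentions deriving the recurrent and convolutional formulas and showing why an SSM can be run as a convolution. The snippet below is a minimal sketch of that equivalence, not code from the video. It assumes a diagonal state matrix A (as in S4D-style and Mamba-style SSMs), a single input channel, and zero-order-hold discretization with step size Δ, so that A_bar = exp(Δ·A) and B_bar = (Δ·A)^(-1) (exp(Δ·A) - I) Δ·B. It also omits the skip term D·x_t. The function names are hypothetical.

```python
import math


def discretize_zoh(A, B, delta):
    """Zero-order-hold discretization of h'(t) = A h(t) + B x(t).

    Assumption: A is diagonal, given as a list of its N nonzero diagonal
    entries, so the matrix exponential reduces to an elementwise exp.
    """
    A_bar = [math.exp(delta * a) for a in A]
    # B_bar = (delta*A)^-1 (exp(delta*A) - I) delta*B, elementwise for diagonal A
    B_bar = [(ab - 1.0) / a * b for ab, a, b in zip(A_bar, A, B)]
    return A_bar, B_bar


def ssm_recurrent(A_bar, B_bar, C, x):
    """Recurrent mode: h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t, h_{-1} = 0."""
    h = [0.0] * len(A_bar)
    ys = []
    for x_t in x:
        h = [a * h_i + b * x_t for a, h_i, b in zip(A_bar, h, B_bar)]
        ys.append(sum(c * h_i for c, h_i in zip(C, h)))
    return ys


def ssm_kernel(A_bar, B_bar, C, length):
    """Convolution kernel K_k = C A_bar^k B_bar for k = 0, ..., length - 1."""
    return [sum(c * a**k * b for c, a, b in zip(C, A_bar, B_bar)) for k in range(length)]


def ssm_convolutional(A_bar, B_bar, C, x):
    """Convolutional mode: y_t = sum_{k=0}^{t} K_k x_{t-k} (causal convolution)."""
    K = ssm_kernel(A_bar, B_bar, C, len(x))
    return [sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]


if __name__ == "__main__":
    A = [-0.5, -1.0, -2.0]  # illustrative stable (negative) diagonal entries
    B = [1.0, 1.0, 1.0]
    C = [0.3, -0.2, 0.5]
    A_bar, B_bar = discretize_zoh(A, B, delta=0.1)
    x = [1.0, 0.0, 2.0, -1.0, 0.5]
    y_rec = ssm_recurrent(A_bar, B_bar, C, x)
    y_conv = ssm_convolutional(A_bar, B_bar, C, x)
    # The two modes agree up to floating-point round-off.
    print(max(abs(r - c) for r, c in zip(y_rec, y_conv)))
```

The recurrent form is cheap per step at inference time, while the convolutional form lets the whole kernel be computed once and applied to the full sequence during training. This only works because A_bar, B_bar and C are the same at every time step.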

Chapters (15)

0:00 Introduction
1:46 Sequence modeling
7:12 Differential equations (basics)
11:38 State Space Models
13:53 Discretization
23:08 Recurrent computation
26:32 Convolutional computation
34:18 Skip connection term
35:21 Multidimensional SSM
37:44 The HiPPO theory
43:30 The motivation behind Mamba
46:56 Selective Scan algorithm
51:34 The Scan operation
54:24 Parallel Scan
57:20 Innovations in Selec
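The chapters on the scan operation and parallel scan can be illustrated with a minimal pure-Python sketch. It pairs a work-efficient, Blelloch-style exclusive scan (up-sweep, then down-sweep) with the first-order linear recurrence h_t = a_t·h_{t-1} + b_t. This recurrence has the same shape as the selective-scan recurrence, in which A_bar and B_bar·x_t change at every time step, so the convolution trick no longer applies. Several assumptions apply. The state is a scalar. The inner loops, which a GPU would run in parallel, execute serially here. All function names are hypothetical. This is not Mamba's fused CUDA kernel, and it is only an assumption that the video uses this exact scan variant.

```python
import math


def blelloch_exclusive_scan(xs, op, identity):
    """Work-efficient exclusive scan (up-sweep then down-sweep).

    `op` must be associative (not necessarily commutative) and `identity`
    must satisfy op(identity, x) == op(x, identity) == x. Each inner `for`
    loop touches disjoint elements, so its iterations could run in parallel;
    here they run serially for illustration.
    """
    n = 1
    while n < len(xs):
        n *= 2
    tree = list(xs) + [identity] * (n - len(xs))  # pad to a power of two

    # Up-sweep: build partial reductions in place.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            tree[i] = op(tree[i - d], tree[i])
        d *= 2

    # Down-sweep: push prefixes back down the implicit tree.
    tree[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            left = tree[i - d]
            tree[i - d] = tree[i]        # left child inherits the prefix
            tree[i] = op(tree[i], left)  # right child: prefix, then left subtree
        d //= 2
    return tree[: len(xs)]


def combine(x, y):
    """Compose the affine maps h -> a1*h + b1 and then h -> a2*h + b2."""
    a1, b1 = x
    a2, b2 = y
    return (a1 * a2, a2 * b1 + b2)


def linear_recurrence_scan(a, b):
    """h_t = a_t * h_{t-1} + b_t with h_{-1} = 0, computed via the scan."""
    elems = list(zip(a, b))
    prefixes = blelloch_exclusive_scan(elems, combine, (1.0, 0.0))
    # Inclusive result = exclusive prefix combined with the element itself.
    return [combine(p, e)[1] for p, e in zip(prefixes, elems)]


def linear_recurrence_sequential(a, b):
    """Reference: the same recurrence as a plain sequential loop."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


if __name__ == "__main__":
    # Hypothetical selective-scan-style inputs: the step size varies per token.
    deltas = [0.1, 0.5, 0.2, 0.9, 0.3]
    xs = [1.0, -2.0, 0.5, 3.0, 0.0]
    A, B = -1.0, 1.0
    a = [math.exp(d * A) for d in deltas]
    b = [(math.exp(d * A) - 1.0) / A * B * x for d, x in zip(deltas, xs)]
    print(linear_recurrence_scan(a, b))
    print(linear_recurrence_sequential(a, b))
```

The trick is that `combine` is associative even though the recurrence itself is sequential. This lets the prefix results be regrouped into a tree with O(log n) parallel steps and O(n) total work.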