Inside the Matrix: How does matrix multiplication work inside GPUs?
Key Takeaways
This video explains how GPUs perform matrix multiplication, the core computation powering deep neural networks and large language models.
Original Description
In this video, we dive into the mechanics of a GPU and learn how it performs matrix multiplication, the core computation powering deep neural networks and large language models. By the end of the video, you'll have learned an efficient formulation of matrix multiplication and how to compute it with tiling and kernel fusion.
GEMM basics: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html
CUDA linear algebra: https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/
A100 specifications: https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/
00:00 - Introduction
02:40 - GEMM basics
03:24 - Naive implementation of matmul
04:19 - GPU memory hierarchy
05:34 - Memory thrashing of GPUs
06:00 - Memory efficient implementation of matmul
06:33 - Matmul with tiling
08:17 - GPU execution hierarchy
09:25 - Magic of power of 2
10:15 - Tile quantization
11:14 - Kernel fusion
12:24 - Conclusion
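
The tiling and tile quantization chapters can be illustrated with a small CPU-side sketch. This is not code from the video: the function names (matmul_naive, matmul_tiled, output_tiles) and the tile size are hypothetical, and plain Python loops stand in for what a GPU would do with thread blocks and shared memory. The idea shown is that the output matrix is split into tiles, and each output tile is accumulated from pairs of input tiles so that a small working set is reused many times.

```python
import math


def matmul_naive(A, B):
    """Reference result: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]


def matmul_tiled(A, B, tile=2):
    """Tiled matmul sketch: walk the output in tile x tile blocks.

    Assumption for illustration: on a GPU, each (i0, j0) output tile would
    typically be handled by one thread block, and the input tiles selected
    by k0 would be staged in fast on-chip memory before being reused.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, tile):            # tile row of the output
        for j0 in range(0, N, tile):        # tile column of the output
            for k0 in range(0, K, tile):    # step along the shared dimension
                # Accumulate the contribution of one pair of input tiles.
                for i in range(i0, min(i0 + tile, M)):
                    for j in range(j0, min(j0 + tile, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tile, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


def output_tiles(M, N, tile):
    """Number of output tiles needed to cover an M x N result."""
    return math.ceil(M / tile) * math.ceil(N / tile)


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(A, B, tile=2) == matmul_naive(A, B))
    # A 4 x 4 output fits in one 4 x 4 tile; a 5 x 5 output needs four.
    print(output_tiles(4, 4, 4), output_tiles(5, 5, 4))
```

The output_tiles helper hints at tile quantization: when a matrix dimension is not a multiple of the tile size, the partially filled edge tiles still have to be scheduled, which is one reason dimensions that divide evenly by the tile size tend to use the hardware more efficiently.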