Inside the Matrix: How does matrix multiplication work inside GPUs?

DeepLearning Hero · Beginner · 🧠 Large Language Models · 2y ago
In this video, we dive into the mechanics of a GPU and learn how it performs matrix multiplication, the core computation powering deep neural networks and large language models. By the end of the video you'll learn an efficient formulation of matrix multiplication, how to compute matrix multiplication with tiling, and how kernel fusion reduces memory traffic.

GEMM basics: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html
CUDA linear algebra: https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/
A100 specifications: https://developer.nvidia.com/blog/nvidia-ampere-archi…
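As a baseline for the "naive implementation" the video starts from, here is a minimal pure-Python sketch of the triple-loop matrix multiply (the function name and layout are illustrative, not taken from the video):

```python
def matmul_naive(A, B):
    """Naive O(n*m*k) matmul: C[i][j] = sum over p of A[i][p] * B[p][j].

    A is n x k, B is k x m, both as lists of row lists.
    """
    n, k = len(A), len(A[0])
    assert len(B) == k, "inner dimensions must match"
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                # each element of A and B is re-read from memory many times;
                # this access pattern is what the later chapters improve on
                s += A[i][p] * B[p][j]
            C[i][j] = s
    return C
```

On a GPU the same arithmetic is spread across thousands of threads, but the memory-access pattern of this loop nest is what makes the naive version slow.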
Watch on YouTube ↗

Chapters (12)

0:00 Introduction
2:40 GEMM basics
3:24 Naive implementation of matmul
4:19 GPU memory hierarchy
5:34 Memory thrashing of GPUs
6:00 Memory efficient implementation of matmul
6:33 Matmul with tiling
8:17 GPU execution hierarchy
9:25 Magic of power of 2
10:15 Tile quantization
11:14 Kernel fusion
12:24 Conclusion
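The tiling idea covered in the chapters above can be sketched in plain Python: split each loop into tile-sized blocks so that a small block of A and B is reused many times once loaded (on a GPU, those blocks would be staged in fast shared memory instead of re-read from global memory). A hedged sketch under that assumption, not the video's actual kernel:

```python
def matmul_tiled(A, B, tile=2):
    """Blocked matmul: same result as the naive version, but the loops
    walk over tile x tile sub-blocks of A, B, and C.

    In a CUDA kernel, each (i0, j0) block would map to a thread block,
    and the A/B tiles for a given p0 would be copied into shared memory
    before the inner loops run.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # inner loops touch only one tile of A and one tile of B,
                # so each loaded element is reused ~tile times
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        s = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            s += A[i][p] * B[p][j]
                        C[i][j] += s
    return C
```

The `min(..., n)` bounds handle matrices whose sizes are not multiples of the tile size; the "tile quantization" chapter discusses why such ragged edges waste GPU work, which is one reason power-of-2 dimensions perform well.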