Latest PyTorch's Secret Power to Handle Sequences of 10K or 100K Length

DeepLearning Hero · Beginner · 🧠 Large Language Models · 2y ago
In this video, we'll explore a powerful but often-overlooked feature of PyTorch 1.13+: Flash Attention. I'll show you how conservative the new PyTorch is with memory and how that lets us fit sequences of 10K or even 100K tokens on a modest GPU.
Flash Attention repo: https://github.com/HazyResearch/flash-attention
GitHub code: https://github.com/thushv89/tutorials_deeplearninghero/blob/master/llms/flash_attention_torch.ipynb
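To see why long sequences are hard for standard attention, note that it materializes a full seq_len × seq_len score matrix, so memory grows quadratically with sequence length; Flash Attention avoids this by computing attention in tiles. A back-of-the-envelope sketch of that quadratic cost (pure Python, with hypothetical batch/head sizes):

```python
# Memory needed to materialize the (seq_len x seq_len) attention score
# matrix, which standard attention allocates and Flash Attention avoids.

def attention_matrix_bytes(seq_len, num_heads=1, batch=1, bytes_per_elem=2):
    """Bytes for a scores tensor of shape (batch, num_heads, seq_len, seq_len),
    assuming fp16 (2 bytes per element)."""
    return batch * num_heads * seq_len * seq_len * bytes_per_elem

for n in (10_000, 100_000):
    gib = attention_matrix_bytes(n) / 2**30
    print(f"seq_len={n:>7}: ~{gib:.1f} GiB for one fp16 attention matrix")
```

Even a single head at 100K tokens needs roughly 18.6 GiB just for the score matrix, which already exceeds the memory of a modest GPU; multiply by batch size and head count and the quadratic term dominates everything else in the model.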

Chapters (13)

0:00 Introduction
0:28 Scaled dot product attention in PyTorch
1:51 Google colab environment
2:21 PyTorch version for Flash Attention
2:51 Input data
3:01 Hyperparameters and the architecture
4:17 Few important arguments to the model
4:55 Utility functions
5:39 Torch without Flash Attention
7:06 Torch with Flash Attention
7:49 Limitations of Flash Attention
8:59 Analysing the results
10:47 Conclusion
Next Up
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)