Self Attention in Transformers | Transformers in Deep Learning

Learn With Jay · Beginner · 🧠 Large Language Models · 1y ago
We dive deep into the concept of Self Attention in Transformers! Self attention is the key mechanism that lets models like BERT and GPT capture long-range dependencies within text, making them powerful for NLP tasks. We'll break down how self attention in transformers works, walking through the math of how it builds a new representation for each word from the input embeddings. Whether you're new to Transformers or looking to strengthen your understanding, this video provides a clear and accessible explanation of Self Attention in Transformers, with visuals and the complete mathematics.
Watch on YouTube ↗
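Before watching, it can help to see the whole computation at a glance. Below is a minimal NumPy sketch of scaled dot-product self-attention, the computation the video builds up to: Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V. The function and variable names and the toy dimensions here are ours for illustration, not taken from the video.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    X:             (seq_len, d_model) word embeddings
    W_q, W_k, W_v: learned projection matrices, (d_model, d_k)
    Returns new word representations of shape (seq_len, d_k).
    """
    Q = X @ W_q                      # queries
    K = X @ W_k                      # keys
    V = X @ W_v                      # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) similarity scores
    weights = softmax(scores)        # each row sums to 1
    return weights @ V               # weighted mix of value vectors

# Toy example: 3 tokens, 4-dimensional embeddings, d_k = 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (3, 4)
```

Each row of `weights` says how strongly one token attends to every other token, and the output mixes the value vectors accordingly; expressing all of this as matrix products is the form covered in the "Self Attention in the form of matrix multiplication" chapter.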

Chapters (11)

0:00 Intro
1:13 The Problem
4:00 Self Attention Overview
6:04 Self Attention Mathematics - Part 1
19:20 Self Attention as Gravity
20:07 Problems with the equation
26:51 Self Attention Complete
31:18 Benefits of Self Attention
34:30 Recap of Self Attention
38:53 Self Attention in the form of matrix multiplication
42:39 Outro
Next Up
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)