Self Attention in Transformers | Transformers in Deep Learning
We dive deep into Self Attention in Transformers! Self attention is the key mechanism that lets models like BERT and GPT capture long-range dependencies within text, making them so powerful for NLP tasks. We break down how self attention works, walking through the math of how it turns input embeddings into new, context-aware word representations. Whether you're new to Transformers or looking to strengthen your understanding, this video gives a clear and accessible explanation with visuals and the complete mathematics.
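As a rough preview of the computation discussed in the video, here is a minimal NumPy sketch of scaled dot-product self attention; the function and variable names below are our own illustration, not taken from the video.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project token embeddings into queries, keys, and values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Similarity of every token with every other token, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax so each token's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each new word representation is a weighted mix of all value vectors
    return weights @ V

# Toy example: 4 tokens with 8-dimensional embeddings (made-up sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)

Because every output row is a weighted combination of all the value vectors, each token's new representation can depend on every other token in the sequence, which is what lets the model capture long-range dependencies.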
➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖
Timestamps:
…
Chapters (11)
0:00 Intro
1:13 The Problem
4:00 Self Attention Overview
6:04 Self Attention Mathematics - Part 1
19:20 Self Attention as Gravity
20:07 Problems with the equation
26:51 Self Attention Complete
31:18 Benefits of Self Attention
34:30 Recap of Self Attention
38:53 Self Attention in the form of matrix multiplication
42:39 Outro
DeepCamp AI