Transformer Architecture: Attention Is All You Need Paper Explained
The Transformer is a neural network architecture built around an attention mechanism, introduced in the well-known paper "Attention Is All You Need." Transformers were originally proposed for machine translation, but today they are applied across AI, from computer vision and multimodal learning to robotics and reinforcement learning. This comprehensive video dives deep into the Transformer architecture and the attention mechanism, along with related topics such as large language models (covering BERT and GPT-3) and efficient Transformers.
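As a taste of the attention mechanism the video covers, here is a minimal sketch of scaled dot-product attention in NumPy. This is illustrative only, not code from the video; the function name and shapes are assumptions for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays. Illustrative sketch, not the video's code.
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
# Self-attention: queries, keys, and values all come from the same sequence
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In a full Transformer, Q, K, and V are linear projections of the input, and several such attention "heads" run in parallel.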
Watch on YouTube ↗
Chapters (14)
Introduction — 1:32
Neural networks before transformers (RNNs, LSTMs, CNNs) — 6:19
Transformer architecture — 13:03
Attention — 25:00
Other elements of Transformer architecture — 31:37
Visualizing attention — 34:59
Large language models (LLMs) — 44:04
BERT — 49:41
GPT-3 — 57:39
Implementations of Transformers — 1:04:21
Current state of Transformers — 1:05:49
Efficient Transformers — 1:09:20
Amusing Transformer tweet by Karpathy — 1:12:20
Summary
DeepCamp AI