Decoder Architecture in Transformers | Step-by-Step from Scratch
Transformers have revolutionized deep learning, but have you ever wondered how the decoder in a transformer actually works? 🤔 In this video, we break down the Decoder Architecture in Transformers step by step!
💡 What You'll Learn:
✅ The fundamentals of encoder-decoder models in deep learning and how they differ in Transformers.
✅ The role of each layer in the decoder and how they work together.
✅ A deep dive into masked self-attention, cross-attention, and feed-forward networks in the decoder.
✅ How transformers generate meaningful sequences in tasks like language modeling, machine translation, a…
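To make the first of these mechanisms concrete: the masked self-attention named above can be sketched in a few lines of NumPy. This is a minimal illustration under my own assumptions (function name, single head, no batching), not code from the video.

```python
import numpy as np

def masked_self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention with a causal mask.

    x: (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices
    """
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i,
    # so the decoder cannot peek at future tokens during training.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    # Row-wise softmax (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = masked_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because of the mask, the first output position depends only on the first input token; in cross-attention the same scoring is used, but K and V come from the encoder output instead of x.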
Watch on YouTube →
Chapters (14)
Intro (0:56)
Encoder-Decoder model in Deep Learning (2:24)
Encoder-Decoder in Transformers (5:25)
Parallelizing Training in Transformers (12:57)
Masked Multi-head attention (19:29)
Encoder-Decoder in training of Transformers (22:01)
Positional Encodings (23:08)
Add & Norm Layer (24:47)
Cross Attention (32:33)
Feed Forward Network (33:53)
Stacking of Decoder blocks (34:42)
Final Prediction Layer (37:06)
Decoder during inference (40:05)
Outro
DeepCamp AI