Cross Attention Made Easy | Decoder Learns from Encoder
In this video, we explain cross-attention in Transformers step by step, using simple language and clear matrix shapes.
You will learn:
• Why cross-attention is required in the transformer decoder
• Difference between masked self-attention and cross-attention
• How Query, Key, and Value are created
• Why Query comes from the decoder and Key and Value come from the encoder
• Matrix shapes used in cross-attention (4×3 and 3×3)
• How Q × Kᵀ works with an easy intuitive explanation
• Softmax explained with a simple numeric example
• How attention weights multiply with the Value matrix
• Why cross-a…
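The steps listed above (Q from the decoder, K and V from the encoder, Q × Kᵀ, softmax, then multiplication by V) can be sketched in NumPy using the video's 4×3 and 3×3 shapes. The variable names and random projection weights below are illustrative, not taken from the video:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model = 3
decoder_states = rng.normal(size=(4, d_model))  # 4 decoder (target) tokens
encoder_states = rng.normal(size=(3, d_model))  # 3 encoder (source) tokens

# Hypothetical projection matrices (randomly initialized for illustration)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q = decoder_states @ W_q  # (4, 3): queries come from the decoder
K = encoder_states @ W_k  # (3, 3): keys come from the encoder
V = encoder_states @ W_v  # (3, 3): values come from the encoder

# Q × Kᵀ scores each decoder token against each encoder token
scores = Q @ K.T / np.sqrt(d_model)   # (4, 3)
weights = softmax(scores, axis=-1)    # each row sums to 1
output = weights @ V                  # (4, 3): encoder info routed to the decoder

print(output.shape)
```

Each row of `weights` tells one decoder token how much to attend to each encoder token, which is why the output has one row per decoder position but mixes information from the encoder.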
DeepCamp AI