Positional Encoding in Transformer | Sinusoidal Positional Encoding Explained
Transformers process tokens in parallel — so how do they understand word order?
In this video, we explore positional encodings in Transformers, starting with sinusoidal positional encodings and learnable absolute position embeddings.
We begin by explaining why Transformers need positional information, and why naive indexing or normalization approaches fail. Then, step by step, we build intuition for sinusoidal positional encodings — including the role of sine and cosine, the meaning of the 10,000 scaling factor, and how different dimensions capture local vs global positional relationships.
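The encoding discussed in the video can be sketched in a few lines of NumPy. This is a minimal illustration of the standard sinusoidal formula (sine on even dimensions, cosine on odd dimensions, with the 10,000 scaling factor); the function name and sizes are chosen for the example, not taken from the video:

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional-encoding matrix:

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(num_positions)[:, None]      # shape (num_positions, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even dims, shape (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))   # shape (num_positions, d_model/2)

    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)   # even indices get sine
    pe[:, 1::2] = np.cos(angles)   # odd indices get cosine
    return pe

pe = sinusoidal_positional_encoding(num_positions=50, d_model=16)
```

Low dimensions oscillate quickly and distinguish nearby positions; high dimensions vary slowly and capture coarse position, which is the local-vs-global split the description refers to.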
…
Watch on YouTube
Chapters (8)
Intro (0:48)
Why transformers need positional information (2:08)
Naive approaches to positional encoding (3:24)
Sinusoidal positional encodings explained (6:45)
Connection to binary encoding (10:29)
Why 10000 as the default in positional encodings (14:22)
Why cosine in sinusoidal encodings (17:08)
Absolute learnable position embeddings
DeepCamp AI