Positional Encoding in Transformer | Sinusoidal Positional Encoding Explained
Key Takeaways
This video explains how sinusoidal positional encodings and learnable absolute position embeddings give Transformers, which process tokens in parallel, a sense of word order.
Original Description
Transformers process tokens in parallel — so how do they understand word order?
In this video, we explore positional encodings in Transformers, starting with sinusoidal positional encodings and learnable absolute position embeddings.
We begin by explaining why Transformers need positional information, and why naive indexing or normalization approaches fail. Then, step by step, we build intuition for sinusoidal positional encodings — including the role of sine and cosine, the meaning of the 10,000 scaling factor, and how different dimensions capture local vs global positional relationships.
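As a rough illustration of the construction described above, here is a minimal NumPy sketch of the formula from the Attention Is All You Need paper: even dimensions hold sin(pos / 10000^(2i/d_model)) and odd dimensions hold the matching cosine. The function name, the `base` parameter name, and the example sizes are assumptions made for this sketch, not something taken from the video.

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions, d_model, base=10000.0):
    """Return a (num_positions, d_model) array of sinusoidal encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sine/cosine pair up")
    positions = np.arange(num_positions)[:, np.newaxis]   # (num_positions, 1)
    pair_index = np.arange(d_model // 2)[np.newaxis, :]   # (1, d_model / 2)
    # One angular frequency per sine/cosine pair: 1 / base^(2i / d_model).
    angular_freq = 1.0 / base ** (2.0 * pair_index / d_model)
    angles = positions * angular_freq                     # (num_positions, d_model / 2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

# Illustrative sizes only (hypothetical, chosen to keep the array small).
pe = sinusoidal_positional_encoding(num_positions=50, d_model=16)
```

The first pairs oscillate quickly, so they separate neighbouring positions, while the last pairs change very slowly across the sequence, which is one way to read the local vs global split mentioned above.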
You’ll also see the connection between sinusoidal encodings and binary representations, and why using continuous sinusoidal waves makes it easier for attention layers to learn positional patterns. We then discuss why cosine is essential: pairing it with sine lets the encoding of a shifted position be written as a linear function of the original position's encoding, which sets the foundation for relative and rotary position embeddings.
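The linear relationship can be checked numerically. The sketch below, which assumes the standard sine/cosine pairing from the Attention Is All You Need paper, builds a block-diagonal matrix from the angle-addition identities and confirms that it maps the encoding of `pos` to the encoding of `pos + k` regardless of `pos`. The helper names `encode_position` and `shift_matrix` are hypothetical and exist only for this example.

```python
import numpy as np

def pair_frequencies(d_model, base=10000.0):
    """Angular frequency of each sine/cosine pair."""
    return 1.0 / base ** (2.0 * np.arange(d_model // 2) / d_model)

def encode_position(pos, d_model, base=10000.0):
    """Sinusoidal encoding of a single position as a (d_model,) vector."""
    angles = pos * pair_frequencies(d_model, base)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(angles)
    pe[1::2] = np.cos(angles)
    return pe

def shift_matrix(k, d_model, base=10000.0):
    """Matrix M_k with M_k @ PE(pos) == PE(pos + k); it depends on k only."""
    m = np.zeros((d_model, d_model))
    for i, w in enumerate(pair_frequencies(d_model, base)):
        c, s = np.cos(k * w), np.sin(k * w)
        # sin(a + b) =  cos(b) * sin(a) + sin(b) * cos(a)
        # cos(a + b) = -sin(b) * sin(a) + cos(b) * cos(a)
        m[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, s], [-s, c]]
    return m

# Hypothetical sizes: shift by 5 positions in an 8-dimensional encoding.
shift = shift_matrix(k=5, d_model=8)
shifted_ok = all(
    np.allclose(shift @ encode_position(p, 8), encode_position(p + 5, 8))
    for p in (0, 3, 17)
)
```

Each 2x2 block is a rotation, which is why a sine-only encoding would not work here: without the cosine partner there is no fixed matrix that performs the shift.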
Finally, we compare fixed sinusoidal embeddings with learnable absolute position embeddings, and analyze how positional information interacts with the self-attention mechanism.
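For contrast with the fixed sinusoidal table, a learnable absolute position embedding is simply a lookup table with one row per position that is added to the token embeddings and updated during training. The NumPy sketch below shows only the lookup and addition, with no training; the table sizes, the 0.02 initialization scale, and the names `token_table`, `position_table`, and `embed` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
max_len, d_model, vocab_size = 32, 8, 100   # hypothetical sizes

# In a real model both tables would be trainable parameters.
token_table = rng.normal(scale=0.02, size=(vocab_size, d_model))
position_table = rng.normal(scale=0.02, size=(max_len, d_model))

def embed(token_ids):
    """Add the row for each absolute position to the token embedding."""
    token_ids = np.asarray(token_ids)
    seq_len = token_ids.shape[0]
    if seq_len > max_len:
        # A learned table has no rows for positions it was never given.
        raise ValueError("sequence is longer than the position table")
    return token_table[token_ids] + position_table[:seq_len]

# The same token id (5) appears at positions 0 and 2.
x = embed([5, 9, 5])
```

Because the position row is added before attention, the two occurrences of token 5 enter the attention layers as different vectors, and they differ exactly by the difference of their position rows.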
This video is Part 1 of a two-part series on positional encoding in Transformers.
In Part 2, we’ll dive into relative positional embeddings and Rotary Position Embeddings (RoPE) in detail.
⏱️ Timestamps:
00:00 Intro
00:48 Why Transformers need positional information
02:08 Naive approaches to positional encoding
03:24 Sinusoidal positional encodings explained
06:45 Connection to binary encoding
10:29 Why 10,000 is the default in positional encodings
14:22 Why cosine in sinusoidal encodings
17:08 Learnable absolute position embeddings
📖 Resources:
Attention Is All You Need (paper) - https://arxiv.org/abs/1706.03762
Positional encoding tutorial from Hugging Face - https://huggingface.co/blog/designing-positional-encoding
🔔 Subscribe :
https://tinyurl.com/exai-channel-link
Email - explainingai.official@gmail.com