Positional Encoding in Transformer | Sinusoidal Positional Encoding Explained

ExplainingAI · Beginner · 📄 Research Papers Explained · 6mo ago

Key Takeaways

This video explains how sinusoidal positional encodings and learnable absolute position embeddings give Transformers, which process tokens in parallel, a way to represent word order.

Original Description

Transformers process tokens in parallel, so how do they understand word order? In this video, we explore positional encodings in Transformers, starting with sinusoidal positional encodings and learnable absolute position embeddings.

We begin by explaining why Transformers need positional information, and why naive indexing or normalization approaches fail. Then, step by step, we build intuition for sinusoidal positional encodings, including the role of sine and cosine, the meaning of the 10,000 scaling factor, and how different dimensions capture local vs global positional relationships.

You'll also see the connection between sinusoidal encodings and binary representations, and why using continuous sinusoidal waves makes it easier for attention layers to learn positional patterns. We then discuss why cosine is essential, and how it enables a linear relationship between positions, setting the foundation for relative and rotary position embeddings.

Finally, we compare fixed sinusoidal embeddings with learnable absolute position embeddings, and analyze how positional information interacts with the self-attention mechanism.

This video is Part 1 of a two-part series on positional encoding in Transformers. In Part 2, we'll dive into relative positional embeddings and Rotary Position Embeddings (RoPE) in detail.

⏱️ Timestamps:
00:00 Intro
00:48 Why transformers need positional information
02:08 Naive Approaches to positional encoding
03:24 Sinusoidal Positional encodings explained
06:45 Connection to Binary Encoding
10:29 Why 10000 as default in Positional encodings
14:22 Why cosine in Sinusoidal encodings
17:08 Absolute Learnable Position Embeddings

📖 Resources:
Attention Is All You Need paper - https://arxiv.org/abs/1706.03762
Nice Positional Encoding tutorial from Hugging Face - https://huggingface.co/blog/designing-positional-encoding

🔔 Subscribe: https://tinyurl.com/exai-channel-link
Email - explainingai.official@gmail.com
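The sinusoidal scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the video: it assumes the formulation in the "Attention Is All You Need" paper, where dimension 2i holds sin(pos / 10000^(2i/d_model)) and dimension 2i+1 holds the cosine of the same angle. The function names `sinusoidal_encoding` and `shift_encoding` and the parameter name `base` are made up for this example. The second function shows why the cosine matters: with both sine and cosine available, the encoding of position pos + k can be obtained from the encoding of pos by a rotation that depends only on the offset k.

```python
import math


def sinusoidal_encoding(position, d_model, base=10000.0):
    """Return the d_model-dimensional sinusoidal encoding of one position.

    Index 2i holds sin(position / base**(2i / d_model)) and index 2i + 1
    holds the cosine of the same angle (assumed paper formulation).
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    encoding = []
    for i in range(d_model // 2):
        # Early pairs have high frequency (local detail);
        # later pairs have low frequency (global position).
        angle = position / (base ** (2 * i / d_model))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding


def shift_encoding(encoding, offset, base=10000.0):
    """Map the encoding of some position to the encoding of position + offset.

    Each (sin, cos) pair is rotated by an angle that depends only on the
    offset, never on the original position.
    """
    d_model = len(encoding)
    shifted = []
    for i in range(d_model // 2):
        theta = offset / (base ** (2 * i / d_model))
        s, c = encoding[2 * i], encoding[2 * i + 1]
        # sin(a + t) = sin(a)cos(t) + cos(a)sin(t)
        shifted.append(s * math.cos(theta) + c * math.sin(theta))
        # cos(a + t) = cos(a)cos(t) - sin(a)sin(t)
        shifted.append(c * math.cos(theta) - s * math.sin(theta))
    return shifted


if __name__ == "__main__":
    pe_5 = sinusoidal_encoding(5, d_model=8)
    pe_8 = sinusoidal_encoding(8, d_model=8)
    # Rotating the encoding of position 5 by an offset of 3
    # should reproduce the encoding of position 8.
    rotated = shift_encoding(pe_5, offset=3)
    max_error = max(abs(a - b) for a, b in zip(rotated, pe_8))
    print("max difference:", max_error)
```

Running the script prints the largest element-wise difference between the rotated encoding and the directly computed one, which should be at floating-point noise level. Dropping the cosine terms would leave `shift_encoding` without enough information to perform this rotation, which is one way to see why the sine alone is not sufficient.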

Chapters (8)

0:00 Intro
0:48 Why transformers need positional information
2:08 Naive Approaches to positional encoding
3:24 Sinusoidal Positional encodings explained
6:45 Connection to Binary Encoding
10:29 Why 10000 as default in Positional encodings
14:22 Why cosine in Sinusoidal encodings
17:08 Absolute Learnable Position Embeddings