Why Rotating Vectors Solves Positional Encoding in Transformers | Rotary Positional Embeddings (RoPE)
Rotary Positional Embeddings (RoPE) explained from first principles. This video covers how transformers encode relative positional information using rotation, dot products, and attention, and how RoPE works mathematically.
Unlike absolute positional encodings, Rotary Positional Embeddings allow transformers to reason about relative distance between tokens, which is crucial for long-context models and large language models.
We start by building intuition around relative positional information, then carefully derive how RoPE uses rotations to inject relative position into attention scores.
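The core property the video derives can be sketched numerically: if a query and key are each rotated by an angle proportional to their position, their dot product depends only on the *difference* of positions. A minimal 2D sketch (the `rotate` helper and the angle step `theta` are illustrative choices, not from the video):

```python
import numpy as np

def rotate(v, theta):
    """Rotate a 2D vector v counter-clockwise by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ v

q = np.array([1.0, 2.0])   # query vector
k = np.array([0.5, -1.0])  # key vector

theta = 0.1    # angle step per position (illustrative)
m, n = 7, 3    # token positions of q and k

# Attention-style score between position-rotated q and k.
score = rotate(q, m * theta) @ rotate(k, n * theta)

# The same score arises from rotating q by only the relative offset m - n,
# so the score encodes relative, not absolute, position.
relative_score = rotate(q, (m - n) * theta) @ k
assert np.isclose(score, relative_score)
```

This follows from R(a)ᵀR(b) = R(b − a): the two absolute rotations collapse into a single relative one inside the dot product.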
Watch on YouTube ↗
Chapters
0:40
What and Why of Relative Positional Information
4:29
2D Rotation Review
6:40
Rotary Position Embeddings (RoPE) Explained
11:00
RoPE beyond 2D
13:29
Why & How Rotary Positional Encodings Work
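For the "RoPE beyond 2D" chapter, the standard construction splits a d-dimensional vector into d/2 pairs and rotates each pair at its own frequency. A minimal sketch, assuming the common base-10000 frequency schedule (the `rope` helper name and shapes are illustrative):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a vector x of even dimension d,
    rotating each consecutive pair of dimensions at its own frequency."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per 2D pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                  # pair components
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin            # 2D rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# The attention score depends only on the relative offset m - n:
s1 = rope(q, 10) @ rope(k, 4)  # positions 10 and 4, offset 6
s2 = rope(q, 7) @ rope(k, 1)   # positions 7 and 1, offset 6
assert np.isclose(s1, s2)
```

Because each 2D pair obeys the relative-rotation identity independently, the full high-dimensional score inherits the same relative-position property.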
DeepCamp AI