RoPE (Rotary positional embeddings) explained: The positional workhorse of modern LLMs
Unlike absolute sinusoidal embeddings, RoPE is well behaved and more resilient when inference sequence lengths exceed those seen during training.
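To make the idea concrete, here is a minimal NumPy sketch of rotary embeddings (the function name `rope` and the half-split pairing of dimensions are illustrative choices, not a specific library's API). Each pair of feature dimensions is rotated by an angle proportional to the token position, so attention dot products end up depending only on the relative offset between tokens:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Dimension pairs (i, i + dim/2) are rotated by an angle that grows
    with position and shrinks with i, so dot products between rotated
    queries and keys depend only on their relative offset.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies: theta_i = base^(-2i/dim)
    freqs = base ** (-np.arange(half) * 2.0 / dim)       # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied pair-wise
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# The relative-position property: the dot product of a rotated query at
# position m with a rotated key at position n depends only on m - n.
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
Q = rope(np.tile(q, (10, 1)))
K = rope(np.tile(k, (10, 1)))
print(np.allclose(Q[2] @ K[5], Q[4] @ K[7]))  # same offset of 3
```

This relative-offset invariance is exactly why RoPE degrades more gracefully than absolute sinusoidal embeddings when sequences run past the training length.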
DeepCamp AI