REFRAG Explained!
REFRAG from Meta Superintelligence Labs is a SUPER exciting breakthrough that may spark the second summer of Vector Databases! REFRAG illustrates how Database Systems are becoming even more integral to LLM inference! By making clever use of how context vectors are integrated with LLM generation, REFRAG is able to make TTFT (Time-to-First-Token) up to 31X faster and TTIT (Time-to-Iterative-Token) up to 3X faster, improving overall LLM throughput by up to 7X! REFRAG is also able to process much longer input contexts than standard LLMs!
Most RAG systems built with Vector Databases today, such as Weaviate, throw away the vectors associated with retrieved search results and only make use of the text content. REFRAG passes these vectors to the LLM instead of the text content! This is further enhanced with a fine-grained chunk encoding strategy and a 4-stage training algorithm that includes a selective chunk expansion policy trained with GRPO / PPO.
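To make the idea concrete, here is a minimal toy sketch of how the decoder input shrinks when most chunks are passed as a single vector and only a few are expanded back to text. This is not the paper's implementation: the function names, the placeholder strings standing in for projected chunk embeddings, the question-first ordering, and the hand-picked expansion set (which REFRAG would choose with its trained policy) are all assumptions for illustration.

```python
from typing import List, Set


def chunk_tokens(tokens: List[str], k: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most k tokens."""
    return [tokens[i:i + k] for i in range(0, len(tokens), k)]


def build_decoder_input(question: List[str], context: List[str],
                        k: int, expand: Set[int]) -> List[str]:
    """Build the mixed sequence a decoder would see in this toy setup.

    Chunks whose index is in `expand` stay as raw tokens. Every other
    chunk is replaced by ONE placeholder that stands in for its chunk
    embedding (hypothetical stand-in, no real encoder is run here).
    """
    sequence = list(question)
    for index, chunk in enumerate(chunk_tokens(context, k)):
        if index in expand:
            sequence.extend(chunk)                    # expanded: k positions
        else:
            sequence.append(f"<chunk_emb_{index}>")   # compressed: 1 position
    return sequence


question = ["what", "is", "refrag", "?"]
context = [f"t{i}" for i in range(64)]   # 64 retrieved context tokens

full_length = len(question) + len(context)
compressed = build_decoder_input(question, context, k=16, expand={2})

print(full_length)       # 68 positions with plain-text RAG
print(len(compressed))   # 23 positions: 4 question + 3 vectors + 16 expanded tokens
```

In this toy setting the decoder attends over 23 positions instead of 68, which is the intuition behind the TTFT gains: fewer input positions means less prefill work before the first token.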
I hope you find the video useful! Happy to answer any questions, or discuss any ideas about REFRAG!
Chapters
0:00 REFRAG Explained!
1:58 REFRAG Architecture
5:20 Speed gains
8:50 Training Stages for REFRAG
12:15 RL for Selective Expansion
16:45 Experimental Results
21:32 Ablation Studies
24:55 Personal Takeaways
Links
REFRAG Paper Link: https://arxiv.org/abs/2509.01092
Transformers as Universal Computation Engines: https://arxiv.org/abs/2103.05247