REFRAG Explained!

Weaviate vector database · Advanced · RAG & Vector Search · 7mo ago
REFRAG from Meta Superintelligence Labs is a SUPER exciting breakthrough that may spark the second summer of Vector Databases! REFRAG illustrates how database systems are becoming even more integral to LLM inference. By making clever use of how context vectors are integrated with LLM generation, REFRAG makes TTFT (Time-to-First-Token) 31X faster and TTIT (Time-to-Iterative-Token) 3X faster, improving overall LLM throughput by 7X. REFRAG can also process much longer input contexts than standard LLMs.

Most RAG systems built with Vector Databases such as Weaviate throw away the vectors associated with retrieved search results, only making use of the text content. REFRAG instead passes these vectors to the LLM in place of the text content. This is further enhanced with a fine-grained chunk encoding strategy and a 4-stage training algorithm that includes a selective chunk expansion policy trained with GRPO / PPO.

I hope you find the video useful! Happy to answer any questions, or discuss any ideas about REFRAG!

Chapters
0:00 REFRAG Explained!
1:58 REFRAG Architecture
5:20 Speed gains
8:50 Training Stages for REFRAG
12:15 RL for Selective Expansion
16:45 Experimental Results
21:32 Ablation Studies
24:55 Personal Takeaways

Links
REFRAG Paper: https://arxiv.org/abs/2509.01092
Transformers as Universal Computation Engines: https://arxiv.org/abs/2103.05247
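The core idea — replacing a retrieved chunk's tokens with a single projected vector, while a selective expansion policy picks which chunks keep their full tokens — can be sketched in a few lines. This is a minimal numpy sketch, not the paper's actual architecture: the dimensions, the mean-pooling "encoder", and the random projection `W_proj` are all hypothetical stand-ins for learned components, but the shape arithmetic shows where the context-length (and thus TTFT) savings come from.

```python
import numpy as np

rng = np.random.default_rng(0)

D_ENC, D_DEC = 64, 128   # hypothetical encoder / decoder hidden sizes
TOKENS_PER_CHUNK = 16    # compression factor: one vector stands in for 16 tokens

# Stand-in for a lightweight chunk encoder: pool a chunk into one vector.
def encode_chunk(chunk_tokens):
    return chunk_tokens.mean(axis=0)  # (D_ENC,)

# Stand-in for a learned projection into the decoder's embedding space.
W_proj = rng.normal(size=(D_ENC, D_DEC))

def build_decoder_inputs(chunks, question_embeddings, expand_mask):
    """Mix compressed chunk vectors and expanded (full-token) chunks.

    chunks: list of (n_tokens, D_ENC) arrays for retrieved chunks
    question_embeddings: (q, D_DEC) token embeddings of the user question
    expand_mask: one bool per chunk; True = the policy chose full expansion
    """
    parts = []
    for chunk, expand in zip(chunks, expand_mask):
        if expand:
            # Expanded chunk: every token enters the decoder input.
            parts.append(chunk @ W_proj)                         # (n_tokens, D_DEC)
        else:
            # Compressed chunk: one projected vector replaces all its tokens.
            parts.append(encode_chunk(chunk)[None, :] @ W_proj)  # (1, D_DEC)
    parts.append(question_embeddings)
    return np.concatenate(parts, axis=0)

chunks = [rng.normal(size=(TOKENS_PER_CHUNK, D_ENC)) for _ in range(4)]
question = rng.normal(size=(8, D_DEC))

compressed = build_decoder_inputs(chunks, question, expand_mask=[False] * 4)
expanded_one = build_decoder_inputs(chunks, question,
                                    expand_mask=[True, False, False, False])

print(compressed.shape)    # 4 chunk vectors + 8 question tokens -> (12, 128)
print(expanded_one.shape)  # 16 + 1 + 1 + 1 + 8 -> (27, 128)
```

With all four chunks compressed, the decoder sees 12 input positions instead of 72 — that shorter prefill is the source of the TTFT speedup, and the expansion mask is exactly what the GRPO/PPO-trained policy decides per chunk.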

