REFRAG Explained!

Weaviate vector database · Advanced · Large Language Models · 8mo ago

Key Takeaways

Explains REFRAG, a method from Meta Superintelligence Labs that passes retrieved chunk vectors to the LLM in place of text, speeding up LLM inference

Original Description

REFRAG from Meta Superintelligence Labs is a SUPER exciting breakthrough that may spark the second summer of Vector Databases! REFRAG illustrates how Database Systems are becoming even more integral to LLM inference!

By making clever use of how context vectors are integrated with LLM generation, REFRAG is able to make TTFT (Time-to-First-Token) 31X faster and TTIT (Time-to-Iterative-Token) 3X faster, overall improving LLM throughput by 7X! REFRAG is also able to process much longer input contexts than standard LLMs!

Most of the RAG systems built today with Vector Databases, such as Weaviate, throw away the vectors associated with retrieved search results and use only the text content. REFRAG instead passes these vectors to the LLM in place of the text content! This is further enhanced with a fine-grained chunk encoding strategy and a 4-stage training algorithm that includes a selective chunk expansion policy trained with GRPO / PPO.

I hope you find the video useful! Happy to answer any questions, or discuss any ideas about REFRAG!

Chapters
0:00 REFRAG Explained!
1:58 REFRAG Architecture
5:20 Speed gains
8:50 Training Stages for REFRAG
12:15 RL for Selective Expansion
16:45 Experimental Results
21:32 Ablation Studies
24:55 Personal Takeaways

Links
REFRAG Paper Link: https://arxiv.org/abs/2509.01092
Transformers as Universal Computation Engines: https://arxiv.org/abs/2103.05247
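
To make the core idea in the description concrete, here is a minimal Python sketch of why passing one vector per chunk shortens the LLM's input, and what selective expansion does to that length. It is an illustration under stated assumptions, not the paper's implementation: `toy_encode` is a hypothetical stand-in for the chunk encoder and projection, the `expand` set stands in for the learned expansion policy, and the chunk size, function names, and the ordering of question tokens before context are all choices made for this example.

```python
from typing import List, Sequence, Set, Tuple, Union

# One decoder input position: either a plain token or a single chunk vector.
Slot = Tuple[str, Union[str, List[float]]]


def split_into_chunks(tokens: Sequence[str], k: int) -> List[List[str]]:
    """Split retrieved context tokens into consecutive chunks of at most k tokens."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [list(tokens[i:i + k]) for i in range(0, len(tokens), k)]


def toy_encode(chunk: Sequence[str], dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a chunk encoder plus projection layer.

    A real system would use a trained encoder; this only produces
    a fixed-size vector so the example runs end to end.
    """
    vec = [0.0] * dim
    for pos, tok in enumerate(chunk):
        vec[pos % dim] += float(len(tok))
    return vec


def build_decoder_input(
    question: Sequence[str],
    chunks: Sequence[Sequence[str]],
    expand: Set[int],
) -> List[Slot]:
    """Build the decoder input: question tokens, then one slot per chunk.

    Chunks whose index is in `expand` are kept as full text tokens
    (selective expansion); every other chunk is a single vector slot.
    """
    slots: List[Slot] = [("token", t) for t in question]
    for idx, chunk in enumerate(chunks):
        if idx in expand:
            slots.extend(("token", t) for t in chunk)
        else:
            slots.append(("chunk_vector", toy_encode(chunk)))
    return slots


if __name__ == "__main__":
    question = ["what", "is", "refrag"]
    context = [f"tok{i}" for i in range(32)]   # 32 retrieved context tokens
    chunks = split_into_chunks(context, k=8)   # 4 chunks of 8 tokens each

    compressed = build_decoder_input(question, chunks, expand={1})
    print("uncompressed positions:", len(question) + len(context))
    print("compressed positions:  ", len(compressed))
```

With these toy numbers, the uncompressed input has 3 + 32 = 35 positions, while the compressed input has 3 question tokens, 3 chunk vectors, and 8 expanded tokens, for 14 positions. A shorter decoder input is the intuition behind the latency gains the description mentions; the specific speedups reported there come from the paper's experiments, not from this sketch.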

