REFRAG Explained!

Weaviate vector database · Advanced · RAG & Vector Search · 7mo ago
REFRAG from Meta Superintelligence Labs is a SUPER exciting breakthrough that may spark the second summer of Vector Databases! REFRAG illustrates how database systems are becoming even more integral to LLM inference. By making clever use of how context vectors are integrated with LLM generation, REFRAG makes TTFT (Time-to-First-Token) 31X faster and TTIT (Time-to-Iterative-Token) 3X faster, improving overall LLM throughput by 7X. REFRAG can also process much longer input contexts than standard LLMs.

Most RAG systems built with Vector Databases such as Weaviate throw away the vectors associated with retrieved search results, only making use of the text content. REFRAG instead passes these vectors to the LLM in place of the text content. This is further enhanced with a fine-grained chunk encoding strategy and a 4-stage training algorithm that includes a selective chunk expansion policy trained with GRPO / PPO.

I hope you find the video useful! Happy to answer any questions, or discuss any ideas about REFRAG!

Chapters
0:00 REFRAG Explained!
1:58 REFRAG Architecture
5:20 Speed gains
8:50 Training Stages for REFRAG
12:15 RL for Selective Expansion
16:45 Experimental Results
21:32 Ablation Studies
24:55 Personal Takeaways

Links
REFRAG Paper: https://arxiv.org/abs/2509.01092
Transformers as Universal Computation Engines: https://arxiv.org/abs/2103.05247
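The core idea — replacing a retrieved chunk's tokens with a single projected vector, while a selective expansion policy picks which chunks keep their full tokens — can be sketched in a few lines. This is a minimal numpy sketch, not the paper's actual architecture: the dimensions, the mean-pooling "encoder", and the random projection `W_proj` are all hypothetical stand-ins for learned components, but the shape arithmetic shows where the context-length (and thus TTFT) savings come from.

```python
import numpy as np

rng = np.random.default_rng(0)

D_ENC, D_DEC = 64, 128   # hypothetical encoder / decoder hidden sizes
TOKENS_PER_CHUNK = 16    # compression factor: one vector stands in for 16 tokens

# Stand-in for a lightweight chunk encoder: pool a chunk into one vector.
def encode_chunk(chunk_tokens):
    return chunk_tokens.mean(axis=0)  # (D_ENC,)

# Stand-in for a learned projection into the decoder's embedding space.
W_proj = rng.normal(size=(D_ENC, D_DEC))

def build_decoder_inputs(chunks, question_embeddings, expand_mask):
    """Mix compressed chunk vectors and expanded (full-token) chunks.

    chunks: list of (n_tokens, D_ENC) arrays for retrieved chunks
    question_embeddings: (q, D_DEC) token embeddings of the user question
    expand_mask: one bool per chunk; True = the policy chose full expansion
    """
    parts = []
    for chunk, expand in zip(chunks, expand_mask):
        if expand:
            # Expanded chunk: every token enters the decoder input.
            parts.append(chunk @ W_proj)                         # (n_tokens, D_DEC)
        else:
            # Compressed chunk: one projected vector replaces all its tokens.
            parts.append(encode_chunk(chunk)[None, :] @ W_proj)  # (1, D_DEC)
    parts.append(question_embeddings)
    return np.concatenate(parts, axis=0)

chunks = [rng.normal(size=(TOKENS_PER_CHUNK, D_ENC)) for _ in range(4)]
question = rng.normal(size=(8, D_DEC))

compressed = build_decoder_inputs(chunks, question, expand_mask=[False] * 4)
expanded_one = build_decoder_inputs(chunks, question,
                                    expand_mask=[True, False, False, False])

print(compressed.shape)    # 4 chunk vectors + 8 question tokens -> (12, 128)
print(expanded_one.shape)  # 16 + 1 + 1 + 1 + 8 -> (27, 128)
```

With all four chunks compressed, the decoder sees 12 input positions instead of 72 — that shorter prefill is the source of the TTFT speedup, and the expansion mask is exactly what the GRPO/PPO-trained policy decides per chunk.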

