REFRAG with Xiaoqiang Lin - Weaviate Podcast #130!

Weaviate vector database · Beginner · 📄 Research Papers Explained · 6mo ago

Hey everyone! Thanks so much for watching the 130th episode of the Weaviate Podcast, featuring Xiaoqiang Lin! Xiaoqiang Lin is a Ph.D. student at the National University of Singapore. During his time at Meta, Xiaoqiang led the research behind REFRAG: Rethinking RAG-based Decoding.

Traditional RAG systems use vectors to retrieve relevant context with semantic search, but then throw those vectors away when passing the context to the LLM. REFRAG instead feeds the LLM these precomputed vectors, achieving massive gains in long-context processing and LLM inference speed! REFRAG makes Time-To-First-Token (TTFT) 31x faster and Time-To-Iterative-Token (TTIT) 3x faster, boosting overall LLM throughput by 7x while also being able to handle much longer contexts! There are so many interesting aspects to this, and I really loved diving into the details with Xiaoqiang. I hope you enjoy the podcast!

Links:
REFRAG: Rethinking RAG-based Decoding: https://arxiv.org/pdf/2509.01092
REFRAG Explained: https://www.youtube.com/watch?v=Ek0tZootK00
Check out more works from Xiaoqiang Lin: https://xqlin98.github.io/

Chapters
0:00 Welcome Xiaoqiang!
0:43 An Introduction to REFRAG
5:59 The limits of REFRAG compression
13:07 Redundant Information in RAG
22:01 Vector Databases
27:32 Chunk Expansion Policy with RL
30:57 4 Stage Training Algorithm
38:49 How to use REFRAG LLMs?
42:01 Downstream Task Experiments
48:41 TTFT vs. Throughput Latency
52:43 MRL and REFRAG Chunk Schemas
56:27 Future Directions for AI
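To make the core idea concrete (feeding the LLM chunk vectors instead of every retrieved token), here is a minimal Python sketch. It rests on stated assumptions: a fixed chunk size k, a toy mean-pooling "encoder", and a hand-written projection matrix. The names `chunk_tokens`, `toy_encode`, `project`, and `build_decoder_input` are hypothetical and are not REFRAG's actual code. The sketch only shows why the decoder input gets shorter. Each compressed chunk of k tokens occupies a single position, while chunks chosen for expansion (the role the chapters attribute to an RL-based chunk expansion policy) stay as raw tokens.

```python
from typing import List, Sequence, Set, Tuple, Union

Vector = List[float]
# Each decoder input position is either a raw token id or one projected chunk embedding.
Position = Union[Tuple[str, int], Tuple[str, Vector]]


def chunk_tokens(tokens: Sequence[int], k: int) -> List[List[int]]:
    """Split the retrieved context into consecutive chunks of at most k tokens."""
    if k <= 0:
        raise ValueError("chunk size k must be positive")
    return [list(tokens[i:i + k]) for i in range(0, len(tokens), k)]


def toy_encode(chunk: Sequence[int], dim: int) -> Vector:
    """HYPOTHETICAL stand-in for a lightweight chunk encoder.

    Mean-pools a deterministic toy per-token vector. A real system would use a
    trained encoder, and these vectors could be computed ahead of time.
    """
    pooled = [0.0] * dim
    for tok in chunk:
        for j in range(dim):
            pooled[j] += ((tok * (j + 1)) % 7) / 7.0
    return [x / len(chunk) for x in pooled]


def project(vec: Vector, weights: List[Vector]) -> Vector:
    """Map an encoder vector into the decoder's embedding space (one weight row per output dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def build_decoder_input(
    context: Sequence[int],
    query: Sequence[int],
    k: int,
    expand: Set[int],
    weights: List[Vector],
) -> List[Position]:
    """Replace each context chunk with ONE projected embedding, except chunks whose
    index is in `expand`, which are kept as raw tokens (a stand-in for a learned policy)."""
    enc_dim = len(weights[0])
    positions: List[Position] = []
    for idx, chunk in enumerate(chunk_tokens(context, k)):
        if idx in expand:
            positions.extend(("tok", t) for t in chunk)
        else:
            positions.append(("emb", project(toy_encode(chunk, enc_dim), weights)))
    # The question itself is always passed to the decoder as ordinary tokens.
    positions.extend(("tok", t) for t in query)
    return positions


if __name__ == "__main__":
    context = list(range(100, 116))  # 16 toy "retrieved" token ids
    query = [1, 2, 3]
    W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # toy 3 -> 2 projection
    full = len(context) + len(query)
    compressed = build_decoder_input(context, query, k=4, expand={2}, weights=W)
    print(full, len(compressed))
```

Running it prints `19 10`. The 19-token prompt becomes 10 decoder positions: three chunk embeddings, four tokens from the one expanded chunk, and the three query tokens. A shorter decoder input is the intuition behind the speed-ups quoted above. The 31x, 3x, and 7x figures come from the paper's own models and setup, not from this toy.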