REFRAG with Xiaoqiang Lin - Weaviate Podcast #130!

Weaviate vector database · Beginner · 📄 Research Papers Explained · 6mo ago
Hey everyone! Thanks so much for watching the 130th episode of the Weaviate Podcast featuring Xiaoqiang Lin! Xiaoqiang Lin is a Ph.D. student at the National University of Singapore. During his time at Meta, Xiaoqiang led the research behind REFRAG: Rethinking RAG-based Decoding. Traditional RAG systems use vectors to retrieve relevant context with semantic search, but then throw the vectors away when passing the context to the LLM. REFRAG instead feeds the LLM precomputed chunk vectors in place of most of the raw context tokens, achieving massive gains in long-context processing and LLM inference speed! REFRAG makes Time-To-First-Token (TTFT) 31x faster and Time-To-Iterative-Token (TTIT) 3x faster, boosting overall LLM throughput by 7x while also handling much longer contexts! There are so many interesting aspects to this, and I really loved diving into the details with Xiaoqiang! I hope you enjoy the podcast!

Links:
REFRAG: Rethinking RAG-based Decoding: https://arxiv.org/pdf/2509.01092
REFRAG Explained: https://www.youtube.com/watch?v=Ek0tZootK00
Check out more work from Xiaoqiang Lin: https://xqlin98.github.io/

Chapters
0:00 Welcome Xiaoqiang!
0:43 An Introduction to REFRAG
5:59 The limits of REFRAG compression
13:07 Redundant Information in RAG
22:01 Vector Databases
27:32 Chunk Expansion Policy with RL
30:57 4 Stage Training Algorithm
38:49 How to use REFRAG LLMs?
42:01 Downstream Task Experiments
48:41 TTFT vs. Throughput Latency
52:43 MRL and REFRAG Chunk Schemas
56:27 Future Directions for AI
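The description's core idea is that the decoder receives one vector per context chunk instead of every context token. Below is a minimal pure-Python sketch of that input layout under explicit assumptions. The chunk size `k=16`, the hash-based `toy_encode_chunk` (a stand-in for a trained encoder), the identity projection `W`, and all function names are hypothetical illustrations, not REFRAG's code or any library's API. The sketch leaves out the RL chunk-expansion policy and the training stages discussed in the episode.

```python
# Hypothetical sketch of a REFRAG-style input layout; NOT the paper's implementation.
from typing import List

Vector = List[float]


def chunk_tokens(token_ids: List[int], k: int) -> List[List[int]]:
    """Split context token ids into consecutive chunks of at most k tokens."""
    return [token_ids[i:i + k] for i in range(0, len(token_ids), k)]


def toy_encode_chunk(chunk: List[int], enc_dim: int) -> Vector:
    """Toy stand-in for a lightweight chunk encoder (assumption, not a trained model).

    Buckets each (token, position) pair and averages, so the output always has
    enc_dim entries that sum to (approximately) 1.0.
    """
    vec = [0.0] * enc_dim
    for pos, tok in enumerate(chunk):
        vec[(tok + pos) % enc_dim] += 1.0
    return [v / len(chunk) for v in vec]


def project(vec: Vector, weights: List[List[float]]) -> Vector:
    """Linear projection from encoder space into the decoder's embedding space."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def build_decoder_inputs(question_embs: List[Vector], context_ids: List[int],
                         k: int, enc_dim: int,
                         weights: List[List[float]]) -> List[Vector]:
    """Question token embeddings followed by one projected vector per context chunk.

    Because chunk vectors depend only on the context, a real system could
    precompute and cache them alongside the retrieval index.
    """
    chunk_embs = [project(toy_encode_chunk(c, enc_dim), weights)
                  for c in chunk_tokens(context_ids, k)]
    return question_embs + chunk_embs


if __name__ == "__main__":
    context = list(range(2048))                 # 2,048 toy context token ids
    question = [[0.0] * 4 for _ in range(32)]   # 32 toy question-token embeddings
    W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(4)]  # 4x8 projection
    inputs = build_decoder_inputs(question, context, k=16, enc_dim=8, weights=W)
    print(len(inputs))  # 32 + 2048 // 16 = 160 positions instead of 2,080
```

The sketch mainly shows the sequence-length effect. With `k=16`, 2,048 context tokens shrink to 128 decoder positions, and this shorter prefill is the main source of the TTFT savings the description mentions. The measured speedups quoted above come from the paper, not from this arithmetic.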
