REFRAG with Xiaoqiang Lin - Weaviate Podcast #130!
Hey everyone! Thank you so much for watching the 130th episode of the Weaviate Podcast featuring Xiaoqiang Lin! Xiaoqiang Lin is a Ph.D. student at the National University of Singapore. During his time at Meta, Xiaoqiang led the research behind REFRAG: Rethinking RAG-based Decoding. Traditional RAG systems use vectors to retrieve relevant context with semantic search, but then throw away the vectors when passing the context to the LLM. REFRAG instead feeds the LLM these pre-computed vectors, achieving massive gains in long context processing and LLM inference speed! REFRAG makes Time-To-First-Token (TTFT) 31x faster and Time-To-Iterative-Token (TTIT) 3x faster, boosting overall LLM throughput by 7x while also being able to handle much longer contexts!
There are so many interesting aspects to this and I really loved diving into the details with Xiaoqiang! I hope you enjoy the podcast!
Links:
REFRAG: Rethinking RAG-based Decoding: https://arxiv.org/pdf/2509.01092
REFRAG Explained: https://www.youtube.com/watch?v=Ek0tZootK00
Check out more works from Xiaoqiang Lin: https://xqlin98.github.io/
Chapters
0:00 Welcome Xiaoqiang!
0:43 An Introduction to REFRAG
5:59 The limits of REFRAG compression
13:07 Redundant Information in RAG
22:01 Vector Databases
27:32 Chunk Expansion Policy with RL
30:57 4 Stage Training Algorithm
38:49 How to use REFRAG LLMs?
42:01 Downstream Task Experiments
48:41 TTFT vs. Throughput Latency
52:43 MRL and REFRAG Chunk Schemas
56:27 Future Directions for AI