REFRAG with Xiaoqiang Lin - Weaviate Podcast #130!
Hey everyone! Thanks so much for watching the 130th episode of the Weaviate Podcast featuring Xiaoqiang Lin! Xiaoqiang Lin is a Ph.D. student at the National University of Singapore. During his time at Meta, Xiaoqiang led the research behind REFRAG: Rethinking RAG-based Decoding. Traditional RAG systems use vectors to retrieve relevant context with semantic search, but then throw away the vectors when passing the context to the LLM. REFRAG instead feeds the LLM these pre-computed vectors, achieving massive gains in long context processing and LLM inference speed! REFRAG makes Time-To-First-Toke…
Chapters (12)
Welcome Xiaoqiang!
0:43 An Introduction to REFRAG
5:59 The limits of REFRAG compression
13:07 Redundant Information in RAG
22:01 Vector Databases
27:32 Chunk Expansion Policy with RL
30:57 4 Stage Training Algorithm
38:49 How to use REFRAG LLMs?
42:01 Downstream Task Experiments
48:41 TTFT vs. Throughput Latency
52:43 MRL and REFRAG Chunk Schemas
56:27 Future Directions for AI