Local Retrieval Augmented Generation (RAG) from Scratch (step by step tutorial)

Daniel Bourke · Beginner · 🔍 RAG & Vector Search · 2y ago
In this video we'll build a Retrieval Augmented Generation (RAG) pipeline from scratch that runs locally. Frameworks such as LangChain and LlamaIndex can do this, but building from scratch means you'll know all the parts of the puzzle. Specifically, we'll build NutriChat, a RAG pipeline that allows someone to ask questions of a 1200 page Nutrition Textbook PDF.

Code on GitHub - https://github.com/mrdbourke/simple-local-rag
Whiteboard - https://whimsical.com/simple-local-rag-workflow-39kToR3yNf7E8kY4sS2tjV

Be sure to check out NVIDIA GTC, NVIDIA's GPU Technology Conference running from March 18-21. It's free to attend virtually! That's what I'm doing. Sign up to GTC24 here: https://nvda.ws/3GUZygQ

Other links:
Download Nutrify (take a photo of food and learn about it) - https://nutrify.app
Learn AI/ML (beginner-friendly course) - https://dbourke.link/ZTMMLcourse
Learn TensorFlow - https://dbourke.link/ZTMTFcourse
Learn PyTorch - https://dbourke.link/ZTMPyTorch
AI/ML courses/books I recommend - https://www.mrdbourke.com/ml-resources/
Read my novel Charlie Walks - https://www.charliewalks.com

Connect elsewhere:
Web - https://dbourke.link/web
Twitter - https://www.twitter.com/mrdbourke
Twitch - https://www.twitch.tv/mrdbourke
ArXiv channel (past streams) - https://dbourke.link/archive-channel
Get email updates on my work - https://dbourke.link/newsletter

Timestamps:
0:00 - Intro/NVIDIA GTC
2:25 - Part 0: Resources and overview
8:33 - Part 1: What is RAG? Why RAG? Why locally?
12:26 - Why RAG?
19:31 - What can RAG be used for?
26:08 - Why run locally?
30:26 - Part 2: What we're going to build
40:40 - Original Retrieval Augmented Generation paper
46:04 - Part 3: Importing and processing a PDF document
48:29 - Code starts! Importing a PDF and making it readable
1:17:09 - Part 4: Preprocessing our text into chunks (text splitting)
1:28:27 - Chunking our sentences together
1:56:38 - Part 5: Embedding creation
1:58:15 - Incredible embeddin
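
To make the steps named in the timestamps more concrete (splitting text into sentence chunks, creating embeddings, then retrieving relevant chunks for a query), here is a minimal Python sketch. It is not the code from the video or the GitHub repository. Every function name here (split_sentences, chunk_sentences, embed, retrieve, build_prompt) is hypothetical, the sentence splitter is a naive regex, and the "embedding" is a plain word-count vector standing in for a learned embedding model, chosen only so the example runs with the standard library.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(sentences: list[str], chunk_size: int = 10) -> list[str]:
    """Join consecutive sentences into chunks of at most chunk_size sentences."""
    return [
        " ".join(sentences[i:i + chunk_size])
        for i in range(0, len(sentences), chunk_size)
    ]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Augment the query with retrieved context before passing it to an LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the query using the context.\n\n"
        f"Context:\n{context}\n\nQuery: {query}"
    )


if __name__ == "__main__":
    text = (
        "Vitamin C is found in citrus fruits. It supports the immune system. "
        "Protein helps build muscle. Good sources of protein include eggs and beans."
    )
    chunks = chunk_sentences(split_sentences(text), chunk_size=2)
    query = "What foods contain protein?"
    print(build_prompt(query, retrieve(query, chunks, top_k=1)))
```

In a real pipeline the word-count vectors would be replaced by dense vectors from an embedding model, and the prompt would be passed to a locally running LLM for the generation step.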
