Build Your Own RAG System: Step-by-Step Python Tutorial (LangChain, CrewAI, OpenAI)
Key Takeaways
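This video tutorial demonstrates how to build a Retrieval-Augmented Generation (RAG) system using LangChain, CrewAI, and OpenAI. It covers the workflow from identifying relevant documents, through chunking, embedding, and storing them in a Chroma vector database, to retrieving context and generating answers with a CrewAI agent, and closes with how to test the responses before deployment. The tutorial provides a step-by-step guide on how to use these tools to generate coherent responses to queries.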
Full Transcript
Hello everyone. Today we are going to learn to build a RAG-based system. RAG stands for retrieval-augmented generation, and simply put, a RAG system enables us to chat with a set of documents. Retrieval-augmented generation, or RAG, is a hybrid AI approach that combines retrieval with generation. Retrieval is the process of fetching relevant context from documents or a knowledge base based on a query or a prompt. Generation is the process of combining the relevant context fetched by retrieval with the general knowledge of the LLM to generate a coherent response.
A RAG system can be used for several tasks, such as building a customer query resolution system using reference documents. It can also be used to build an internal knowledge retrieval system so that employees can quickly find answers in company documents, a legal case research system, or a healthcare decision support system. Simply put, wherever a lot of documents or context are involved and we need to extract specific information, that's where RAG can be used. In this video I will show you how to build a RAG system using a simplified version of the learner query resolution system that we have built at Analytics Vidhya. You can use this system as a reference to build various other kinds of RAG applications.
The workflow here shows the key components of a RAG system; let's quickly walk through it. While building RAG systems, I often divide the process into two phases: phase one involves building a vector database, and phase two involves testing the responses to queries. The first step is to identify relevant documents. In our use case, learners post their questions as queries, and it's very important to respond to these queries as soon as possible so that learners have a great experience. The queries we get are mostly specific to the course or the lesson that learners are going through, though sometimes they can be general as well. So the list of
documents that we initially thought would be relevant for answering learner queries were the course videos, the course PPTs, the course subtitles, and the past queries that learners had posted. On testing, we realized that processing videos directly would be expensive without a significant benefit to quality. The PPTs also did not capture the entire context, as instructors tend to explain a lot of concepts verbally. The last two resources, the subtitles and the past queries, turned out to be very effective, and in the final version we used both of them. However, to keep things simple for this video, I'm just going to show you the system based on subtitles.
After we identify the documents, the next step is to break them into smaller parts called chunks. This is required so that we retrieve only the small section of a document that is relevant to the query posted. The next step is the embedding model. Computers don't really understand text; they only understand numbers, so each chunk of text is converted into meaningful numbers that capture the gist of the spoken words. This is done with pre-trained embedding models. Some popular embedding models are OpenAI's text embeddings and Sentence-BERT, but generally speaking, most of the popular LLMs have their own set of embeddings: there's an embedding by Llama, there's an embedding model used in DeepSeek, and so on. Once we get the embeddings for the chunks, we need to store them efficiently in a vector DB store. There are various popular vector DB stores: one is Pinecone, another is Weaviate, and another is ChromaDB. In our scenario we used Chroma because it's open source and free to use.
Now that we understand phase one, let's start implementing it in code. Let's move to VS Code. We will begin by importing some essential libraries; let's understand the important ones. The recursive text splitter helps in
chunking the documents. The embeddings we are using in this case are the OpenAI embeddings, we are using the Chroma vector store integration that comes with LangChain, and we are also importing the essential classes from the CrewAI library to build the agents later on. Let's import the libraries.
Next, I'm going to set my OpenAI API key. You can either use an environment variable or store the key in a file, as I have done. Next, we build a helper function that uses pysrt to process the SRT files in our system, so let's run this function.
Our data is stored as one folder per course, and each folder contains multiple files. Here we refer to the name of each course and create a dictionary that maps it to the path of the folder where its files are stored. That folder holds various kinds of files, such as PPTs and SRT files for the different lessons. This bunch of code extracts only the SRT files and stores them in course SRT files, which is actually a dictionary rather than a list. Let's run that. Let me also show you some of the SRT files: what we have are the file paths of the various SRT files in the system. Just to keep the code clean, let me comment that out again.
Now we come to the next part of our system: we are going to chunk the documents, get the embeddings, and set up a vector store. One thing you will notice is that we are creating a persistent directory to store our vector DB. The reason is that if we run this file again, we would not want to recompute the same embeddings. Even though embedding models are quite cheap nowadays, why not save money if you already have embeddings available for a particular
course? So if embeddings for a course are already present in the persistent directory, we can make sure that we don't recreate them.
We set a chunk size of 1,000, which means each chunk holds up to about 1,000 characters, and we have also added a chunk overlap. The overlap is added so that context does not end abruptly at a chunk boundary: the next chunk starts slightly before the previous one ends, so no context is lost between chunks. We initialize our OpenAI embeddings and then set up the vector store. So far we have only set things up; we have not actually built the vector store yet. Let me set up the vector store here. We have named the collection "course material", and this collection name will be used later on to retrieve information from the vector store. We have also passed in the initialized OpenAI embeddings along with the persistent directory that we created earlier. Let's run this code as well. We get some LangChain deprecation warnings, but they're not really important.
Now, this is a very interesting bit of code. So that we can estimate how much the embeddings will cost us, we have added a cost estimate, and we print out the total cost of doing the embedding. From the dictionary we saw earlier, we look at each course, get its SRT files, and add them to the collection. Interestingly, we also attach metadata to each document, using the name of the course. This is a very important step because it enables efficient retrieval: whenever we post a query, the search only looks at the SRT files of that particular course. So in our
system, when learners post a query, they post it for a particular course, so we needn't go through the entire vector database to find the relevant content; we can jump directly to the documents tagged with that course's metadata, which is quite efficient. This is a key component you should also think about implementing in your own system.
What we do here is use the helper function that we created at the very beginning to extract text from the SRT files. After extracting the text, we use the Document class that we imported earlier and the text splitter we created earlier, with a chunk size of 1,000 and an overlap of 200, to divide the data into multiple parts. We process the chunks in batches and finally add the documents to the vector store, which we initialized earlier with the OpenAI embeddings and the persistent directory.
Let's run this code. It's going to take maybe 30 seconds or so; let's see how long it takes and look at the cost as well. Again, this is a small, dummy version of the actual solution that we implemented, and in this code we have used the subtitles of just three courses. We see that it has added one course; for the other two courses the chunks were probably already present. We have a course on LangChain, and it has added chunks for that course. If I run it again, it should say the course was already added; let's try it out. Before we do that, let's look at the cost: nothing significant, a very minimal cost, partly because we only embedded the subtitles of one course, whereas at Analytics Vidhya we have several courses on various topics. Now let me run this code again. This is what I wanted to show you: it says the course already exists, so
this course name is already present and the embedding process is not repeated. Actually, we can show more. While this code was running, I realized that the names of the two other courses in the system were not correct, so I went back, changed the file paths of those folders, and added them to the database again, which shows the beauty of this approach. For the Introduction to LangChain course, we had already incorporated the subtitles, so it said the course already exists. The two other courses were not present earlier, and now we have embeddings for those two courses as well. That brings us to the end of phase one.
Before we jump into phase two, let's understand what it involves. Phase two is all about querying and getting the response. Once the RAG system receives a query, the query goes through the same embedding model used in phase one, giving us the embedded query. The embedded query is then matched against similar embeddings in the vector DB store to get the relevant SRT contexts. The matched chunks are retrieved and used as relevant context for the current query. In the next step, both the query and the relevant documents pass through the LLM, which then generates the final response. Now let's jump back to our Python code and implement phase two.
By now we have created our vector database store, and the next phase is all about querying the data. We have built a helper function here called retrieve course materials. Based on the query and the course name, this function first filters down to the relevant course; it does not look at other courses. This brings us back to the metadata we talked about earlier. Once it narrows the search to the embeddings of a particular course, it finds the embeddings most similar to the query. (Chroma's default distance metric is squared L2 unless the collection is configured for cosine, but because OpenAI embeddings are normalized to unit length, both give the same ranking as cosine similarity.) When we specify k
equal to 3, we are asking for the top three results only. When we get the results, we combine them into a single document and return it. Let's run the system and try it out on a query. We have a course called Introduction to Deep Learning using PyTorch, and a likely question that learners would ask in this course is "What is gradient descent?" Let's run this function and see the output.
Let's look at the relevant context from the subtitles. The first chunk seems quite relevant: "gradient descent is an optimization technique used to find the local minimum or optimize a loss function." The next chunk talks about linear models and the mathematical solution, so it is visibly less related to gradient descent than the first chunk, even though it also mentions gradient descent ("this is where gradient descent comes into the picture"). I'm guessing the third chunk would be even less related, but the first one was spot on. Let's look at the third chunk as well: it says "this brings us to the intuition behind gradient descent." So our system is working fairly well: all three documents retrieved for the query "What is gradient descent?" seem quite relevant. This is the set of relevant documents that we now want to pass to the LLM, so that it can use this context together with the query to answer the question.
Let's move on. We could have just used a simple LLM call for this task, but we wanted a more professional system, so we are building an agent using the CrewAI library. It has a few popular classes that help you build agents: Agent, Task, and finally Crew. As the name suggests, the Agent class helps us build an agent, and CrewAI gives us plenty of opportunity to add relevant
context. We are initializing this agent, and look at the context: we have assigned a role to the agent, and since we are answering queries, it is "learning support specialist". Look at the goal: help learners with their queries with the best possible response. The backstory lets us add more detailed context related to the overall objective of the learning support specialist. I'm not going to read it out, but it gives context about what we do: we are an EdTech company focused on courses in machine learning, generative AI, and so on. That's the agent set up.
Now let's look at the task the agent will perform. Again, look at the elaborate description we have provided; I'll come back to it in a moment, but first let me show the parts inside curly braces. This is not Python string formatting: in CrewAI, within a task (or even within an agent), anything we put in curly braces acts as an input variable that gets filled in when the crew runs. One input variable we have provided is the query, the actual question. The description says: answer the learner's query to the best of your ability, keep your response concise at under 100 words, here is the query, followed by the variable. Similarly, the relevant content retrieved in the first step by our function is inserted under "here is similar content from the course, extracted from subtitles." One technique for building good agents is to give them as much context as possible, and that's what we are doing here. Apart from that, we have also added past discussions. These are not historical discussions but the back-and-forth within the current thread. For example, if somebody asks what gradient descent is and we respond, the person may have a follow-up query such as "Can you explain it to me in more detail?" So there can be a thread of conversation, and that thread
is what we pass in here, so that the agent has the complete context of the ongoing discussion. Finally, we provide the learner's name so that the agent can respond in a more personalized manner, and we specify that the expected output is an accurate response to the query. Let's run this one as well. We get an error saying query answer agent is not defined; I think I forgot to run the earlier cell, so let me run it again and come back. Finally, in the crew we combine the agent and the task, keeping verbose set to false so that we don't get interim output. The crew we have created is called response crew; let's run it.
Next is something a little more practical that isn't really related to building an agent: we import a CSV file containing a lot of queries so that we can test the system on various queries that have been posted. There are some helper bits of code here: if a query has only one reply, we don't look at the thread; if it has more than one reply, we do. Then we get the query, so this is just basic processing. The current version of the agent does not process images, so if a query contains images, we do not give a response to it. The interesting part is this one: we first retrieve the context using the function we created earlier and save it as context. This is probably the crux of everything we have done. The response crew takes as input the query, the relevant content extracted here, and finally the thread, if one is available. To test the system, we add some string formatting for the output. Let's run this and get a response to the query at index one of our data. This was the question, and it looks like a follow-up question
because it starts with "Thanks for the response. So it means when I input a question or query, the query engine will fire an LLM call to check..." and so on. Let's look at the response as well. First, it is very personalized: "Hi Sushma, yes, when you input a question, the query engine generates embeddings and fires LLM calls..." and so on. So this is a very quick way to respond to queries, provided the response is accurate.
Usually you would not deploy such a system directly; you would want to test it extensively. I'm just going to show you this bunch of code. While testing the system, we generated responses for a large number of our past queries, printed the responses the agent gave, and showed them to our internal query experts: here is the query, here is the response generated by the agent, is it acceptable or not? We took their feedback and incorporated it into the agent as well. It's an ongoing process: you first create a solution, then evaluate it in detail, and once you're satisfied, you deploy it, but you keep improving it afterwards too. Here are some suggestions you can use to further improve such a system, although some may be a little specific to the problem we have at hand.
And there you have it: a fully functional RAG system for effective query resolution. But this is not the end; this RAG system can be improved further. You can explore different chunking methods to find a better one. You can improve retrieval through query enhancement. We can add image processing capability to answer queries that contain images. We can test different approaches and select relevant documents based on various strategies. We can also include other databases, like the past discussions that we have included in our
system at Analytics Vidhya as well. That's it for this video. Do comment below with other RAG examples or use cases that you would like us to cover, and finally, please like and share for more such content.
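The tutorial's first helper uses pysrt to pull the spoken text out of each subtitle file (with pysrt installed, something like `" ".join(item.text for item in pysrt.open(path))` does this). As a dependency-free sketch of the same idea, not the tutorial's actual helper, the hypothetical `srt_to_text` below drops cue numbers and timing lines and keeps only the caption text:

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt_content: str) -> str:
    """Return the caption text of an SRT file as one space-joined string.

    Hypothetical helper for illustration; the tutorial uses pysrt instead.
    """
    # Some SRT exports start with a UTF-8 byte-order mark; str.strip() keeps it.
    srt_content = srt_content.lstrip("\ufeff")
    lines = []
    for raw in srt_content.splitlines():
        line = raw.strip()
        # Skip blank separators, numeric cue indices, and timing lines.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        lines.append(line)
    return " ".join(lines)


sample = """1
00:00:01,000 --> 00:00:03,000
Gradient descent is an optimization technique

2
00:00:03,500 --> 00:00:06,000
used to minimize a loss function.
"""
print(srt_to_text(sample))
# Gradient descent is an optimization technique used to minimize a loss function.
```

One limitation of this line-based approach is that a caption consisting only of digits would be skipped too; a structural parser such as pysrt does not have that problem.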
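The chunking step uses LangChain's recursive text splitter with a chunk size of 1,000 and an overlap of 200 (in LangChain, `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)`). The recursive splitter prefers to break on paragraph, line, and word boundaries; the sketch below is a deliberately simpler fixed-window splitter (the name `chunk_text` is hypothetical) that only shows how the overlap carries text from one chunk into the next:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for a recursive splitter: it ignores sentence and
    word boundaries and only illustrates the size/overlap arithmetic.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


doc = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(doc)
print([len(p) for p in pieces])  # [1000, 1000, 900]
# The last 200 characters of one chunk are the first 200 of the next.
print(pieces[0][-200:] == pieces[1][:200])  # True
```

With these settings each window advances 800 characters, so long texts produce roughly 1,000 / 800 = 1.25 times as many characters to embed as the raw subtitles contain; that is the price paid for not cutting context off at chunk boundaries.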
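The transcript mentions printing an estimated embedding cost before running the ingestion. A rough version of that estimate is sketched below. It assumes about 4 characters per token (a common rule of thumb for English text; a real tokenizer such as tiktoken gives exact counts), and the price per million tokens is a parameter you should take from OpenAI's current pricing page. The `0.02` used here is only a placeholder, and `estimate_embedding_cost` is a hypothetical name:

```python
import math


def estimate_embedding_cost(texts: list[str], price_per_million_tokens: float,
                            chars_per_token: float = 4.0) -> tuple[int, float]:
    """Roughly estimate the token count and embedding cost for a list of chunks.

    Assumption: ~4 characters per token. Replace with a real tokenizer
    when you need exact numbers.
    """
    total_tokens = sum(math.ceil(len(t) / chars_per_token) for t in texts)
    cost = total_tokens / 1_000_000 * price_per_million_tokens
    return total_tokens, cost


# e.g. 3,000 chunks of 1,000 characters each, at a placeholder price
chunks = ["a" * 1000] * 3_000
tokens, cost = estimate_embedding_cost(chunks, price_per_million_tokens=0.02)
print(f"~{tokens:,} tokens, ~${cost:.4f}")  # ~750,000 tokens, ~$0.0150
```

An estimate like this is mainly useful for deciding whether re-embedding is worth avoiding, which is exactly what the persistent directory in the tutorial is for.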
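The ingestion loop described in the transcript does three things: it skips a course whose embeddings already exist, tags every chunk with the course name as metadata, and adds the chunks in batches. The sketch below shows that bookkeeping against a hypothetical in-memory store whose `add_texts(texts, metadatas)` method mirrors the shape of LangChain's vector-store method of the same name. Against a real Chroma store, the "already exists" check would be a metadata-filtered lookup whose exact call depends on your LangChain and Chroma versions, and the metadata key `"course"` is an assumption:

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a persistent vector store.

    It only keeps texts and metadata; no embeddings are computed here.
    """
    texts: list[str] = field(default_factory=list)
    metadatas: list[dict] = field(default_factory=list)

    def has_course(self, course: str) -> bool:
        return any(m.get("course") == course for m in self.metadatas)

    def add_texts(self, texts: list[str], metadatas: list[dict]) -> None:
        self.texts.extend(texts)
        self.metadatas.extend(metadatas)


def ingest_course(store: InMemoryStore, course: str, chunks: list[str],
                  batch_size: int = 100) -> int:
    """Add a course's chunks in batches, tagging each with the course name.

    Returns the number of chunks added (0 if the course was already stored).
    """
    if store.has_course(course):
        print(f"Course already exists: {course}")
        return 0
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        store.add_texts(batch, [{"course": course} for _ in batch])
    return len(chunks)


store = InMemoryStore()
print(ingest_course(store, "Introduction to LangChain", ["chunk"] * 250))  # 250
# The second run is skipped, just like the "course already exists" message in the video.
print(ingest_course(store, "Introduction to LangChain", ["chunk"] * 250))  # 0
```

Tagging at ingestion time is what makes the per-course filter at query time possible. Without the metadata, every query would have to search all courses.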
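In phase two, the retrieval helper first restricts the search to one course and then returns the k = 3 chunks most similar to the query. With LangChain's Chroma wrapper this is roughly `similarity_search(query, k=3, filter={...})`, where the filter key must match whatever metadata key was used at ingestion. The brute-force sketch below shows the same logic in plain Python with toy 3-dimensional vectors standing in for real embeddings. The record layout and the names `retrieve` and `cosine_similarity` are hypothetical:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], records: list[dict], course: str, k: int = 3) -> list[str]:
    """Return the texts of the k records from `course` most similar to the query.

    Each record is a dict with "text", "vector", and "metadata" keys
    (a layout invented for this sketch).
    """
    # Metadata filter first: other courses are never scored.
    candidates = [r for r in records if r["metadata"].get("course") == course]
    candidates.sort(key=lambda r: cosine_similarity(query_vec, r["vector"]), reverse=True)
    return [r["text"] for r in candidates[:k]]


records = [
    {"text": "gradient descent definition", "vector": [0.9, 0.1, 0.0], "metadata": {"course": "dl"}},
    {"text": "linear models", "vector": [0.5, 0.5, 0.0], "metadata": {"course": "dl"}},
    {"text": "intuition behind gradient descent", "vector": [0.8, 0.3, 0.1], "metadata": {"course": "dl"}},
    {"text": "unrelated lesson", "vector": [0.0, 0.1, 0.9], "metadata": {"course": "dl"}},
    {"text": "LangChain agents", "vector": [1.0, 0.0, 0.0], "metadata": {"course": "langchain"}},
]
top = retrieve([1.0, 0.0, 0.0], records, course="dl", k=3)
print(top)
# ['gradient descent definition', 'intuition behind gradient descent', 'linear models']
context = "\n\n".join(top)  # combined into one block of context for the LLM
```

Note that "LangChain agents" is excluded even though its toy vector matches the query exactly, because it belongs to a different course. This is the efficiency gain the metadata filter is meant to provide.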
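The CrewAI task uses curly-brace placeholders (the query, the retrieved content, the thread, and the learner's name) that CrewAI fills in from the dictionary passed to `crew.kickoff(inputs=...)`. The sketch below imitates that substitution with `str.format` so the effect is visible without CrewAI installed. It also encodes two of the preprocessing rules from the transcript: include the thread only when there is more than one message, and skip queries with images. The template wording, placeholder names, and helper names are all hypothetical:

```python
from typing import Optional

# Hypothetical task description; CrewAI would fill the {placeholders} itself.
TASK_TEMPLATE = (
    "Answer the learner's query to the best of your ability in under 100 words.\n"
    "Learner name: {learner_name}\n"
    "Query: {query}\n"
    "Similar content from the course, extracted from subtitles:\n{relevant_content}\n"
    "Past discussion in this thread:\n{thread}"
)


def build_thread(messages: list[str]) -> str:
    """Only pass a thread when there is more than one message."""
    return "\n".join(messages) if len(messages) > 1 else "No previous discussion."


def build_task_inputs(query: str, learner_name: str, retrieved_chunks: list[str],
                      messages: list[str], has_images: bool = False) -> Optional[dict[str, str]]:
    """Collect the values for the placeholders; None means 'skip this query'."""
    if has_images:
        return None  # the tutorial's agent does not handle images yet
    return {
        "learner_name": learner_name,
        "query": query,
        "relevant_content": "\n\n".join(retrieved_chunks),
        "thread": build_thread(messages),
    }


inputs = build_task_inputs("What is gradient descent?", "Sushma",
                           ["chunk one", "chunk two"], ["What is gradient descent?"])
text = TASK_TEMPLATE.format(**inputs)
print(text)
```

In the real pipeline, the same `inputs` dictionary would be handed to the response crew rather than formatted by hand. Keeping the input-building step separate makes it easy to test the preprocessing rules without calling the LLM.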
Original Description
GitHub Link - https://github.com/ApoorvV/RAG-for-Query-Resolution
Blog Link - https://www.analyticsvidhya.com/blog/2025/03/building-a-rag-based-query-resolution-system-with-langchain-and-crewai/
Learn how to build a fully functional Retrieval-Augmented Generation (RAG) system from scratch using Python in this step-by-step tutorial! Understand the core concepts behind RAG and see how to implement a practical query resolution system, similar to what we use at Analytics Vidhya.
RAG (Retrieval-Augmented Generation) combines the power of information retrieval with the text generation capabilities of Large Language Models (LLMs) to provide context-aware and accurate responses, allowing you to effectively "chat" with your documents.
In this video, you'll learn:
Timestamps:
0:00 Intro
0:08 What is RAG (Retrieval-Augmented Generation)?
0:45 RAG Use Cases
1:36 Simplified RAG Architecture Part 1
4:41 Phase 1: Code
12:24 Simplified RAG Architecture Part 2
13:09 Phase 2: Code
21:50 Ideas for Upgrading the System
22:41 Outro
Whether you're building internal knowledge bases, customer support bots, or research tools, this tutorial provides a solid foundation for developing your own RAG applications.
#RAG #RetrievalAugmentedGeneration #LLM #GenerativeAI #Python #LangChain #CrewAI #OpenAI #VectorDatabase #ChromaDB #AITutorial #AnalyticsVidhya
Like this video and subscribe to Analytics Vidhya for more tutorials on AI, Machine Learning, and Data Science! Let us know in the comments what other RAG use cases you'd like us to cover.