Wikipedia RAG System in Python - Beginner Tutorial with LlamaIndex
Key Takeaways
The video demonstrates how to build a Wikipedia RAG system in Python using LlamaIndex and Streamlit, providing a beginner-friendly tutorial on the implementation of RAG systems.
Full Transcript
What is going on, guys? Welcome back. In this video today, we're going to build a Wikipedia-powered RAG system in Python, which basically means we're going to build an application that we can ask questions to, and the answers are going to be based on context from Wikipedia. It's going to do retrieval-augmented generation: it retrieves the most relevant Wikipedia articles from the ones we provide as a database, so to say, and then it answers our questions based on the context it retrieves. We're going to do all of this with LlamaIndex, with very few lines of code, and we're going to build a Streamlit web application around it. So let us get right into it. All right. So we're going to implement a RAG system in Python today with Wikipedia as its context source, and we're going to do all of that in LlamaIndex, which means it's going to be very simple, very easy and very beginner-friendly, because in LlamaIndex we don't have to do anything manually. We just say what we want to have done, LlamaIndex does it for us behind the scenes, and then we can easily build a Streamlit application around it to make it accessible in the browser. That is what we're going to do in this video today. Now, the process itself is quite simple. We select a couple of Wikipedia articles that we want to include as our knowledge base, so to say, and we download them. Then we embed them into vector space. This means we use an embedding model that takes the text and turns it into a high-dimensional vector. The idea is that, if the model does its job properly, vectors that are closer together in this vector space are also more similar in meaning. That means we can also embed questions or user queries, find the most relevant articles and the most relevant context, retrieve that context, and answer the question based on it.
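To make the idea that "closer vectors are more similar" concrete, here is a minimal sketch of nearest-neighbour retrieval by cosine similarity. The three-dimensional vectors are made up for illustration; a real embedding model produces much larger vectors (text-embedding-3-small, for instance, returns 1536-dimensional vectors by default), and LlamaIndex performs this step for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    """Return the titles of the k documents most similar to the query vector."""
    ranked = sorted(docs, key=lambda title: cosine_similarity(query, docs[title]), reverse=True)
    return ranked[:k]


# Made-up toy "embeddings"; a real embedding model would compute these from text.
docs = {
    "Convolutional neural network": [0.9, 0.1, 0.0],
    "Recurrent neural network": [0.7, 0.6, 0.1],
    "Paris": [0.0, 0.1, 0.95],
}
query = [0.85, 0.2, 0.05]  # pretend embedding of "What can you tell me about CNNs?"
print(top_k(query, docs, k=2))
# → ['Convolutional neural network', 'Recurrent neural network']
```

The query vector points in nearly the same direction as the CNN vector, so that article ranks first, while the unrelated "Paris" vector ranks last.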
But as I said, we don't need to do anything like this ourselves. We can just define that we want this to happen, and LlamaIndex is going to do all of it for us. So let's get started with the installation of the packages. We can use either pip or uv; in my case I'm just going to use pip (pip or pip3 install). We're going to install streamlit, and we're going to install LlamaIndex, so llama-index, like this. Optionally, you can also install python-dotenv. The reason you might want it is that you want to load the OpenAI API key into your program from a .env file, because you don't want to have it in clear text in the code, or you want to load it from the environment in general. This is what I'm going to do, since I don't want to show my API key in the video. Besides that, and I'm not going to type it out now, you also have to install a few dependencies related to LlamaIndex. I'll show them by listing my installed packages and grepping for the llama packages. We're definitely going to need llama-index-embeddings-openai, llama-index-llms-openai and llama-index-readers-wikipedia. Some of them might be installed automatically, or via certain extras in square brackets if you choose the correct ones, but we're definitely going to need these three: the Wikipedia reader, the OpenAI LLM and the OpenAI embeddings. In addition to that, you also need an OpenAI API key. Just go to the OpenAI developer platform, go to settings, go to API keys, and generate a new API key. You copy it and write it either into your code directly or into a file. In my case, I have this .env file.
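If you want to verify that the packages mentioned above are importable before running the app, a small standard-library check like the following can help. The module paths are assumptions derived from the pip names (for example, python-dotenv is imported as dotenv and llama-index-readers-wikipedia as llama_index.readers.wikipedia); adjust them if your installed versions differ.

```python
import importlib.util

# Import paths assumed to correspond to the pip packages from the video.
REQUIRED_MODULES = [
    "streamlit",
    "dotenv",  # installed via python-dotenv
    "llama_index.core",
    "llama_index.llms.openai",
    "llama_index.embeddings.openai",
    "llama_index.readers.wikipedia",
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # find_spec raises when a parent package (e.g. llama_index) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    result = missing_modules(REQUIRED_MODULES)
    print("Missing:", result if result else "none")
```

Anything reported as missing can then be installed with pip under its package name.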
I'm not going to show you my actual file, but I'm going to open up a new file just called .env to show the format. The basic idea is that it has one line which says OPENAI_API_KEY equals, and then whatever the key is. That is what the file looks like, and we're going to load this file using the python-dotenv package. So let us open up a main.py file, where we're going to write our simple application. We start by importing os, since we want to be able to retrieve environment variables. We also import streamlit as st. Then, from dotenv, we import the load_dotenv function; as I said, we're going to load the API key from the .env file instead of specifying it manually. Then we say from llama_index.llms.openai import OpenAI, from llama_index.embeddings.openai import OpenAIEmbedding, from llama_index.readers.wikipedia import WikipediaReader, and finally from llama_index.core import VectorStoreIndex, StorageContext and load_index_from_storage. To reiterate: OpenAI is what actually gets an answer from an LLM in the first place, so this is going to be our LLM answering the questions. OpenAIEmbedding is the embedding model that takes the Wikipedia articles, and also our queries, and embeds them into vector space. WikipediaReader is our reader for Wikipedia, so that we can actually get the text of the articles. And the llama_index.core imports let us work with the vector store, so that we can export and import it. That's the basic idea here. Now I'm going to start by loading the API key into the environment by calling load_dotenv(). Then we define a constant that holds the directory of our vector store once it's built. I'm going to call it the index directory and set it to, I don't know, wiki_rag or something like this. Now, the next step depends on your use case.
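For readers curious what load_dotenv() does conceptually, here is a deliberately simplified stand-in written with the standard library only. It is not python-dotenv's implementation (the real package also handles quoting rules, `export` prefixes, variable expansion and more). It only reads KEY=VALUE lines into os.environ and, like load_dotenv's default, does not overwrite variables that are already set. The function name load_env_file is hypothetical.

```python
import os
from pathlib import Path
from typing import Union


def load_env_file(path: Union[str, Path], override: bool = False) -> dict[str, str]:
    """Simplified .env loader: read KEY=VALUE lines into os.environ.

    Blank lines and lines starting with '#' are ignored. Existing environment
    variables are kept unless override=True. Returns the parsed pairs.
    """
    parsed: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Trim whitespace and one layer of surrounding quotes from the value.
        key, value = key.strip(), value.strip().strip('"').strip("'")
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

With a .env file containing a line such as `OPENAI_API_KEY=<your key>`, calling load_env_file(".env") makes os.environ["OPENAI_API_KEY"] available, which is the environment variable the OpenAI classes read the key from when none is passed explicitly.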
So here we say pages equals a list of strings, and these strings are the titles of the Wikipedia articles you want to use for your knowledge base. If you want to do the same thing I'm doing here, AI and machine learning related topics, you can just do that. Otherwise you can enter anything you want: every country on earth, every economic metric, every war in history, or something like that. Whatever your RAG system is about, you just provide the pages that you want to use as context here. I'm going to copy-paste the pages from my prepared code, and this is what it looks like when it's done. Whatever pages you choose, you just have them here in a list. This means it's going to pull all these pages from Wikipedia, and it's going to be able to retrieve them as context. So if I ask a question about convolutional neural networks, it's not just going to answer from its own knowledge, that is, from the weights and biases of the model, the encoded intelligence; it's going to actually retrieve the article from Wikipedia and base the answer on that context. That is the idea here. Now, to not use too much time here, let's just comment out a couple of them so they load faster and we get a quicker result. The rest of the code is also very simple. We just need to define two cached resources. The first one creates or loads the index: we want a function that returns the index we use as our knowledge base. For this we decorate it with st.cache_resource, which means its return value is cached and reused, and we call the function get_index. This function is going to return our index, which means that if the index doesn't exist yet, it needs to create it.
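Streamlit re-executes the whole script on every interaction, which is why get_index is decorated with st.cache_resource: the expensive work (downloading and embedding) should happen once, and the resulting object should be reused. As a rough standard-library analogy only (st.cache_resource additionally shares the object across user sessions and offers options such as ttl), functools.cache shows the same build-once behaviour. The fake_get_index function and the build_count counter are hypothetical, for illustration.

```python
import functools

build_count = 0  # counts how often the expensive build actually runs


@functools.cache
def fake_get_index() -> dict[str, list[str]]:
    """Stand-in for an expensive get_index(); its body runs only on the first call."""
    global build_count
    build_count += 1
    # Pretend this is the slow Wikipedia download + embedding step.
    return {"pages": ["Convolutional neural network", "Recurrent neural network"]}


# Simulate three Streamlit reruns calling the cached function.
for _ in range(3):
    index = fake_get_index()

print(build_count)  # → 1
```

Every call after the first returns the very same object, which is also what you want for the index and the query engine.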
If it does exist, it needs to load and return it. So we have these two cases. We say: if os.path.isdir(...) is true for the index directory we provided, just load the index and return it. We do that by saying storage = StorageContext.from_defaults(persist_dir=...), with persist_dir set to the index directory, and then we return load_index_from_storage(storage). The basic idea is that we get a storage context and then load the index from that storage context, which returns an index object. If we don't have it, we need to create it, which means we get the documents, the individual pages from Wikipedia. We do that with WikipediaReader().load_data(...): we create an instance of the Wikipedia reader, call its load_data method and pass pages=pages, so it uses our list as input, and we also pass auto_suggest=False. This retrieves the documents from Wikipedia. Now we want to embed them into vector space, so we need an embedding model. We call it embedding_model, and we use OpenAI for this, so of course your API key needs to be present. You create an OpenAIEmbedding, and you have to choose a model that is available. Just Google "OpenAI embedding models", you'll see a list, and you enter the identifier. I'm going to use text-embedding-3-small. This is a small model; you can also use a larger one if you want better performance when it comes to the embeddings. As I said, the API key needs to be present for this; if it's set in the environment, it's just going to pick it up.
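The load-or-create logic in get_index is a general "persist to disk once, reload afterwards" pattern. The sketch below mirrors its shape with the standard library: if the directory exists, load; otherwise build, persist, and return. The JSON file name, the build_fn parameter and the toy contents are hypothetical; in the real app, StorageContext.from_defaults(persist_dir=...) with load_index_from_storage and index.storage_context.persist(persist_dir=...) take the place of the JSON calls.

```python
import json
import os
from typing import Callable

INDEX_FILE = "index.json"  # hypothetical file name inside the index directory


def get_or_build(index_dir: str, build_fn: Callable[[], dict]) -> dict:
    """Load a persisted object from index_dir, or build it and persist it."""
    path = os.path.join(index_dir, INDEX_FILE)
    if os.path.isdir(index_dir):
        # A cached copy exists: load it instead of rebuilding.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    # No cached copy yet: build it once and save it for the next run.
    data = build_fn()
    os.makedirs(index_dir)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

Like the original, this only checks that the directory exists, so a directory left half-written by an interrupted first run will fail to load; deleting it forces a rebuild. One more detail worth double-checking in the real app (an observation about LlamaIndex's defaults, not something shown in the video): when the index is loaded from disk without an embed_model argument, queries are embedded with the globally configured default model, which may not be text-embedding-3-small. Passing the same model, e.g. load_index_from_storage(storage, embed_model=embedding_model), keeps query and document embeddings in the same vector space.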
So you can either load it like this, or you can run export OPENAI_API_KEY=<your key> in your shell; this is also going to be fine. Or you do something like os.environ["OPENAI_API_KEY"] = <your key> manually in the code. This is also an option. But basically, you create the embedding model and then we just use it. We say index = VectorStoreIndex.from_documents(...). So here we create an index directly: in the first branch we used a storage context and loaded the index from it, and here we create a vector store index from documents, namely the ones retrieved by the Wikipedia reader. This is what I mean by LlamaIndex being extremely simple to use. You don't need to handle any data processing; everything is done for you in the proper format behind the scenes. I just need to say: look, I have a Wikipedia reader, I have an embedding model, I want a vector store index, take the documents from the Wikipedia reader. They're already in the correct format; I don't need to do anything myself. And I pass embed_model=embedding_model. This builds an index from the documents, using the embedding model for the embeddings, and then we just need to save the index with index.storage_context.persist(persist_dir=...). So this time we do it the other way around: we get the storage context from the index and persist it to disk at the index directory, and of course we return the index. So this is the function we use to get the index, whether we have to create it the first time or can just load it from disk. The second function is also going to be a cached resource, called get_query_engine. Another very abstract thing in LlamaIndex: we don't need to do anything ourselves here. We just need to define an LLM.
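Part of the "data processing" that VectorStoreIndex.from_documents handles behind the scenes is splitting each document into smaller chunks (nodes) before embedding them, so that retrieval can return a relevant passage rather than a whole article. LlamaIndex uses its own sentence-aware splitter for this; the word-based function below is only a simplified illustration of fixed-size chunking with overlap, and its name and default sizes are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of chunk_size words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbouring chunks, at the cost of embedding some text twice.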
We want to use GPT-4o mini, for example, and use it in a query engine. That's it, super simple. We don't need to do any embeddings ourselves, and we don't need to do any prompting ourselves. All we have to do is say index = get_index(), which uses the function from above to get the index. Then we say llm = OpenAI(...) with the model called gpt-4o-mini and a temperature of zero, because I want it to be very strict. And then I just return index.as_query_engine(llm=llm, similarity_top_k=3). similarity_top_k is how many similar, or relevant, items it's going to retrieve; I'm going with the top three items as context here. That is all we have to do. We now just need to build a user interface around it, but that is the whole magic: we define the data we want in the pages list, we retrieve those pages from Wikipedia, we embed them using this model, we create a vector store, and for each question we query the top three most relevant items from that vector store and answer using this LLM. Done. That's how you do it with LlamaIndex. Very simple. Now let's go ahead and build the user interface. We say def main(), and then st.title(...); this is just styling, and we call it "Wikipedia RAG Application" or something like this. Then we say question = st.text_input("Ask a question"); this is going to be our prompt. Then we define a button with st.button, which we give the text "Submit", and we only act if the button is clicked and there's also a question. This is how you write it for Streamlit.
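Conceptually, index.as_query_engine(llm=llm, similarity_top_k=3) wires together two steps: retrieve the top-k most similar chunks for the question, then ask the LLM to answer using those chunks as context. The sketch below shows the second step, assembling a context-grounded prompt from already-ranked chunks. The template wording and the build_prompt name are hypothetical; LlamaIndex ships its own prompt templates, which is exactly why you don't need to write this yourself.

```python
def build_prompt(question: str, contexts: list[str], top_k: int = 3) -> str:
    """Assemble a RAG prompt from the top_k retrieved context chunks (already ranked)."""
    context_block = "\n---\n".join(contexts[:top_k])
    return (
        "Context information is below.\n"
        f"{context_block}\n"
        "Using only the context above, answer the question. "
        "If the context does not contain the answer, say so.\n"
        f"Question: {question}\n"
        "Answer:"
    )


ranked_chunks = [
    "A convolutional neural network (CNN) is a type of feedforward neural network.",
    "Recurrent neural networks process sequences step by step.",
    "Long short-term memory (LSTM) is a type of recurrent neural network.",
    "Paris is the capital of France.",
]
print(build_prompt("What can you tell me about CNNs?", ranked_chunks))
```

With top_k=3 the fourth, least relevant chunk never reaches the LLM; raising similarity_top_k trades a longer prompt for a better chance of including the right passage.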
You say if button and question, which means: if the button is clicked and the question is not empty. What we do then is say with st.spinner and give it the text "Thinking", which shows a spinner animation indicating that something is loading; while the work is happening, we see that spinner with "Thinking". Inside it, we say qa = get_query_engine(); this is how we retrieve the query engine. Then we say response = qa.query(question). That's that. After the thinking is done, that is, once we receive the response, we say st.subheader("Answer") and then st.write(response.response). This shows the direct answer to our question. Then we add another st.subheader("Retrieved context"), because we want to see what has actually been used as context. And we say for source_node in response.source_nodes: st.markdown(source_node.get_content()). And that is our application. The only thing left is to actually call main(), for example with if __name__ == "__main__": main(). We're going to run this with streamlit run, and since Streamlit executes the script as __main__, that guard still triggers. So that is the whole magic; this is all we need for our RAG system. Reiterating one more time: we define the pages we're interested in, and of course we load the API key first.
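The display code only relies on a small part of the response object: response.response holds the answer text, and response.source_nodes holds the retrieved chunks, each exposing get_content() and a similarity score. To test the rendering logic without calling OpenAI, you can feed it a stand-in object with the same attribute names. FakeResponse, FakeSourceNode and render_markdown below are hypothetical test doubles, not LlamaIndex classes; in the app you would pass the pieces to st.subheader, st.write and st.markdown instead of joining them into one string.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSourceNode:
    """Test double mimicking the parts of a retrieved node that the UI uses."""
    text: str
    score: float

    def get_content(self) -> str:
        return self.text


@dataclass
class FakeResponse:
    """Test double with the same attribute names the UI code reads."""
    response: str
    source_nodes: list[FakeSourceNode] = field(default_factory=list)


def render_markdown(resp) -> str:
    """Build the same sections the Streamlit UI shows: answer, then retrieved context."""
    parts = ["### Answer", resp.response, "### Retrieved context"]
    for node in resp.source_nodes:
        parts.append(f"(score {node.score:.2f})\n{node.get_content()}")
    return "\n\n".join(parts)


fake = FakeResponse(
    response="CNNs are a specialized type of feedforward neural network.",
    source_nodes=[FakeSourceNode("A convolutional neural network (CNN) is a type of feedforward neural network.", 0.87)],
)
print(render_markdown(fake))
```

Showing the score next to each chunk is optional, but it makes it easier to see when retrieval returned only weakly related context.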
Then we say: if the index already exists, load it; otherwise, create it by getting the pages from Wikipedia and embedding them with this model into an index, a vector store, save it to disk so we can load it next time, and return it. Then we get this index, attach the GPT-4o mini model to it, and when I ask a question, it retrieves the three most relevant chunks of context, answers the question and displays the result here in Streamlit. That's it. Now, in order to run this, we just need to exit the editor and run streamlit run main.py. That's it. It automatically opens in the browser. Let me move my terminal. It takes some time, and now I can ask a question. What were the pages I was using? I was using, for example, convolutional neural networks. So: "What can you tell me about CNNs?" Submit. Now you can see "Thinking" and "Running get_query_engine", which means it's now downloading everything, and of course we have a problem. What is that? I need to, of course, specify that this is the model I want to use: it's a keyword argument, not a positional argument. So let's close this and run it again. Now we can type the same question again: "What can you tell me about CNNs?" Thinking, getting the query engine, which means it now downloads all the articles, and once it has done that, it's going to ask the question and give us the answer and the context. There we have it. Answer: "Convolutional neural networks (CNNs) are a specialized type of feedforward neural network designed to", and so on and so forth. Down below we can see the retrieved context: "A convolutional neural network (CNN) is a type of feedforward neural network." This is the text from the Wikipedia page. I think the LLM would also be able to answer this without the context, but not always.
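The error on the first run comes from passing a model name positionally. If a constructor's first parameter is not model, a positional string lands in the wrong parameter. As far as I know, OpenAIEmbedding in llama-index-embeddings-openai takes mode as its first parameter rather than model; treat that as an assumption and check the signature in your installed version. The hypothetical make_embedder function below reproduces the pitfall using plain Python only.

```python
def make_embedder(mode: str = "text_search", model: str = "default-embedding-model") -> dict[str, str]:
    """Hypothetical constructor whose FIRST parameter is `mode`, not `model`."""
    allowed_modes = {"text_search", "similarity"}
    if mode not in allowed_modes:
        raise ValueError(f"Unknown mode: {mode!r}")
    return {"mode": mode, "model": model}


# Positional: the model name is bound to `mode`, so construction fails.
try:
    make_embedder("text-embedding-3-small")
except ValueError as err:
    print(err)  # → Unknown mode: 'text-embedding-3-small'

# Keyword: the value goes to the parameter it is meant for.
print(make_embedder(model="text-embedding-3-small"))
# → {'mode': 'text_search', 'model': 'text-embedding-3-small'}
```

Passing configuration by keyword, as in OpenAIEmbedding(model="text-embedding-3-small") and OpenAI(model="gpt-4o-mini", temperature=0), avoids this class of bug regardless of parameter order.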
In this case, yes, because CNNs are a well-known architecture, but sometimes it's going to be more difficult, sometimes the topic is more niche, and then it makes sense to retrieve the context. Also, by retrieving the context, you make sure the answer is grounded in a source you actually know. Now, let's see whether this also works for RNNs. If I ask, "What can you tell me about RNNs?", it might get confused and say it doesn't know, or it might retrieve the wrong context. But in this case, okay, I think what it retrieved is still relevant. Yes, it is, because the context it retrieved was the artificial neural network page, which also has information about LSTMs and RNNs. So it actually retrieved the proper context. Let's try something else: "What can you tell me about xLSTM?" I think it doesn't know that; it's probably going to retrieve the neural network page. Oh, there you go. Perfect. It's behaving exactly as it should. It tells us there is no specific information about xLSTMs in the provided context; however, LSTMs are mentioned, and so on. And it retrieves the context for LSTMs, which is exactly what we expect. Now, I'm not sure what happens when I ask "What is the capital of France?", because it should know the answer is Paris. Okay, it answers that, because it's very simple information, but if I ask something else, it's probably not going to be able to. So: "What is the difference between P and NP?" I thought it might just give me the answer if it's easy, but actually it doesn't, which is better. I think this is the better approach: if the information isn't in the context, it doesn't answer the query, unless it's something super simple like the capital of France.
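The xLSTM and P vs NP examples rely on the LLM noticing that the retrieved context does not cover the question. An optional extra safeguard is to drop retrieved chunks whose similarity score falls below a cutoff before they reach the LLM; LlamaIndex offers node postprocessors for this, for example SimilarityPostprocessor with a similarity_cutoff. The standard-library sketch below shows the idea. The 0.75 threshold and the function names are hypothetical, and a sensible cutoff depends on the embedding model, so it should be tuned on your own questions.

```python
NO_CONTEXT_ANSWER = "The knowledge base does not contain information about this question."


def filter_by_score(scored_chunks: list[tuple[str, float]], cutoff: float = 0.75) -> list[str]:
    """Keep only chunks whose similarity score is at least `cutoff`, best first."""
    kept = [(text, score) for text, score in scored_chunks if score >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept]


def answer_or_decline(scored_chunks: list[tuple[str, float]], cutoff: float = 0.75) -> str:
    """Decline early when nothing relevant was retrieved; otherwise report what would go to the LLM."""
    context = filter_by_score(scored_chunks, cutoff)
    if not context:
        return NO_CONTEXT_ANSWER
    return f"Answering from {len(context)} chunk(s)."  # placeholder for the actual LLM call
```

For a question like "What is the difference between P and NP?" against an AI-focused knowledge base, all scores would ideally fall below the cutoff, so the app declines without spending an LLM call.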
But yeah, that's perfect, actually. And this is how easily you can build this with LlamaIndex in Python. So that's it for today's video. I hope you enjoyed it and learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below. Oh, and of course, don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching. See you in the next video, and bye.
Original Description
In this video, we learn how to easily build a RAG system based on Wikipedia in Python. For this we use LlamaIndex and Streamlit.
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: https://www.neuralnine.com/books/
💻 The Algorithm Bible Book: https://www.neuralnine.com/books/
👕 Programming Merch: https://www.neuralnine.com/shop
💼 Services 💼
💻 Freelancing & Tutoring: https://www.neuralnine.com/services
🖥️ Setup & Gear 🖥️: https://neuralnine.com/extras/
🌐 Social Media & Contact 🌐
📱 Website: https://www.neuralnine.com/
📷 Instagram: https://www.instagram.com/neuralnine
🐦 Twitter: https://twitter.com/neuralnine
🤵 LinkedIn: https://www.linkedin.com/company/neuralnine/
📁 GitHub: https://github.com/NeuralNine
🎙 Discord: https://discord.gg/JU4xr8U3dm