Create Your Own ChatGPT with PDF Data in 5 Minutes (LangChain Tutorial)
Key Takeaways
This video tutorial demonstrates how to build a custom-knowledge, ChatGPT-style chatbot over PDF data using LangChain. It relies on retrieval-augmented generation backed by a vector database rather than fine-tuning: the document is split into chunks of roughly 512 tokens, each chunk is embedded and stored in a FAISS vector store, the chunks most similar to a user's query are retrieved, and a conversational retrieval chain answers with chat memory.
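The retrieve-then-answer flow summarized above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: `embed` is a toy word-count stand-in for the OpenAI embedding model, the list named `index` stands in for the FAISS store, and the chunk texts, the prompt wording, and the helper names `similarity_search` and `build_prompt` are all made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(index, query, k=4):
    """Return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, docs):
    """Combine the retrieved chunks and the question into one prompt."""
    context = "\n\n".join(docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Placeholder chunks (hypothetical text, not taken from any PDF).
chunks = [
    "Chunking splits a long document into small pieces.",
    "An embedding model turns each chunk into a vector.",
    "A vector database stores vectors for similarity search.",
]
# Embed every chunk once and keep (chunk, vector) pairs as the "vector store".
index = [(chunk, embed(chunk)) for chunk in chunks]

query = "What does an embedding model do?"
docs = similarity_search(index, query, k=1)
prompt = build_prompt(query, docs)
print(prompt)
```

In a real system the prompt would be sent to a language model. The point of the sketch is only that the query goes through the same embedding function as the chunks, and that the best-matching chunks are pasted into the prompt as context.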
Full Transcript
In this video I'm going to be showing you the fastest and easiest way to create a custom-knowledge ChatGPT using LangChain that's trained on your own data from your own PDFs. I've seen a lot of different tutorials that have overcomplicated this a little bit, so I thought I'd hop on and make a fast, to-the-point version where you can copy and paste my code and get started building these custom knowledge tools for your business and for your personal use as quickly as possible.
Now, if you're familiar with applications like ChatPDF, where you're able to drag and drop in a document and start chatting over it, what we're going to be building today is essentially the exact same thing. You're going to be able to take that functionality, put your own PDFs in, and then use it for any purpose you like. But the best part about what I'm about to show you is that this method gives you complete flexibility and customization over how your app works and how the documents are processed.
Now, just quickly, I'd like to plug my AI newsletter, which launched recently. If you want to get all of the hottest and latest AI news distilled down to a quick 5-minute read and delivered to your email, be sure to head down below and sign up.
Firstly, we'll go through a very, very brief explainer on how these systems work and the different parts involved, so that you can understand what we're building here and how it all works. Secondly, we're going to jump straight into the notebook that I've created for this video, which you can copy over to your own projects and just change the name of the PDF.
Okay guys, here's a quick visualization of how this actually works under the hood. This is the system we are creating using LangChain, which is essentially going to take in our documents, chunk them, embed them, put them in a vector database, and then allow users to query it and get answers back. I'll take us through it step by step now.
The first step is to take a document and split it into smaller pieces. This is done because when we query the database to get an answer based on the document, we need to receive a bunch of smaller chunks that are relevant to the user's query, and not just the entire mass of information. So step one is to chunk it. We're going to be doing it in 512 tokens or less, so we chunk our document down into however many chunks are needed to get below 512 tokens per piece. Then we take the chunks and embed each one of them, one by one. We're using the ada-002 model by OpenAI, which is one of the best embedding models available right now. Then we take the embeddings for each chunk and put them into a vector database so that they're ready for recall when the user queries.
The final step is to allow users to actually query the database. We do this by taking the user's query and putting it through the exact same embedding model that we used for the chunks, and then querying the database based on the embedding of the user's query. We get back a number of documents that are most similar to what the user is asking about, and then we're able to pass those to a large language model and include them in the context. So we take the user's query and the matched documents, combine them together, and ask the language model, "Hey, can you answer this question given this context?" Then we're able to send the answer back to the user. So that's a very quick, high-level overview of how these applications work.
Now we can jump straight into building it. At the top here we've got a summary of all the different steps we're going to be going over, so you can take a look at that, but we can jump straight into the installs and imports. I've simplified it all down so you can just run these cells as you go through. You need to run this cell here, which is going to install all the packages. My API key is already set up; you need to replace this with your API key, and once those are all installed you're ready to get started.
For the purposes of my chatbot in this video, I'm going to be using "Attention Is All You Need", which is the Transformer research paper that was done by Google, so I thought it'd be interesting to use this within the chatbot. Here we can see I'm using it: the Attention Is All You Need PDF. If you're using a different document when you clone this notebook, you can go over to the left side panel here, drag in your document, and upload it. Once you've got it uploaded, you can come back and change the name here, so replace this with the name of your PDF and then you're ready to go.
The first main step is loading the PDFs and chunking the data with LangChain. We've got two different methods here that I wanted to show you. One is the very easy and straightforward version that LangChain offers, which is just using this simple page loader, PyPDFLoader. That's going to take the PDF that you've given it, chop it into pages, and then you get all of those pages as documents ready to use in your system. This method is great if you're doing a quick test, but I thought I'd show a more advanced method, which splits your documents into roughly similar-sized chunks. There are a number of different factors that go into creating a custom knowledge chatbot system like this, and the chunk size is one of them; it can determine a lot in terms of the quality of the output. So this script is going to allow you to split by chunk, and you can set the size of the chunks here. I've got it at 512 at the moment, with an overlap of 24.
The first step in this advanced chunking method is to use textract, which extracts all of the information out of the PDF and saves it to this doc. Second, we need to save it as a text file and then reopen that text file. This is just to get around some issues that can frequently come up depending on the documents you use. Then you need to create a function that counts the number of tokens. Here you can see I've used a GPT-2 tokenizer, and we've made this little function, count_tokens, which takes in some text in the form of a string and returns the number of tokens. Finally, we create the text splitter, which is this LangChain type called RecursiveCharacterTextSplitter. It takes in a chunk size, which is variable as I mentioned, and then we need to pass in the length function which we've just created. The final step is creating the chunks by passing the text that we opened from our text file into the create_documents function, and that creates all of the chunks as objects of type langchain.schema.Document.
One quick best practice that I want to show you is to do a quick visualization of the distribution of the chunks, to make sure that the chunking process has split them to the size we specified. If you just run the cell (you don't need to know the specifics of it), it shows you the distribution of the chunk sizes. We've got a couple that are over the limit, but that comes down to the recursive splitter. For the most part we don't have anything that is thousands and thousands of tokens; they're all roughly within the range that we wanted.
Then we need to create our vector database, which again LangChain makes super simple with this FAISS package. We take in the chunks that we created and also the embedding model, and it's going to embed all of that, store it in the vector database, and give us this db variable back. We just need to set up our query, which is "Who created transformers?", and then all we need to do is run a similarity search on the database using the query. And there we go. If you put this little bit in here at the bottom, which is len(docs), you can see that based on this query it's pulling back four different chunks that match the query. That gives you an idea of how much context is being grabbed from the vector database with each query.
Then we take that functionality and combine it with a LangChain chain, which takes in a query. So we can do the same thing, "Who created transformers?": we retrieve the docs and then run a chain that takes in the query and the docs and gives us an output. That is combining the context retrieved from the similarity search with the query and then answering it as you'd expect. So if we run this, it's going to do the similarity search, bring in the documents, take the user query, and then run one of OpenAI's language models on it to answer the question. And here we have the answer.
Now I thought I'd throw in a little extra goodie for you guys here, which is to convert this functionality into an actual chatbot. I get this a lot on my videos: "Yeah, you showed us the functionality, but how can I actually use this in some kind of chatbot?" So this is just a quick one that I've whipped up. It uses another LangChain component, the ConversationalRetrievalChain, which takes in a language model and uses the database that we created as a retriever. You don't need to know too much about it; just run the cell, and then here is a little chatbot loop that allows us to interact with this knowledge base in a chat format. Here I can ask "Who created transformers?" and there we have it, it's started to answer us. "Were they smart?"
We have a custom knowledge chatbot using LangChain. It takes in your own PDFs, chunks them up, embeds them, creates a vector store, and then allows you to retrieve those chunks and answer questions based on that information. And this does have chat memory included, as you can see here: "Who created transformers?" gives a name, and "Were they smart?" gets "I don't know." So you can see that the chat memory is actually working. You have a customized chatbot with chat memory.
That about wraps it up for the video, guys. Thank you so much for watching. All of this code is going to be available in the description for you to clone this notebook, change the PDF out, and start using it for your own purposes. If you've enjoyed this video and want to see more content like this, be sure to head down below and subscribe to the channel; I'm posting tutorials like this all the time. And if you've enjoyed the video, please leave me a like; it would mean the world to me.
As always, if this has lit up some light bulbs in your head and you want to have a chat with me as a consultant, you can book a call with me in the description and in the pinned comment. If you want to see some feasibility reports or talk through an idea with me, you can reach me there. I also have my own AI development company, so if you want to build something like this but on a bigger scale for your business or for personal use, then you can have a chat with me as a consultant and we can see if we can help you get that built. Finally, in the description and pinned comment there are also links to join my AI entrepreneurship Discord and to sign up to my AI newsletter. So that's all for the video, guys. Thank you so much for watching, and I'll see you in the next one.
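The chunking step described in the transcript (a 512-token limit with an overlap of 24) can be illustrated with a small standard-library sketch. It is not the notebook's code and not LangChain's RecursiveCharacterTextSplitter: it is a simpler fixed-window splitter, `count_tokens` counts whitespace-separated words instead of GPT-2 tokens, and `split_with_overlap` and the generated input text are hypothetical names and data used only for illustration.

```python
def count_tokens(text):
    """Stand-in token counter: whitespace-separated words, not GPT-2 tokens."""
    return len(text.split())


def split_with_overlap(text, chunk_size=512, chunk_overlap=24):
    """Split text into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens, so text cut at a
    boundary keeps some of its surrounding context in the next chunk.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    tokens = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Placeholder input: 1200 numbered "words" standing in for extracted PDF text.
text = " ".join(f"w{i}" for i in range(1200))
chunks = split_with_overlap(text, chunk_size=512, chunk_overlap=24)

# Sanity check on chunk sizes, in the spirit of the distribution plot.
sizes = [count_tokens(c) for c in chunks]
print(len(chunks), sizes)
```

With this input the script prints `3 [512, 512, 224]`. Unlike this fixed-window version, a recursive splitter tries to break on natural separators such as paragraphs and sentences, which is why the transcript notes that a few of its chunks can end up over the limit.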
Original Description
📚 Join the #1 community for AI entrepreneurs and connect with 200,000+ members: https://bit.ly/skool-ov
📈 We help industry experts, entrepreneurs & developers build and scale their AI Agency: https://bit.ly/aaa-accelerator-ov
🤝 Need AI Solutions Built? Work with me: https://bit.ly/morningside-ai-ov
⚒️ Build AI Agents Without Coding: https://agentivehub.com/
🚀 Apply to Join My Team at Morningside AI: https://bit.ly/ms-youtube-lo
🚀 Apply to Join My Team at AAA Accelerator: https://bit.ly/aaa-youtube-lo
My Vlog/BTS Channel: https://bit.ly/LiamOttleyVlogs
In this video I show you how to train ChatGPT on your own data in 5 minutes using LangChain so you can chat with your PDFs! This is a super beginner friendly guide that explains how these custom knowledge chatbots can be created in a few minutes using LangChain. This is similar to tools like ChatPDF which allow you to chat to your docs (https://chatpdf.com/).
If you've ever wanted to know how to chat with your PDFs or train ChatGPT on your own data, this is the video for you! Code available below.
Create a copy of my notebook (code):
https://colab.research.google.com/drive/1OZpmLgd5D_qmjTnL5AsD1_ZJDdb7LQZI?usp=sharing
Timestamps:
0:00 - What we're building
1:10 - System Explained
2:48 - Creating the chatbot
8:18 - Steal my code!