LangChain Retrieval QA with Instructor Embeddings & ChromaDB for PDFs
Key Takeaways
This video demonstrates using ChromaDB, Instructor Embeddings, and LangChain for retrieval-augmented generation (RAG) over multiple PDFs. The embeddings run locally, ideally on a GPU for faster processing (a CPU also works, more slowly), while OpenAI is still used for the language model that writes the answers.
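To make the retrieval step of RAG concrete, here is a minimal, library-free sketch of what a retriever does conceptually: score every stored chunk against the query embedding and keep the top k (the video retrieves three). The vectors are made-up toy values standing in for real Instructor embeddings, cosine similarity is an assumed scoring choice, and `top_k_chunks` is a hypothetical helper, not a LangChain or ChromaDB API. ChromaDB's own distance metric and indexing may differ.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: List[float], store: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k chunk ids most similar to the query vector."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (made-up values, not real Instructor outputs).
store = {
    "flashattention.pdf#chunk0": [0.9, 0.1, 0.0],
    "flashattention.pdf#chunk1": [0.8, 0.2, 0.1],
    "toolformer.pdf#chunk0": [0.1, 0.9, 0.1],
    "alibi.pdf#chunk0": [0.0, 0.2, 0.9],
}
query = [1.0, 0.1, 0.0]  # pretend embedding of "What is Flash attention?"

for chunk_id, score in top_k_chunks(query, store, k=3):
    print(f"{score:.3f}  {chunk_id}")
```

A real vector store does the same ranking over thousands of high-dimensional vectors, usually with an approximate index instead of a full sort.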
Full Transcript
Alright, in this video we're going to continue looking at the multi-doc retriever. We're still going to be using ChromaDB as our database, our vector store, but the big thing we're adding in this one is embeddings that actually run locally. To do this, it's ideal to have a GPU. I've just got a T4 here, so not a super powerful GPU. You could run this on the CPU; it's just going to take a fair bit more time.

You'll see that I'm bringing in the same stuff as before, and there's one we don't actually need anymore. The two new ones we're bringing in are InstructorEmbedding, which I'll talk about in a sec, and the Hugging Face package we're using here.

Another difference in this one: a lot of people have been asking about PDF files, multiple PDF files, so I swapped out the text files for multiple PDF files. If we have a look in here, you'll see that I've just put in some papers from arXiv about ReAct, Toolformer, FlashAttention and ALiBi, so just some stuff around the topics we've been looking at with large language models recently.

The splitting and so on is all the same. We're just using the simple PyPDFLoader to bring things in, and then the next key thing is the embeddings.

There are two ways of doing the embeddings. You can use the normal Hugging Face embeddings, which use things like Sentence Transformers. There's a whole bunch of different models around; they vary in quality, and a lot of it will also depend on your data and which ones match it. An example of a standard Sentence Transformer would be this one, which used to be one of the top models for this, but in my testing I came across a newer model that seems to be doing better, so I decided to go with that.

The new model I'm going with is the Instructor embeddings. I think these deserve a whole video to themselves to explain the paper and so on. The idea is that these are custom embeddings that depend on what you're using them for. In this case, though, we're just using the instruction embeddings, and we're using the XL variety. We bring these into LangChain, and you can see that we're going to run them locally, so it's downloading the model and all of its files. We're telling it here that we're going to put it on the GPU; that's what device cuda means here. If you wanted to run them on the CPU, you could set the device to cpu, but it's definitely going to be a lot slower. You'll see it loads these up and brings them in. By default these operate at a maximum sequence length of 512 tokens, which should be fine for the 1,000-character chunks we're splitting into.

Once we've got the embeddings set up, we need to make our vector store. This is all exactly the same as the last video; we're just passing in the new embeddings, so we're not using OpenAI embeddings anymore. We're using ChromaDB for the vector store with a persist directory, and we create it from documents, passing in the Instructor embeddings and the document texts we've already extracted. The only thing that has changed is that we're using these Instructor embeddings.

We can now do the same sorts of things as before, like making a retriever. This retriever now uses the new Instructor embeddings to find the contexts that match a query.

Next up, we need to make a chain. Again, this is the same as before, nothing really different. We pass in the retriever, which takes care of the vector store and the embeddings. I've just added a little bit of code to wrap the answers when we get them out.

If we look at this, starting off with "What is Flash attention?", it goes and gets the top three documents. Not surprisingly, the document the embeddings have chosen as closest to what we want to know is this FlashAttention paper, this PDF here, and it gives us back a definition of FlashAttention. We can then skim to different parts: it mentioned "IO-aware", so I asked what that is, and it's able to find the answer again from that same paper. It mentioned tiling, and it can find an answer for that as well.

Then I thought, let's ask some other questions just to see what's there. Asking "What is Toolformer?", we can see whether it returns the same thing or what it gets, and sure enough, we get that Toolformer is a language model that learns in a self-supervised way. This is the rewriting of the output from the three different contexts about Toolformer. We can ask some more questions, like what tools can be used with Toolformer: search engines, calculators and translation systems, via simple API calls. We can even ask it more about the different examples. So if you've skimmed a paper and want to ask some specific questions, this is actually a good way to get things out of it.

It's interesting when we ask this question, though, that it's also getting its answer from the augmenting LLMs paper, which, from memory, is a survey paper, so it can take some things about Toolformer from there as well. It decided the top three contexts were one from the survey paper, one from the Toolformer paper itself, and another from the survey paper.

If we ask it some questions about retrieval augmentation, the only paper we've got that relates to this is the augmenting LLMs survey, and sure enough it's able to get some of those. If we ask it specifics about the differences between REALM and RAG models, it's able to tell us these kinds of things.

So the idea here is that we're still using OpenAI for the actual language model part. In the next video we'll look at getting rid of that and running everything fully locally. But we're now using the Instructor embeddings for the embedding part, not OpenAI. The big advantage is that your whole data set never has to go up to OpenAI. Obviously the retrieved contexts are still sent to OpenAI, so it's not that none of your data goes up, but you're not sending all of your documents up in one shot just to do the embeddings. So you do get a bit more privacy doing it this way. Of course, this is still not ideal if we never want our data to touch a server, so in the next video we'll look at using an actual local language model for the replying part as well as the embedding part.

Okay, the rest of the notebook is the same as what we looked at before: deleting the ChromaDB database and bringing it back in. If you want to try using just the OpenAI GPT-3.5 Turbo, you can do that here.

That's it for this notebook. As always, if you've got any questions, please put them in the comments below. If you found this useful, please click like and subscribe. In the next video we'll look at using custom models for everything. Okay, I'll talk to you in the next video. Bye for now.
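The transcript mentions splitting the PDF text into 1,000-character chunks. As a rough, library-free illustration of that idea, here is a fixed-size character chunker with overlap. It is not LangChain's splitter (LangChain's text splitters also try to break on separators such as paragraph boundaries), and the 200-character overlap and the `split_with_overlap` name are assumptions for illustration; the video only states the chunk size.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character chunks that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


page_text = "".join(str(i % 10) for i in range(2500))  # stand-in for text extracted from a PDF
chunks = split_with_overlap(page_text)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.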
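Instructor models embed an (instruction, text) pair rather than bare text, which is what makes the embeddings "custom" to the task. The sketch below only builds those pairs and the device setting; it does not load a model. The two instruction strings are, to the best of my knowledge, the defaults used by LangChain's `HuggingFaceInstructEmbeddings` wrapper, but treat them as assumptions and check the version you have installed. `build_instructor_inputs` and `embedding_model_kwargs` are hypothetical helpers; the `{"device": ...}` dictionary mirrors the cuda-versus-cpu choice discussed in the video.

```python
from typing import Dict, List

# Assumed instruction strings (believed to match LangChain's defaults for
# HuggingFaceInstructEmbeddings; verify against your installed version).
DOC_INSTRUCTION = "Represent the document for retrieval: "
QUERY_INSTRUCTION = "Represent the question for retrieving supporting documents: "


def build_instructor_inputs(texts: List[str], instruction: str) -> List[List[str]]:
    """Pair each text with an instruction, the [instruction, text] input Instructor models take."""
    return [[instruction, text] for text in texts]


def embedding_model_kwargs(gpu_available: bool) -> Dict[str, str]:
    """Hypothetical helper: pick the device the embedding model should run on."""
    return {"device": "cuda" if gpu_available else "cpu"}


doc_inputs = build_instructor_inputs(
    ["FlashAttention is an IO-aware exact attention algorithm."], DOC_INSTRUCTION
)
query_inputs = build_instructor_inputs(["What is Flash attention?"], QUERY_INSTRUCTION)
print(doc_inputs)
print(query_inputs)
print(embedding_model_kwargs(gpu_available=False))
```

Using different instructions for documents and queries is the point: the same model produces vectors tuned for "things to be found" versus "questions doing the finding".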
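The video mentions adding a little code to wrap the answers before printing them, but that code isn't shown in the transcript. A plausible standard-library sketch, assuming the answer is a plain string and an assumed default width of 110 characters, wraps each line while keeping the model's own line breaks. `wrap_text_preserve_newlines` is a hypothetical name.

```python
import textwrap


def wrap_text_preserve_newlines(text: str, width: int = 110) -> str:
    """Wrap each line of text to the given width without merging existing line breaks."""
    lines = text.split("\n")
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    return "\n".join(wrapped_lines)


answer = (
    "Toolformer is a language model that learns in a self-supervised way to use external tools "
    "such as search engines, calculators and translation systems via simple API calls.\n"
    "Sources: toolformer.pdf"
)
print(wrap_text_preserve_newlines(answer, width=40))
```

Splitting on newlines first matters because `textwrap.fill` on its own would collapse the line break before "Sources:" into a space.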
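The privacy point is easier to see with a sketch of what a "stuff"-style QA chain, which concatenates the retrieved chunks into a single prompt, sends to the language model: only the question plus the k retrieved chunks, not the whole corpus. The prompt template is a hypothetical stand-in rather than LangChain's actual prompt, `build_stuffed_prompt` is an illustrative helper, and the chunk strings are placeholders.

```python
from typing import List


def build_stuffed_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Concatenate retrieved chunks and the question into one prompt (hypothetical template)."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


corpus = [f"chunk {i} of the PDF collection" for i in range(500)]  # all chunks, kept locally
retrieved = corpus[:3]  # pretend these are the top-3 hits from the local retriever
prompt = build_stuffed_prompt("What tools can be used with Toolformer?", retrieved)

sent_chunks = [c for c in corpus if c in prompt]
print(f"{len(sent_chunks)} of {len(corpus)} chunks would be sent to the LLM")
```

With local embeddings, the embedding step never leaves the machine; only this prompt goes to the hosted model.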
Original Description
Colab: https://colab.research.google.com/drive/17eByD88swEphf-1fvNOjf_C79k0h2DgF?usp=sharing
- Multi PDFs
- ChromaDB
- Instructor Embeddings
In this video I build on the previous project by converting it to handle multiple PDFs and using local Instructor Embeddings.
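Handling multiple PDFs mostly comes down to finding every PDF in a folder and loading each one. In LangChain that is typically done with a loader such as `DirectoryLoader` paired with `PyPDFLoader`; the standard-library sketch below shows only the file-discovery part, with a hypothetical helper and throwaway demo files, and leaves the actual PDF parsing to the loader.

```python
import tempfile
from pathlib import Path
from typing import List


def find_pdfs(folder: str) -> List[Path]:
    """Return all .pdf files directly inside folder, sorted by name (extension matched case-insensitively)."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


with tempfile.TemporaryDirectory() as tmp:
    for name in ["toolformer.pdf", "flashattention.PDF", "notes.txt"]:
        (Path(tmp) / name).write_bytes(b"")  # empty placeholder files for the demo
    found = [p.name for p in find_pdfs(tmp)]

print(found)
```

Matching the extension case-insensitively catches files like `paper.PDF`, which a literal `*.pdf` glob would miss on case-sensitive file systems.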
For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen
My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials