LangChain Retrieval QA with Instructor Embeddings & ChromaDB for PDFs

Sam Witteveen · Intermediate · 🔍 RAG & Vector Search · 3y ago

Key Takeaways

This video shows how to use ChromaDB, Instructor embeddings, and LangChain for retrieval-augmented generation (RAG) over multiple PDFs. The embeddings run locally, ideally on a GPU for speed, and are instruction-conditioned, so the vector representation can be tailored to the task.
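The idea of embeddings that depend on what you are using them for means INSTRUCTOR models encode each text together with a task instruction. The sketch below only builds those [instruction, text] pairs in plain Python; the two instruction strings are assumptions modeled on LangChain's HuggingFaceInstructEmbeddings defaults (check your installed version), and the commented-out encoding call is not executed here.

```python
# Sketch of the [instruction, text] input format used by INSTRUCTOR-style models.
# Assumption: these strings mirror LangChain's HuggingFaceInstructEmbeddings defaults.
DOC_INSTRUCTION = "Represent the document for retrieval: "
QUERY_INSTRUCTION = "Represent the question for retrieving supporting documents: "


def to_instructor_inputs(texts: list[str], instruction: str) -> list[list[str]]:
    """Pair every text with the same task instruction."""
    return [[instruction, text] for text in texts]


if __name__ == "__main__":
    doc_pairs = to_instructor_inputs(
        ["FlashAttention is an IO-aware exact attention algorithm."], DOC_INSTRUCTION
    )
    query_pairs = to_instructor_inputs(["What is FlashAttention?"], QUERY_INSTRUCTION)
    print(doc_pairs)
    print(query_pairs)
    # With the real package installed (not run here), encoding would look something like:
    # from InstructorEmbedding import INSTRUCTOR
    # model = INSTRUCTOR("hkunlp/instructor-xl", device="cuda")  # or device="cpu", slower
    # vectors = model.encode(doc_pairs)
```

Documents and queries get different instructions, so the same model can place questions near the passages that answer them rather than near similar-looking questions.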

Full Transcript

All right, in this video we're going to continue looking at the multi-doc retriever. We're still using ChromaDB as our vector store, but the big addition in this one is embeddings that actually run locally. To do this, it's ideal to have a GPU. I've just got a T4 here, so not a super powerful GPU. You could run this on the CPU; it's just going to take a fair bit more time.

You'll see that I'm bringing in the same stuff as before, and there's one we actually don't need anymore. The two new ones we're bringing in are InstructorEmbedding, which I'll talk about in a sec, and the Hugging Face tooling for using it here.

Another difference in this one is that a lot of people asked about PDF files, multiple PDF files, so I swapped out the text files for multiple PDF files. If we have a look in here, you'll see that I've just put in some papers from arXiv: ReAct, Toolformer, FlashAttention, and ALiBi, so just some stuff around the topics we've been looking at with large language models recently.

The splitting and so on is all the same. We're just using the simple PyPDFLoader in this case to bring things in, and then the next key step is the embeddings.

There are two ways of doing the embeddings. You can use the normal Hugging Face embeddings, which use things like sentence-transformers. There's a whole bunch of different models around; they vary in quality, and a lot will also depend on your data and which ones suit it. An example of a standard sentence-transformers model would be this one, which used to be one of the top models for this, but in my testing I came across a newer model that seems to be doing better, so I decided to go with that.

The new model I'm going with is the Instructor embeddings. I think these deserve a whole video to themselves to explain the paper. The idea is that these are custom embeddings that depend on what you're using them for. In this case, though, we're just using the Instructor embeddings as they are, and we're using the XL variant. We bring these into LangChain, and you can see that we're going to run them locally, so it downloads the model and all of its files. We're telling it here to put the model on the GPU; that's what device "cuda" means. If you wanted to run on the CPU instead, you could set the device to "cpu", but it's definitely going to be a lot slower. By default these operate with a maximum sequence length of 512, which should be fine for the 1,000-character chunks we're splitting into.

Once we've got the embeddings set up, we need to make our vector store. This is exactly the same as the last video; we're just passing in the new embeddings, so we're not using OpenAI embeddings anymore. We set up the vector store with ChromaDB and a persist directory, and we create it from documents, passing in the Instructor embeddings and the document texts we've already split out. The only thing that has changed from the previous video is the Instructor embeddings.

We can now do the same sorts of things as before, like making a retriever, and this retriever now uses the new Instructor embeddings to find the contexts that best match a query. Next we make a chain, again the same as before: we pass in the retriever, which takes care of the vector store and the embeddings. I've also added a little bit of code to wrap the answers when we get them out.

Starting off with "What is FlashAttention?", it goes and gets the top three documents, and, not surprisingly, the ones the embeddings judge closest to the question come from the FlashAttention paper, so it gives us back a definition of FlashAttention. We can then skim to different parts of this. It mentioned IO-aware, so I asked what that is, and it's able to find the answer again from that same paper. It mentioned tiling, and it can find an answer for that as well.

Then I asked some other questions just to see what's there. Asking "What is Toolformer?", sure enough we get back that Toolformer is a language model that learns in a self-supervised way; the answer is the language model's rewrite of the three contexts retrieved from the Toolformer paper. We can ask more questions, such as which tools can be used with Toolformer: search engines, calculators, and translation systems, via simple API calls. So if you've skimmed a paper and want to ask some specific questions about it, this is actually a good way to get answers.

Interestingly, when we ask that question, it's also getting its answer from the augmented LLMs paper, which, from memory, is a survey paper, so it can pick up things about Toolformer there as well. It decided the top three contexts were one from the survey paper, one from the Toolformer paper itself, and another from the survey paper. If we ask about retrieval augmentation, the only paper we've got that relates to this is the augmented LLMs survey, and sure enough it pulls its contexts from there. If we ask specifically about the differences between REALM and RAG models, it's able to tell us those as well.

Note that we're still using OpenAI for the actual language model part; in the next video we'll try to get rid of that and run everything locally. For the embeddings, though, we're now using the Instructor model rather than OpenAI. The big advantage is that your whole dataset never has to be uploaded to OpenAI to be embedded. Obviously the retrieved contexts still go up to OpenAI with each question, so it's not that none of your data leaves your machine, but you're no longer sending all of it up in one shot just to compute embeddings. So you get a bit more privacy this way. Of course, this still isn't ideal if we never want our data to touch a server, so in the next video we'll look at using a local language model for the answering part as well as the embedding part.

The rest of the notebook is the same as what we looked at before: deleting the ChromaDB database and loading it back in. If you want to try using OpenAI's GPT-3.5 Turbo instead, you can do that here. That's it for this notebook. As always, if you've got any questions, please put them in the comments below, and if you found this useful, please click like and subscribe. In the next video we'll look at using custom models for everything. I'll talk to you in the next video. Bye for now.
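The transcript pairs 1,000-character chunks with a model whose default maximum sequence length is 512 tokens. A minimal sketch of that check, assuming a 200-character overlap and a rough 4-characters-per-token heuristic (both assumptions, not values confirmed by the video). The splitter is a simplified fixed-window stand-in, not LangChain's RecursiveCharacterTextSplitter, which prefers to break on paragraph, line, and word boundaries.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so any passage no longer
    than the overlap that straddles a boundary still appears whole in one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumption: about 4 characters per token for English)."""
    return int(len(text) / chars_per_token) + 1


if __name__ == "__main__":
    page = "FlashAttention is an IO-aware exact attention algorithm. " * 60
    chunks = split_text(page)
    print(len(chunks), "chunks; lengths:", [len(c) for c in chunks])
    # Compare the largest estimate against the 512-token default mentioned in the video.
    print("largest estimate:", max(rough_token_count(c) for c in chunks), "tokens")
```

With 1,000-character chunks the heuristic gives roughly 250 tokens per chunk, under the 512 limit. Text that tokenizes less efficiently (equations, tables, or code extracted from PDFs) narrows that margin, so it is worth checking with the model's own tokenizer.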

Original Description

Colab: https://colab.research.google.com/drive/17eByD88swEphf-1fvNOjf_C79k0h2DgF?usp=sharing

- Multi PDFs
- ChromaDB
- Instructor Embeddings

In this video I add on to the previous project by converting it to handle multiple PDFs and by using local Instructor Embeddings.

For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen

My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
Github: https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

This video shows how to build RAG search over multiple PDFs with ChromaDB, Instructor embeddings, and LangChain, running the embeddings locally on a GPU. It walks through creating a vector store, using the embeddings for similarity search, and informally checking retrieval quality by asking questions about several research papers and looking at which contexts come back.
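Conceptually, the vector store embeds every chunk once, and the retriever embeds each query and returns the k most similar chunks (the video retrieves the top three). The toy sketch below shows that flow with cosine similarity. toy_embed is a hypothetical bag-of-words stand-in for the Instructor model, persist/load write plain JSON rather than anything ChromaDB actually stores, and the method name similarity_search is borrowed for readability only.

```python
import json
import math
import os
import re
import tempfile
from collections import Counter

# Hypothetical toy vocabulary; a real model such as instructor-xl produces dense vectors.
VOCAB = ["attention", "io-aware", "tiling", "toolformer", "tools",
         "api", "alibi", "positional", "bias"]


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical embedding stand-in: word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z0-9-]+", text.lower()))
    return [float(counts[word]) for word in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory store: each chunk is embedded once, when it is added."""

    def __init__(self, vocab: list[str]):
        self.vocab = vocab
        self.records: list[dict] = []

    def add(self, text: str, source: str) -> None:
        self.records.append(
            {"text": text, "source": source, "vector": toy_embed(text, self.vocab)}
        )

    def similarity_search(self, query: str, k: int = 3) -> list[dict]:
        """Embed the query and return the k most similar chunks, best first."""
        query_vector = toy_embed(query, self.vocab)
        ranked = sorted(self.records,
                        key=lambda r: cosine(query_vector, r["vector"]),
                        reverse=True)
        return ranked[:k]

    def persist(self, directory: str) -> None:
        """Write the store to directory/store.json (toy analogue of a persist directory)."""
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, "store.json"), "w", encoding="utf-8") as f:
            json.dump({"vocab": self.vocab, "records": self.records}, f)

    @classmethod
    def load(cls, directory: str) -> "ToyVectorStore":
        """Reload a store written by persist(), without re-embedding anything."""
        with open(os.path.join(directory, "store.json"), encoding="utf-8") as f:
            data = json.load(f)
        store = cls(data["vocab"])
        store.records = data["records"]
        return store


def build_demo_store() -> ToyVectorStore:
    store = ToyVectorStore(VOCAB)
    store.add("FlashAttention is an IO-aware attention algorithm that uses tiling.",
              "flash_attention.pdf")
    store.add("Toolformer teaches a language model to use tools through API calls.",
              "toolformer.pdf")
    store.add("ALiBi adds a linear bias to attention scores instead of positional embeddings.",
              "alibi.pdf")
    return store


if __name__ == "__main__":
    store = build_demo_store()
    for hit in store.similarity_search("What is IO-aware attention?", k=2):
        print(hit["source"], "->", hit["text"])
    with tempfile.TemporaryDirectory() as tmp:
        store.persist(tmp)
        reloaded = ToyVectorStore.load(tmp)
        print(len(reloaded.records), "records reloaded")
```

The persist/load pair mirrors the notebook's pattern of deleting the in-memory database and loading it back from disk: the expensive embedding step happens only once.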

Key Takeaways

Download and configure Instructor embeddings (XL variant) in LangChain
Create a vector store using ChromaDB with Instructor embeddings
Switch the source documents from text files to multiple PDFs
Run the embeddings on a GPU (device "cuda") for faster processing
Make a retriever from the vector store, so queries are embedded with the Instructor model
Build the vector store from the split document text and the embeddings
Make a retrieval QA chain from the retriever and an OpenAI language model
Ask specific questions about a paper and get answers from relevant contexts
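The chain step amounts to placing the top retrieved chunks into a prompt for the LLM, and the notebook adds a helper to wrap the answers. Both pieces are sketched below with hypothetical names, assuming LangChain's default "stuff" chain type; the prompt wording is an assumption rather than LangChain's exact template, and no LLM is called. The prompt builder also makes the privacy point concrete: only the retrieved chunks, not the whole collection, would be sent to OpenAI.

```python
import textwrap


def build_stuff_prompt(question: str, contexts: list[str]) -> str:
    """Hypothetical helper: place the retrieved chunks into a single prompt.

    The instruction wording is an assumption, not LangChain's exact template.
    """
    context_block = "\n\n".join(contexts)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context_block}\n\nQuestion: {question}\nAnswer:"
    )


def wrap_preserving_newlines(text: str, width: int = 110) -> str:
    """Wrap each line to at most `width` characters, keeping existing line breaks."""
    return "\n".join(textwrap.fill(line, width=width) for line in text.split("\n"))


def format_response(answer: str, sources: list[str], width: int = 110) -> str:
    """Hypothetical formatter: wrapped answer, then the source file of each retrieved chunk."""
    return "\n".join([wrap_preserving_newlines(answer, width), "", "Sources:", *sources])


if __name__ == "__main__":
    contexts = ["Toolformer is a language model that learns to use tools in a self-supervised way."]
    print(build_stuff_prompt("What is Toolformer?", contexts))
    print(format_response("Toolformer is a language model that learns to call tools.",
                          ["toolformer.pdf"]))
```

Listing the source of each retrieved chunk is what lets you see, as in the video, that an answer about Toolformer drew on both the Toolformer paper and the survey paper.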

💡 In the presenter's testing, the Instructor embeddings (XL variant) seemed to retrieve relevant contexts better than a standard sentence-transformers model; as the video notes, which embedding model works best also depends on your data.
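Whether Instructor embeddings beat a sentence-transformers model on your own PDFs is an empirical question. A minimal sketch of two standard retrieval metrics, hit rate@k and mean reciprocal rank, for comparing embedding models on questions whose source paper you already know. The rankings in the demo are made-up placeholders, not measurements from the video.

```python
def hit_rate_at_k(rankings: dict[str, list[str]], relevant: dict[str, str], k: int = 3) -> float:
    """Fraction of queries whose known relevant source appears in the top-k retrieved sources."""
    if not relevant:
        raise ValueError("need at least one labeled query")
    hits = sum(1 for query, source in relevant.items()
               if source in rankings.get(query, [])[:k])
    return hits / len(relevant)


def mean_reciprocal_rank(rankings: dict[str, list[str]], relevant: dict[str, str]) -> float:
    """Average of 1/rank of the first chunk from the relevant source (0 if it never appears)."""
    if not relevant:
        raise ValueError("need at least one labeled query")
    total = 0.0
    for query, source in relevant.items():
        ranking = rankings.get(query, [])
        if source in ranking:
            total += 1.0 / (ranking.index(source) + 1)
    return total / len(relevant)


if __name__ == "__main__":
    # Made-up placeholder rankings (source file of each retrieved chunk), NOT results from the video.
    relevant = {"What is FlashAttention?": "flash_attention.pdf",
                "What is Toolformer?": "toolformer.pdf"}
    model_a = {"What is FlashAttention?": ["flash_attention.pdf", "alibi.pdf", "survey.pdf"],
               "What is Toolformer?": ["survey.pdf", "survey.pdf", "alibi.pdf"]}
    model_b = {"What is FlashAttention?": ["flash_attention.pdf", "survey.pdf", "alibi.pdf"],
               "What is Toolformer?": ["survey.pdf", "toolformer.pdf", "survey.pdf"]}
    for name, rankings in [("model_a", model_a), ("model_b", model_b)]:
        print(name, hit_rate_at_k(rankings, relevant), mean_reciprocal_rank(rankings, relevant))
```

Hit rate@k matches how the chain consumes results (only the top k chunks reach the LLM), while MRR also rewards putting the right chunk first.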
