Cohere AI's LLM for Semantic Search in Python

James Briggs · Intermediate ·🧠 Large Language Models ·3y ago

Key Takeaways

This video demonstrates how to use Cohere AI's LLM for semantic search in Python, utilizing the Cohere Embed API and Pinecone Vector Database to generate and index language embeddings for fast and scalable vector search. The video covers the implementation of a semantic search engine using these tools.

Full Transcript

Today we are going to take a look at how to build a semantic search tool using Cohere's Embed API endpoint and Pinecone's vector database. We'll be using Cohere's large language model to embed sentences or paragraphs into a vector space, and then we'll be using Pinecone's vector database to actually search through that vector space and retrieve relevant answers to our particular queries, based on the semantics of those queries rather than just keyword matching. Now, both of these services together are a pretty good combination, and they make building this sort of tool incredibly easy, as we'll see.

But before we start building it, let's take a look at what the overall architecture will look like. So we're going to be starting with our data. It's just going to be a load of text. It can be split into sentences or roughly paragraph-sized chunks of text, depending on what we're trying to do. What we're going to do is feed those into Cohere's embedding endpoint, which is just going to go to a large language model, and what that will do is encode each of the chunks of text that we feed into it into a single vector. Now, we're going to have a thousand of these chunks of text. We're going to have quite small questions from the TREC dataset, so we'll end up with a thousand of these vectors.

Okay, and once we have them, we then take them and put them into Pinecone. Whilst they are in Pinecone, or even just the vectors by themselves, how this works is that they are represented in a vector space. So if two of these chunks of text are semantically similar, i.e. they have a similar meaning, they would be very close together in that vector space, whereas two sentences that have a very dissimilar meaning would be very far apart within that vector space. Pinecone is the storage, the database that stores all these vectors and also allows us to search through these vectors very efficiently. So we can literally store millions, tens of millions, billions of vectors in here
and search through them incredibly fast. Now, all of this together is what we would call indexing.

On the other side of this we have the querying phase. So when we're making queries, let's say we have a little search box here. Obviously this input can be anything we like, but we have a search box here and our users are going to enter a query. It's like Google search. That query will go over to Cohere first, using the same large language model, and what we will get is a single vector, what we call a query vector. So here is our query vector. In code we usually refer to it as xq. We're going to pass that over to Pinecone here, and we're going to say to Pinecone: okay, with this query vector, what are the top k (so the number here, top k, maybe we say equal to three) most similar already-indexed vectors? So with top k equal to three, if this was our query vector, we would return the top three most similar items here. I think maybe number one would be this vector here, maybe number two would be this one, and number three would be this one, and then Pinecone would return those to us. So we'd have three vectors here, but obviously we can't read or understand what these vectors mean, right? They're just numbers. So what we need to do is find the metadata that was attached to those. The metadata is going to contain the original text from up there, so we would actually get that original text and we would return all of that to our user.

So that's what we're going to be building. It probably looks much more complicated from this chart than it actually is. It's incredibly easy, as we'll see, and it won't take long to go through. So let's get started. We are going to be using this guide on Pinecone, so docs.pinecone.io here. We're not going to actually go through this page; we're just going to go over to here, "Open in Colab". Okay, and we get this little chart. It's pretty much exactly the same as what I just explained, just a little simpler. We want to go down here, and we just need
to install a few prerequisites. So if we take a look up here, we have Cohere and Pinecone. You know why we're using those, I just mentioned it. And then we also have Hugging Face Datasets. This is where we're going to be sourcing our dataset from, the one we're going to be indexing and then querying against later on. Okay, it looks good.

So come down to here, and we need to sign up for API keys at Cohere and Pinecone. Both of these we can actually do completely for free. Cohere has a trial amount that we can query with, so we're going to be using that. Click on Cohere here. That will take us to os.cohere.ai, and it will also just redirect you to the dashboard if you're already logged in. If this is your first time using Cohere, of course you will not be, so you have to go up to the top right over here and create an account or log in. Once you have done that, you can go to Settings on the left, go to API keys, and you should have a trial key here. So I've got the default key. I'm going to copy that, and I'm going to go ahead and put it in here. Okay, so I've just put mine in.

And then for the Pinecone key, click here and that will take us to app.pinecone.io. If you're already logged in, you'll see something like this. Otherwise you're going to see a little login screen or create-an-account page, so you go ahead and do that, and then you should be redirected to here. Now, if this is your first time using Pinecone, this will be empty. You can see I have a few indexes in here already, but of course if you haven't created any already, this will be empty. So all we need to do is head on over to API keys on the left. We should have a default API key. We can copy that and we will place it in here. Okay, so I've just added mine, so I've got my Cohere and Pinecone keys in there.

The first thing we want to do is create our embeddings. For that we are going to need to use the Cohere embed endpoint, and we also need some data, so let's get both of those. So here we just initialize our connection to Cohere using
that API key from before, and then here we're going to use Hugging Face Datasets, and we're going to download the first 1000 questions from the TREC dataset, or the first 1000 rows actually, and then the questions themselves are stored within this text feature, which we can see down here. Okay, cool.

After this we are going to be using the Cohere embed endpoint. We're going to be passing all of those items, so we actually have a thousand items in there. You can just pass them all to Cohere at once, and this client will just automatically batch those and iterate through everything, so we don't overload the API with requests. We're going to be using the small model and we're going to be truncating, so we're going to be keeping everything on the left here. And then after that we want to extract the embeddings from what we return there. So we run this. It should be pretty quick. Okay, a second, super fast. And then let's just have a look at the dimensionality of what we returned there. So we have 1000 vectors, and each one of those vectors is 1024 dimensions, which is just the output dimensionality of Cohere's small large language model.

Okay, and with those embeddings created, we can move on to actually initializing our vector index, which is where we're going to store everything. So for that we initialize our connection to Pinecone using the API key from before. We are going to be using this index name. You can change this to whatever you want. I would just recommend that you keep it descriptive so that you're not getting confused if you have multiple indexes later on. And then here, if this is your first time using Pinecone or going through this notebook, this will just run, so it will create the index. If you've already run this notebook and the index already exists within your Pinecone account, then this is going to say: if that index name does exist, do not create the index again, because it already exists and we don't need to create it again. Within that create index call we have the index name, the dimensionality, so that's
the 1024 that we saw before from the Cohere small model, and we're also using cosine similarity as the similarity metric there as well. After that we just go ahead and connect to the index. So let's run that.

Now, once that is all done, we're going to move on to actually upserting everything, so adding all of those vectors, the relevant metadata, and some unique IDs to our index. That will be in this format: we're going to have a big list where each record is contained within a tuple containing a unique ID, a vector that we've created from Cohere's embed endpoint, and the metadata, which is just going to contain the plain-text version of the information. So come down here and we will go ahead and create that structure. That's what we're doing here. We're creating a zipped list of unique IDs, which are just a count, the embeddings which we created before with Cohere, and the metadata, which you can see we're creating here. It's just a dictionary. Metadata is always in dictionary format, and we just have a key, which is text, and the value, which is the original plain text of our data. Now, up here we're using a batch size of 128. That is so that we're not overloading the API calls that we're making to Pinecone, and we are actually upserting in batches of 128.
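
The batched upsert just described can be sketched roughly as follows. This is a minimal illustration rather than the notebook's code: `build_records`, `batched`, `upsert_in_batches`, and the `FakeIndex` stand-in are hypothetical names, the toy vectors are 4-dimensional instead of 1024-dimensional, and the only thing assumed about a real Pinecone index object is that it accepts `upsert(vectors=...)` with a list of `(id, vector, metadata)` tuples, as the transcript describes.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# One record is (unique ID, embedding vector, metadata dictionary).
Record = Tuple[str, List[float], Dict[str, str]]


def build_records(embeds: Sequence[List[float]], texts: Sequence[str]) -> List[Record]:
    """Pair each embedding with a unique string ID and its original text."""
    if len(embeds) != len(texts):
        raise ValueError("embeds and texts must have the same length")
    return [
        (str(i), list(vec), {"text": text})
        for i, (vec, text) in enumerate(zip(embeds, texts))
    ]


def batched(records: Sequence[Record], batch_size: int) -> Iterator[Sequence[Record]]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def upsert_in_batches(index, records: Sequence[Record], batch_size: int = 128) -> int:
    """Send records to index.upsert() one batch at a time; return the batch count."""
    n_batches = 0
    for batch in batched(records, batch_size):
        index.upsert(vectors=list(batch))  # assumed Pinecone-style call shape
        n_batches += 1
    return n_batches


class FakeIndex:
    """Hypothetical in-memory stand-in so the sketch runs without an API key."""

    def __init__(self) -> None:
        self.store: Dict[str, Record] = {}
        self.batch_sizes: List[int] = []

    def upsert(self, vectors: List[Record]) -> None:
        self.batch_sizes.append(len(vectors))
        for record in vectors:
            self.store[record[0]] = record  # an existing ID is overwritten


# Made-up toy data standing in for the 1000 embedded questions.
texts = [f"question {i}" for i in range(1000)]
embeds = [[float(i), 0.0, 1.0, 0.5] for i in range(1000)]

index = FakeIndex()
n_batches = upsert_in_batches(index, build_records(embeds, texts), batch_size=128)
print(n_batches, len(index.store))
```

With 1000 records and a batch size of 128, the loop makes seven full batches and one final batch of 104. Swapping `FakeIndex` for a real index object should only require that its `upsert` method accept the same argument.
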
Okay, so we can run that. At the end here we're going to describe the index statistics, which is just so we can see the vector count: have we upserted everything into our vector index? And we can see here that we have. From here we can also check the dimensionality of our index, which again should align with the model output dimensionality, the 1024 that we saw before.

Okay, so with that we've actually done all of the indexing stage of our workflow, so we can cross off this bit here. The indexing part, this is all done. Now what we need to do is the querying part. So we can see we have our plain-text query. We're going to take that to Cohere, and from there we're going to create that query embedding. We query Pinecone with that, and we return some vectors and the metadata with those vectors to the user.

So to do that, it's pretty much the same process again. We have our query: "What caused the 1929 Great Depression?" We are going to do the exact same thing that we did before with the TREC data. We are going to use Cohere embed, use the small model, which we're going to truncate to the left, and we get those embeddings. We can also print the shape here, so let me run this. Okay, so the shape is just one vector this time, which is our query vector, and it's still 1024-dimensional because we're using that small Cohere large language model.

Now, from there we query Pinecone with this. We're saying we want to return the top 10 most similar results, and we want to include the metadata, which includes the plain text, so that we can actually read the results. And we get this sort of response, which we can read relatively easily, but let's clean it up a little bit more. So we come down here and run this. We're going to go through each of those matches that we returned here, and we're just going to print out the score, rounded so it'll be easier to read, and we're going to return the metadata text. We print all of those out and we get something like this. Now,
the top two results have much higher similarity scores than the rest of our results, and they are indeed far more relevant to our question. These would clearly be counted as duplicates of our question, or at least very similar. And then the rest of these, we can see that they are not directly relevant, but I think most of them occur within the same sort of time era, so it's interesting that it manages to identify some sort of relationship there and return those. But of course these are the only two within that 1000-query dataset that we have from TREC. These are the only two items that refer to the Great Depression. The rest of them are, as you can see, not relevant, or at least not directly relevant. So they're very good results.

Now let's adjust this a little bit. I mentioned before that we're searching based on the meaning of these queries, not the keywords. So what we're going to do is replace the keyword "depression" with the incorrect term "recession". Now, whilst incorrect, we as humans would still understand that it means the same thing: someone is trying to ask about that specific event. And indeed we can see that the results are pretty much exactly the same. Now, the similarity scores dropped a bit because we're using that incorrect term, but nonetheless it is identifying the top two exactly the same. I think most of these are also the same, there's just a little bit of a different order in there, and a couple that have maybe dropped from the top.

Now, in this case we still have a lot of similar keywords there, so maybe "recession" is very clearly identified with "depression", "major" with "great", and so on. So maybe we can modify this even more and just drop all those similar words, and we can be more descriptive here as well: "Why was there a long-term economic downturn in the early 20th century?" So this is very different to the results that we would expect to find, and yet again we see those two right at the top, and the rest
of these results are also very similar. Now, interestingly, I think because we are using more descriptive language here, it's managing to identify these two as being more similar than it did with the previous query, where we used the incorrect term. You can see that the similarity score here is lower than it is here.

So you can see already how easy it is to build a pretty high-performing semantic search engine using very little code and literally no prior knowledge about this technology. All we do is make a few API calls to Cohere, make a few API calls to Pinecone, and we have this semantic search tool. Now, that's it for this video. I hope it has been useful and interesting, but for now, thank you very much for watching and I'll see you again in the next one. Bye.
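
To make the idea of "closest vectors by cosine similarity" concrete, here is a brute-force sketch of a top-k search over a handful of records. It is an illustration only: a vector database such as Pinecone uses far more efficient index structures than a linear scan, the function names `cosine_similarity` and `top_k_matches` are hypothetical, and the 2-dimensional vectors are made up by hand rather than produced by any embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, List[float], Dict[str, str]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k_matches(xq: Sequence[float], records: Sequence[Record], k: int = 3) -> List[Tuple[float, str]]:
    """Score every record against the query vector and keep the k best."""
    scored = [
        (cosine_similarity(xq, vec), meta["text"])
        for _id, vec, meta in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made toy vectors: the first two point in nearly the same direction.
records = [
    ("0", [1.0, 0.0], {"text": "What caused the Great Depression?"}),
    ("1", [0.9, 0.1], {"text": "Why did the economy collapse in 1929?"}),
    ("2", [0.0, 1.0], {"text": "Who wrote Hamlet?"}),
    ("3", [-1.0, 0.0], {"text": "How tall is Mount Everest?"}),
]
xq = [1.0, 0.0]  # stand-in for an embedded query

results = top_k_matches(xq, records, k=2)
for score, text in results:
    print(f"{score:.2f}: {text}")
```

The two records whose vectors point the same way as the query come back first, regardless of which words their texts share with it. That is the property the synonym experiments in the transcript rely on, with the embedding model doing the work of placing similar meanings close together.
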

Original Description

In this video, we will learn how to use the Cohere Embed API endpoint to generate language embeddings using a large language model (LLM) and then index those embeddings in the Pinecone vector database for fast and scalable vector search. Cohere is an AI company that allows us to use state-of-the-art large language models (LLMs) in NLP. The Cohere Embed endpoint we use in this video gives us access to models similar to other popular LLMs like OpenAI's GPT 3, particularly their recent offerings via OpenAI Embeddings like the text-embedding-ada-002 model. Pinecone is a vector database company allowing us to use state-of-the-art vector search through millions or even billions of data points. Both services together are a powerful and common combination for building semantic search, question-answering, advanced sentiment analysis, and other applications that rely on NLP and search over a large corpus of text data.

🌲 Pinecone docs: https://docs.pinecone.io/docs/cohere
🤖 AI Dev Studio: https://aurelio.ai
🎉 Subscribe for Article and Video Updates!
https://jamescalam.medium.com/subscribe
https://medium.com/@jamescalam/membership
👾 Discord: https://discord.gg/c5QtDB9RAP

00:00 Semantic search with Cohere LLM and Pinecone
00:45 Architecture overview
04:06 Getting code and prerequisites install
04:50 Cohere and Pinecone API keys
06:12 Initialize Cohere, get data, create embeddings
07:43 Creating Pinecone vector index
10:37 Querying with Cohere and Pinecone
12:56 Testing a few queries
14:35 Final notes

This video teaches how to use Cohere AI's LLM for semantic search in Python, covering the implementation of a semantic search engine using the Cohere Embed API and Pinecone Vector Database. The video provides a hands-on approach to building a high-performing semantic search engine with minimal code.

Key Takeaways

Sign up for API keys at Cohere and Pinecone
Create embeddings using Cohere embed endpoint
Initialize connection to Cohere using API key
Initialize connection to Pinecone using API key
Create index with dimensionality and similarity metric
Upsert vectors and metadata into Pinecone index in batches
Query Pinecone with query embedding and return top 10 most similar results
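
The last step in the list, embedding a query, searching the index, and printing a rounded score next to the stored text, might be wired together along these lines. This is a sketch under stated assumptions, not the notebook's code: `semantic_search`, `StubCohere`, and `StubIndex` are hypothetical names, and the call shapes `embed(texts=..., model=..., truncate=...)` returning an object with `.embeddings`, and `query(vector=..., top_k=..., include_metadata=...)` returning a mapping with a `"matches"` list, are assumed to mirror the client versions used at the time of the video. Current client releases may differ, so check the official documentation before relying on them.

```python
from types import SimpleNamespace
from typing import List


def semantic_search(co, index, query: str, top_k: int = 10) -> List[str]:
    """Embed the query, search the index, and format each match as 'score: text'."""
    # Assumed Cohere-style call: one input text in, one embedding out.
    response = co.embed(texts=[query], model="small", truncate="LEFT")
    xq = response.embeddings[0]
    # Assumed Pinecone-style call: metadata carries the original plain text.
    result = index.query(vector=xq, top_k=top_k, include_metadata=True)
    return [
        f"{match['score']:.2f}: {match['metadata']['text']}"
        for match in result["matches"]
    ]


class StubCohere:
    """Hypothetical stand-in that returns a fixed 2-D vector for any text."""

    def embed(self, texts, model, truncate):
        return SimpleNamespace(embeddings=[[1.0, 0.0] for _ in texts])


class StubIndex:
    """Hypothetical stand-in that scores records by dot product."""

    def __init__(self, records):
        self.records = records  # list of (id, vector, metadata) tuples

    def query(self, vector, top_k, include_metadata):
        matches = []
        for record_id, vec, meta in self.records:
            # The toy vectors are unit length, so the dot product equals
            # the cosine similarity.
            score = sum(x * y for x, y in zip(vector, vec))
            match = {"id": record_id, "score": score}
            if include_metadata:
                match["metadata"] = meta
            matches.append(match)
        matches.sort(key=lambda m: m["score"], reverse=True)
        return {"matches": matches[:top_k]}


# Made-up unit-length vectors; real embeddings would be 1024-dimensional.
stub_index = StubIndex([
    ("0", [1.0, 0.0], {"text": "What caused the Great Depression?"}),
    ("1", [0.6, 0.8], {"text": "When did the stock market crash?"}),
    ("2", [0.0, 1.0], {"text": "Who wrote Hamlet?"}),
])

lines = semantic_search(StubCohere(), stub_index, "What caused the 1929 Great Depression?", top_k=2)
for line in lines:
    print(line)
```

Passing the clients in as arguments keeps the function testable offline. With real clients, the same function body would apply only if their methods match the assumed call shapes.
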

💡 The video demonstrates how to use Cohere AI's LLM for semantic search in Python, providing a high-performing semantic search engine with minimal code, and handling synonyms and descriptive language.

Chapters (9)

0:00 Semantic search with Cohere LLM and Pinecone

0:45 Architecture overview

4:06 Getting code and prerequisites install

4:50 Cohere and Pinecone API keys

6:12 Initialize Cohere, get data, create embeddings

7:43 Creating Pinecone vector index

10:37 Querying with Cohere and Pinecone

12:56 Testing a few queries

14:35 Final notes
