Medical Search Engine with SPLADE + Sentence Transformers in Python

James Briggs · Intermediate · 🔍 RAG & Vector Search · 3y ago

Skills: RAG Basics 90%

Key Takeaways

This video teaches how to build a medical search engine using hybrid search with SPLADE and Sentence Transformers in Python

Full Transcript

Today we're going to take a look at how to implement a hybrid search engine using both SPLADE and a sentence transformer. It's going to be very hands-on, so I'm going to outline the architecture of this thing and then we'll jump straight into building it.

As for the process we're going to go through, we start off with data, obviously. So we have our data over here, and it's just going to be paragraphs of text, loads and loads of text. What we have to do is create a chunking mechanism. When we're encoding these chunks of text, or just general information, we can't encode too much information in any one go, depending on which models you're using. If you're using Cohere or OpenAI, the chunks you can use will be much larger. We're going to be using a sentence transformer on one side and SPLADE on the other. SPLADE and the sentence transformer are somewhat limited: you're definitely not going to go over around 500 tokens, and with the dense vector embedding model, the sentence transformer, I believe it's like 384 tokens. That's probably around a thousand characters at most, so you're thinking a few sentences. So we take the data through a chunking mechanism to break it apart into chunks of text.

Then we take those chunks of text and, on one side, feed them into SPLADE, which creates our sparse vectors. On the other side we have our sentence transformer (I can't remember which one we're using, so I'll just put ST for now), and that gives us our dense vectors. We take both of those and create a sparse-dense vector, which we then take into Pinecone. That's our vector database, where we're going to store everything. Then we can ask a question up here, and that gets encoded again by SPLADE, so
that's SPLADE, and this is our dense embedding model. We put those both together to create a sparse-dense vector, take it over here, feed it into Pinecone, and get a ton of responses based on both the sparse and the dense information. So that is what we're going to be building. Let's go ahead and build it.

The first thing: we're using Transformers and Sentence Transformers here, which means we're creating these embeddings on our local machine. In this case it's actually Colab, but I'll call it local, as opposed to the alternative, which would be calling an API like OpenAI. So we go to Runtime, Change runtime type, and make sure we're using a GPU. Save that and run. These are just the libraries we're going to be using: Hugging Face Datasets for our data, Transformers and Sentence Transformers for our encoders (this is the dense encoder), our vector database, and also our SPLADE model.

We first load the PubMed QA dataset, a medical question-answering dataset. With medical things you'll find there's a lot of specific terminology, and it's within that sort of domain that models like SPLADE, or sparse embedding models in general, will perform better. However, if you're able to train your sentence transformer, your dense embedding model, on the dataset, then you can usually improve its performance beyond that of a sparse embedding model.

Let's have a look at what we have. We have our ID here, and then this data: the context, with all these paragraphs. What we need to do with these paragraphs is put them all together and then chunk them into smaller chunks that will fit into our models, into digestible chunks for our models. We're going to be using BERT, which has, you know, a default
max sequence length of 512 tokens, which is fairly big, but your typical sentence transformer actually limits this to 128. We're going to be pretty naive in our assumptions here: we'll just assume this 128-token limit, and we'll assume that the average token length is three characters. In reality it will vary. We should realistically create our tokens and then count the number of tokens, but that's more complicated logic and I want this to be as simple as possible, so we're just doing this for now. If you're interested in that, let me know and I can send you a link to some code that does it.

To create these chunks we use a processing function called chunker, which is here. It takes that list of contexts we've got up here, literally this list, joins them, and then splits based on sentences, so we create our chunks at the sentence level. We loop through each sentence and add it to the chunk, and once the length of that chunk exceeds our limit (here the limit is 384 characters, that is, 128 tokens at three characters each) we say okay, we've got enough, we're not going any higher, and we add that chunk to our list.

Let's say we have five sentences in a single chunk. So that we're not cutting between sentences that are relevant to each other, that have some continuing logic between them, we keep some overlap between our chunks. Say we take sentences zero to four; for the next chunk we take sentences two to seven, or something like that. So there's always a bit of overlap between the chunks. Once we're done and we get to the end of our sentences, we might still have a smaller
chunk that's left over, so we just append that to our chunks list. That's our chunking function. Let's run that and apply it to our first context. We get these smaller chunks, and they've been split between sentences; we're not splitting in the middle of a sentence. One thing you'll also notice: here it says "the leaves of the plant consist of a latticework of", and we also have that here. Basically half of each chunk is overlapped, so we have a lot of repetition in there. Depending on what you're doing, you can minimize that; for this example it's fine. But you should realistically have some overlap, so that you're not cutting between sentences that have some logical continuation. We basically don't want to lose information, and that's why we have those overlaps. This is probably a more reasonable one: you have all this, and then the overlap starts from around here.

Next, we want to give each chunk a unique ID, using the PubMed ID followed by the chunk number. Then we do this for the full dataset: we go through the entire PubMed dataset, get the contexts, and create our chunks, again using the PubMed ID and the chunk number. We run that and we get all of this. Looks good.

Now I want to move on to creating our vectors. The first ones are the dense vectors, and we're using a sentence transformer for this. First we want to make sure we're using CUDA if it's available. Otherwise you can use CPU; it's just going to be slower. It won't be too slow, since it's not a huge dataset, but be aware of that. The model we're using is this BERT base model that
has been trained on MS MARCO, which is an information retrieval dataset. And specifically, this is important: it has been trained to use dot product similarity. We need that for it to function with the sparse-dense vectors we're putting into Pinecone, because they're compared in a dot product similarity space. We initialize it on CUDA if we can.

We see the sentence transformer details here, and we can see that the max sequence length of this sentence transformer is 512 tokens. So earlier, when we went for the 128-token limit, with this model we could actually use 512 and increase the chunk size quite a bit. We set 380-something for the character limit; with this we could set something like 1,500, which is quite a bit more. But we'll stick with what we have, because a lot of sentence transformers are restricted to that smaller size.

Then we create an embedding like this: we take our dense model, call encode, and pass in our data. We get, as we'll see in a moment, a 768-dimensional dense vector. You can also see that from the model's get_sentence_embedding_dimension here. This is important; we'll need it when we initialize our vector index later.

Moving on to the sparse vectors, we're using the SPLADE CoCondenser EnsembleDistil model, which is basically an efficient SPLADE model. We move it to CUDA if we can. The aggregation here is max; that's how it creates its single vector from the many vectors it initially produces. I made a video on SPLADE, so you can take a look at that if you're interested; there'll be a link to it at the top of the video somewhere. The model takes tokenized inputs that need to be built with a tokenizer initialized with
the same model ID, so this model here. We create our tokens like this, making sure to return PyTorch tensors, and then to create our sparse vectors we do this. We use torch.no_grad, which basically means don't calculate the gradients of the model, because that takes more time and we only need it for training. Right now we're just performing inference, or prediction, so it's not needed. We move the tokens to CUDA if we're using it, and then we feed them into the model. The reason we move them to CUDA is that if the tokens we feed to the model are on the CPU and the model is on the GPU, we're going to see an error, so we need to make sure we include that.

Then here, the SPLADE vector representations are output by the model, and we use squeeze to reduce the dimensionality of that vector. Initially I think the shape is something like 30,000 by one; we don't need that extra one, so we just remove it like that. That gives us this vector, which is huge: about 30.5 thousand items. That is actually the vocab size of the BERT model. Every token that BERT recognizes is represented by one of these values, and essentially we're creating a score for each one of those tokens through SPLADE. Most of them are going to be zero; that's what makes it a sparse vector.

The data format we'll be feeding into Pinecone is essentially a dictionary mapping the positions of the non-zero values to the scores they were assigned. What does that look like? Here we can see we have 174 non-zero values, and we create this. Let me show an index-value example: we come up to here and we have our indices, so position number 1,000, and the score of that token is this.

I have a little example of what that actually means here. We don't need to do this for processing
things with Pinecone; we're just doing this so we can understand what this actually means. I'm going to create an index-to-token mapping. Like I said, all of those 30.5 thousand values in the vector output by SPLADE refer to a particular token. These tokens are just numbers, because that's what the transformer model, SPLADE, reads. We can't read that; we need actual text. So this maps those positions, those integer values, to the actual text tokens they represent. We process the dictionary we just created through that, and we get this.

Let's see what this is for. It's for this text here, so let's have a look at it and see if it makes sense. "Programmed cell death", and then "is the regulated death of cells within an organism. The lace plant produces..." and so on, "a latticework of longitudinal and transverse veins enclosing areolas". I don't know what that means, but we can at least see that in this sparse dictionary we have "pc", which I think is coming from here (it's not ideal, but it's fine); "lace", which is mentioned here; "programmed", which we have up here; "Madagascar", I don't know where that's coming from; "death"; "d". So we have all of these.

I think we should also have some other words in here that are not actually from this text. What SPLADE does is identify the words that are already in the text and, within those, the most important ones. I would say it's probably got that right with "lace", "programmed", the "pc" and the "d" here, "death", "lattice". Those are probably the most important words in here. It's not giving us the word "the" or the word "within", because it doesn't view those as important. But if we go down, we'll probably see some words that are not
actually in here but are similar to words in here, because part of what SPLADE does is term expansion. That basically means that, based on the words it sees, it adds other words that it doesn't see but that we might expect a document we're searching for to contain. The word "die", I don't think is in here, but you come down here and it is there. "Regulated", okay, "regulated" is in the text. "Lacy" is probably not; it's "lace plant", so "lacy" isn't. I don't know if that's actually relevant; I don't understand any of what this says. We have "plant" and "plants", and I wonder if both are in the text. We've got "plant", "plant"... okay, we don't have "plants". But that might be useful. This is the document; let's say in the query the user is searching for "programmed cell death in plants" or "how do plants die from PCD". They would have the terms "die" and "plants" in there, but not the terms "death" or "plant". That's where term expansion is really useful, because then you do have that term overlap, which is what traditional sparse vector methods like BM25 lack: they don't have that automatic term expansion.

So we've seen how to create our dense vectors and how to create our sparse vectors. Now let's have a look at how to do this for everything. We create a helper function called builder, which transforms a list of records from our data, the contexts, into this format here. This is the format we're going to be feeding into Pinecone: we have our ID, our dense vector here, our sparse vector in the dictionary format we saw already, and then this metadata. Metadata is just additional information we can attach to our vectors; in this case I'm going to include the text, the human-readable text. So what we'll do is create builder. This
is just going to go through everything. Let me walk through it. We get our IDs from the records that we pass in; records is everything in our data. So it extracts the IDs for everything, and then it extracts the contexts. That's why we have the PubMed ID followed by the chunk number; that's the ID. Then we have those smaller, couple-of-sentence chunks of text, and from those chunks of text we encode everything. That creates our dense vectors. Then we create our sparse vectors: we get our input IDs, which is creating our tokens, and then we process our tokens through the sparse model, the SPLADE model. Then we initialize an empty list, which is where we store everything to add to Pinecone, and we go through the IDs, the dense vectors, the sparse vectors, and the contexts we've just created, and create this format here. So for every record we have this format: the ID, values, sparse values, and metadata, which is what I showed you just here.

With that we'll run this cell and try it with the first three records. There we go. There are a lot of numbers in there, but we have the metadata; if I come up to here, these are the values and the indices for our SPLADE vector, the sparse values; we have our dense values, our dense vector, which is very big; and then we have the ID.

Now we initialize our connection to Pinecone using a free API key. For that you go to app.pinecone.io. You'll end up on this page initially; you go to API Keys and you have your API key here. It will probably say "default". Click copy, paste that
over here, and put it into your API key. I've stored mine in a variable called YOUR_API_KEY. Then for your environment, you go back to your console and copy whatever is in here. For me it's us-east1-gcp; there's a good chance yours will be the same, but it may vary. We run that, and that just initializes our connection with Pinecone.

Then we want to create an index, so we run this. There are a few things that are important here. The index name is not so important; you can use whatever you want, but we do need to pass one. The dimensionality is the 768 dimensions of the dense embedding model, not the SPLADE model. We have to use the dot product metric to use sparse-dense vectors, and for the pod type we must use either s1 or p1. That will create the index, and we can go to the console, go to Indexes, and we should see it if we refresh. So we have this pubmed-splade one in there.

Next we need to initialize the connection to our index. For this we can use either Index or GRPCIndex, which is essentially faster and also a little more reliable in terms of your connection; it holds a more stable connection to Pinecone. Index is still very stable and still very fast, just not as good. We run those, and that gives us some index statistics. Of course our index is completely empty right now, and the dimensionality is what we set before, 768.
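Before anything is upserted, it may help to see the record format from the builder step in isolation. The sketch below is a minimal, model-free illustration in plain Python: lists of floats stand in for the dense and SPLADE outputs. The helper names `to_sparse_dict` and `build_records`, the `dim` parameter, the example ID, and the `"context"` metadata key are assumptions made for this example; only the overall shape (ID, values, sparse values with indices and values, metadata) comes from the transcript.

```python
from typing import Dict, List, Sequence

DENSE_DIM = 768  # dense embedding size quoted in the transcript


def to_sparse_dict(scores: Sequence[float]) -> Dict[str, list]:
    """Keep only the non-zero entries of a vocabulary-sized score vector."""
    indices = [i for i, score in enumerate(scores) if score != 0]
    values = [float(scores[i]) for i in indices]
    return {"indices": indices, "values": values}


def build_records(
    ids: Sequence[str],
    dense_vecs: Sequence[Sequence[float]],
    sparse_scores: Sequence[Sequence[float]],
    contexts: Sequence[str],
    dim: int = DENSE_DIM,
) -> List[dict]:
    """Pack parallel lists into one sparse-dense record per chunk."""
    records = []
    for chunk_id, dense, scores, text in zip(ids, dense_vecs, sparse_scores, contexts):
        # The index is created with the dense dimensionality, so check it here.
        if len(dense) != dim:
            raise ValueError(f"dense vector has {len(dense)} dims, expected {dim}")
        records.append({
            "id": chunk_id,
            "values": [float(v) for v in dense],
            "sparse_values": to_sparse_dict(scores),
            "metadata": {"context": text},  # metadata key name is an assumption
        })
    return records


# Toy data: a 4-dimensional "dense" vector and a 6-token "vocabulary".
toy = build_records(
    ids=["12345-0"],  # hypothetical ID: PubMed ID followed by the chunk number
    dense_vecs=[[0.1, 0.2, 0.3, 0.4]],
    sparse_scores=[[0.0, 1.5, 0.0, 0.0, 0.2, 0.0]],
    contexts=["Programmed cell death is the regulated death of cells."],
    dim=4,
)
print(toy[0]["sparse_values"])  # {'indices': [1, 4], 'values': [1.5, 0.2]}
```

Checking the dense length up front is a design choice for the sketch: a mismatch with the index dimensionality is easier to diagnose locally than from a failed upsert.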
Now to add some vectors we just do this: index.upsert, and we pass in what we created with builder, because builder outputs the format we need to add things to Pinecone. We can see that we upserted three items; upsert just means insert. We can repeat that for the full dataset. You can increase the batch size depending on what hardware you're using; we'll stick with 64, which is pretty low, just to be safe. With this it's not going to take long, we've got like a minute twenty here, so I'll skip ahead. Okay, that is complete; it took one and a half minutes. Then we check that the number of upserted records aligns with the length of our original data. Here is our original data, and here's the number of items now inside our index. It looks like everything is in there, and we can move on to querying.

Our queries need to contain both sparse and dense vectors, so we use this function called encode, which handles everything for us: we create our dense vector, then create our sparse dictionary, and return both. We start with "Can clinicians use the PHQ-9 to assess depression in people with vision loss?" We run this, and straight away we get a result about investigating whether the PHQ-9 has the essential psychometric characteristics to measure depressive symptoms in people with visual impairment. I would say that is probably correct. You see we have "depressive symptoms" versus "depression", and "vision loss" versus "visual impairment". It's not that the words align perfectly, but they have the same meaning. So my question here would be: what is doing this? Is it the dense component or the sparse component? We'll see that it's kind of both, but what I wanted
to show you is that we can actually scale the dense versus sparse components. The way we do this is with this hybrid_scale function. It takes an alpha value: when alpha is equal to one, it maximizes the dense vector and makes the sparse vector completely irrelevant. If we use an alpha value of zero, the sparse vector is the only thing being used and the dense vector is completely irrelevant. And if we want an equal blend between the two, we use 0.5.

Let's first try a pure dense search and see what happens. You see that we get the right answer at the top straight away. The score is different: this is 181, whereas up here it is 203. It's not that much different, but it's different. Does that mean it's only the dense vector doing this? Let's try an alpha value of 0.0. We get the same answer at the top again. There is some variation further down; with the dense embedding I'm not sure if the performance is better or not, but we do get slightly different results. And when we have a mix of both, we get the same result there. So let's try some other questions that may help us get slightly different responses. What's going on here is that both models are actually very good for this dataset, so we don't see that much difference when we vary them.

"Does ibuprofen increase perioperative blood loss during hip arthroplasty?" This is a sparse search, and when we run it we get "to determine whether prior exposure of non-" (okay, this is ibuprofen, from what I understand) "anti-inflammatory drugs increases perioperative blood loss associated with major orthopedic surgery". I checked what this means, and it basically means a hip replacement, or, sorry, this one means a hip replacement, and the words, I think both of them, so
this is major surgery, and this is a hip replacement, which is major surgery. That's what I understood; I could have got it completely wrong, I'm not sure on this one. Then they mention hip replacement here, so I think this one is relevant, and this is using the pure sparse method. Then we get this one, which does talk about ibuprofen and this sort of stuff, but it doesn't mention the arthroplasty thing, so I assume it's not as relevant. If we go pure dense, we get the best answer at position number two, which is still good. It's not that it's performing badly, that is a good performance, but it's not quite as good as pure sparse.

I've put a ton of example questions in here from the PubMed QA paper, so you can try a few of these. What we find is that some of them perform better with sparse and some perform better with dense. So a good approach is to use a mix of both with hybrid search: we set alpha to something like 0.3 or 0.5, whatever seems to work best overall for your particular use case, and overall we get generally better performance.

Once you're done with all this, if you've asked a couple more questions and so on, just delete your index at the end to save resources, so that you're not using more than is needed.

So that's it for this video. We've quickly been through an example of using hybrid search in Pinecone with SPLADE and a dense vector sentence transformer model, and I think the results are pretty good. This is just one example; what we'll find is that the performance of hybrid search versus just pure dense or pure sparse search is generally a lot better. So if you're able to implement this in your search applications, it's 100% worth doing. Anyway, for now we'll leave it there. I hope this video has
been interesting and useful. Thank you very much for watching, and I will see you again in the next one. Bye.
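The alpha weighting described in the transcript (alpha of 1 is pure dense, 0 is pure sparse, 0.5 is an equal blend) can be sketched in a few lines of plain Python. This is a sketch under stated assumptions rather than the notebook's code: the function is named `hybrid_scale` here, its body is a simple convex weighting, and `dot_score` is a hypothetical helper that adds a dense dot product to a sparse dot product purely to make the effect of alpha visible. How Pinecone combines the two parts internally is not shown in the transcript.

```python
from typing import Dict, List, Tuple

SparseVector = Dict[str, list]  # {"indices": [...], "values": [...]}


def hybrid_scale(
    dense: List[float], sparse: SparseVector, alpha: float
) -> Tuple[List[float], SparseVector]:
    """Weight a query: dense part by alpha, sparse part by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


def dot_score(
    q_dense: List[float], q_sparse: SparseVector,
    d_dense: List[float], d_sparse: SparseVector,
) -> float:
    """Dense dot product plus sparse dot product over shared indices."""
    dense_part = sum(q * d for q, d in zip(q_dense, d_dense))
    doc_lookup = dict(zip(d_sparse["indices"], d_sparse["values"]))
    sparse_part = sum(
        v * doc_lookup.get(i, 0.0)
        for i, v in zip(q_sparse["indices"], q_sparse["values"])
    )
    return dense_part + sparse_part


# Toy query and document; only token position 7 overlaps in the sparse part.
query_dense = [1.0, 2.0]
query_sparse = {"indices": [3, 7], "values": [0.5, 1.5]}
doc_dense = [2.0, 1.0]
doc_sparse = {"indices": [7, 9], "values": [2.0, 1.0]}

for alpha in (0.0, 0.5, 1.0):
    hd, hs = hybrid_scale(query_dense, query_sparse, alpha)
    print(alpha, dot_score(hd, hs, doc_dense, doc_sparse))
```

In the toy data the unscaled dense dot product is 4.0 and the sparse one is 3.0, so alpha slides the score between 3.0 (pure sparse) and 4.0 (pure dense), with 3.5 at the equal blend. Scaling only the query keeps the stored vectors untouched, so alpha can be tuned per query without re-indexing.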

Original Description

In this video, we'll build a search engine for the medical field using hybrid search with NLP information retrieval models. We use hybrid search with sentence transformers and SPLADE for medical question-answering. By using hybrid search we're able to search using both dense and sparse vectors. This allows us to cover semantics with the dense vectors, and features like exact matching and keyword search with the sparse vectors. For the sparse vectors we use SPLADE. SPLADE is the first sparse embedding method to outperform BM25 across a variety of tasks. It's an incredibly powerful technique that enables the typical sparse search advantages while also enabling learned term expansion to help minimize the vocabulary mismatch problem. The demo we work through here uses SPLADE and a sentence transformer model trained on MS-MARCO. These are all implemented via Hugging Face transformers. Finally, for the search component we use the Pinecone vector database, the only vector DB at the time of writing that natively supports SPLADE vectors.

🔗 Code notebook: https://github.com/pinecone-io/examples/blob/master/learn/search/hybrid-search/medical-qa/pubmed-splade.ipynb
🎙️ AI Dev Studio: https://aurelio.ai/
🎉 Subscribe for Article and Video Updates! https://jamescalam.medium.com/subscribe https://medium.com/@jamescalam/membership
👾 Discord: https://discord.gg/c5QtDB9RAP

00:00 Hybrid search for medical field
00:18 Hybrid search process
02:42 Prerequisites and Installs
03:26 Pubmed QA data preprocessing step
08:25 Creating dense vectors with sentence-transformers
10:30 Creating sparse vector embeddings with SPLADE
18:12 Preparing sparse-dense format for Pinecone
21:02 Creating the Pinecone sparse-dense index
24:25 Making hybrid search queries
29:59 Final thoughts on sparse-dense with SPLADE

#artificialintelligence #nlp #naturallanguageprocessing #machinelearning #searchengine

Playlist

Uploads from James Briggs · James Briggs · 0 of 60

Stoic Philosophy Text Generation with TensorFlow

How to Build TensorFlow Pipelines with tf.data.Dataset

Every New Feature in Python 3.10.0a2

How-to Build a Transformer for Language Classification in TensorFlow

How-to use the Kaggle API in Python

Language Generation with OpenAI's GPT-2 in Python

Text Summarization with Google AI's T5 in Python

How-to do Sentiment Analysis with Flair in Python

Python Environment Setup for Machine Learning

Sequential Model - TensorFlow Essentials #1

Functional API - TensorFlow Essentials #2

Training Parameters - TensorFlow Essentials #3

Input Data Pipelines - TensorFlow Essentials #4

6 of Python's Newest and Best Features (3.7-3.9)

Novice to Advanced RegEx in Less-than 30 Minutes + Python

Building a PlotLy $GME Chart in Python

How-to Use The Reddit API in Python

How to Build Custom Q&A Transformer Models in Python

How to Build Q&A Models in Python (Transformers)

How-to Decode Outputs From NLP Models (Python)

Identify Stocks on Reddit with SpaCy (NER in Python)

Sentiment Analysis on ANY Length of Text With Transformers (Python)

Unicode Normalization for NLP in Python

The NEW Match-Case Statement in Python 3.10

Multi-Class Language Classification With BERT in TensorFlow

How to Build Python Packages for Pip

How-to Structure a Q&A ML App

How to Index Q&A Data With Haystack and Elasticsearch

Q&A Document Retrieval With DPR

How to Use Type Annotations in Python

Extractive Q&A With Haystack and FastAPI in Python

Sentence Similarity With Sentence-Transformers in Python

Sentence Similarity With Transformers and PyTorch (Python)

NER With Transformers and spaCy (Python)

Training BERT #1 - Masked-Language Modeling (MLM)

Training BERT #2 - Train With Masked-Language Modeling (MLM)

Training BERT #3 - Next Sentence Prediction (NSP)

Training BERT #4 - Train With Next Sentence Prediction (NSP)

FREE 11 Hour NLP Transformers Course (Next 3 Days Only)

New Features in Python 3.10

Training BERT #5 - Training With BertForPretraining

How-to Use HuggingFace's Datasets - Transformers From Scratch #1

Build a Custom Transformer Tokenizer - Transformers From Scratch #2

3 Traditional Methods for Similarity Search (Jaccard, w-shingling, Levenshtein)

3 Vector-based Methods for Similarity Search (TF-IDF, BM25, SBERT)

Building MLM Training Input Pipeline - Transformers From Scratch #3

Training and Testing an Italian BERT - Transformers From Scratch #4

Faiss - Introduction to Similarity Search

Angular App Setup With Material - Stoic Q&A #5

Why are there so many Tokenization methods in HF Transformers?

Choosing Indexes for Similarity Search (Faiss in Python)

Locality Sensitive Hashing (LSH) for Search with Shingling + MinHashing (Python)

How LSH Random Projection works in search (+Python)

IndexLSH for Fast Similarity Search in Faiss

Faiss - Vector Compression with PQ and IVFPQ (in Python)

Product Quantization for Vector Similarity Search (+ Python)

How to Build a Bert WordPiece Tokenizer in Python and HuggingFace

How to Build a Bert WordPiece Tokenizer in Python and HuggingFace

Metadata Filtering for Vector Search + Latest Filter Tech

Metadata Filtering for Vector Search + Latest Filter Tech

Build NLP Pipelines with HuggingFace Datasets

Build NLP Pipelines with HuggingFace Datasets

Composite Indexes and the Faiss Index Factory

Composite Indexes and the Faiss Index Factory

Chapters (10)

0:00 Hybrid search for medical field

0:18 Hybrid search process

2:42 Prerequisites and Installs

3:26 PubMed QA data preprocessing step

8:25 Creating dense vectors with sentence-transformers

10:30 Creating sparse vector embeddings with SPLADE

18:12 Preparing sparse-dense format for Pinecone

21:02 Creating the Pinecone sparse-dense index

24:25 Making hybrid search queries

29:59 Final thoughts on sparse-dense with SPLADE

RAG for Your Docs