Advanced RAG techniques for developers
Key Takeaways
The video covers advanced RAG techniques for improving the quality of responses from large language models (LLMs), including combining semantic and keyword search, task type embedding, using traditional relational databases or graph databases, reranking, and making multiple calls to the LLM.
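Combining semantic and keyword search, the first technique listed above, usually means running both searches and merging their ranked results. Below is a minimal Python sketch. It assumes you already have two ranked lists of document IDs, one from a vector (semantic) search and one from a keyword search, and it merges them with reciprocal rank fusion (RRF). RRF is our illustrative choice because the video does not specify a fusion method. The document IDs are hypothetical, and k = 60 is a commonly used default that damps the influence of any single list's top ranks.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank is 1-based), so documents ranked highly by either the semantic
    or the keyword search rise to the top of the fused list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a vector (semantic) search and a keyword search.
semantic_hits = ["doc_shipping_policy", "doc_returns", "doc_weather_delays"]
keyword_hits = ["doc_weather_delays", "doc_shipping_policy", "doc_order_tracking"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents near the top of both lists, like `doc_shipping_policy` here, outrank documents that only one search found. Because RRF works on ranks, it never has to compare raw semantic and keyword scores that live on different scales.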
Full Transcript
Welcome back to Real Terms for AI, where we break down modern AI concepts for professional developers. We did one episode on RAG, and it's a powerful architecture for improving the quality of responses from an LLM. But sometimes basic RAG just isn't enough to reach the quality you want. Luckily, when that happens, there are many advanced techniques based on RAG that you can try.
One of the most important things to remember is that the context you provide to the LLM matters a lot. Context, in this case, is the material you retrieve and augment your prompt with. If the context isn't highly relevant to the user's prompt or question, the LLM may generate a response that isn't relevant to that prompt or question. And if there isn't enough context, the LLM may generate an incorrect response: it might hallucinate, or it may not generate a response at all. Because context is so critical to response quality, many advanced RAG techniques focus on improving the context included with the user's prompt.
All right, to make it easier to follow along, today we'll cover these techniques based on the stage of the RAG information flow where they are used, and we're going to start with pre-processing. In basic RAG, we divided our knowledge base into chunks and stored the chunks in a vector database. So what are some options that may improve accuracy? Our first option is to store some metadata with each chunk, like the main topic, the category the chunk fits in, a specific product that the chunk is relevant to, or whatever makes sense for your use case. You can input this data manually at ingestion time, based on what you know about the data sources you are ingesting. Perhaps all the chunks from a specific product's user manual should have that product ID in their metadata, or each chunk could be tagged with the countries it is actually relevant to. You can also ask an LLM to enrich existing chunks by generating new metadata.
For example, you can give an LLM a list of potential labels and ask it which ones apply to a specific chunk, as a classification task. When you go to retrieve your chunks, you can then use the metadata to filter your vector database before finding similar chunks.
That was a lot. Can you give me an example?
Okay, let's say a user is asking about a specific product. If your chunks have metadata for the product, like a product ID, you can filter to just the data about that product before you do the similarity search. That ensures you only return information relevant to the product in the prompt.
Another technique takes advantage of the fact that questions (or prompts) and answers (or responses) often use different words in a different order. In this technique, when you pre-process your data, you ask the LLM to generate a hypothetical question or prompt that could be answered by the chunk of data you are processing. Then you store this question along with the data. When you need to find information relevant to a user's prompt, you search for similarity in the hypothetical questions instead of, or in addition to, looking for similarity in the raw chunks.
That's pretty cool.
A final option we'll talk about here is changing, at the pre-processing phase, how you store the data. RAG doesn't have to use a vector database: you can use traditional relational databases, or even a graph database, if it fits your data. If the data you're using for context is structured, something like a relational database may be better suited to it than a vector database. You can also store the same data in different ways. For example, you can store data twice using two different chunk sizes and pull from both when gathering the information to accompany the user's prompt to the LLM. And if you have data in multiple data stores, you can combine them at the retrieval stage of RAG.
I think we need another example.
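The product example above, filtering on metadata first and then running the similarity search only over what remains, can be sketched in a few lines of Python. This is a minimal in-memory illustration: the `product_id` metadata field, the product names, and the hand-made three-dimensional embeddings are all hypothetical. In practice, a vector database would handle both the metadata filter and the nearest-neighbor search.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(chunks: list[Chunk], query_embedding: list[float],
                    filters: dict[str, str], top_k: int = 2) -> list[Chunk]:
    """Keep only chunks whose metadata matches every filter, then rank them by similarity."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda c: cosine_similarity(c.embedding, query_embedding), reverse=True)
    return candidates[:top_k]


# Toy knowledge base with hypothetical product IDs and hand-made embeddings.
chunks = [
    Chunk("X100: reset by holding the power button for 10 seconds.", [0.9, 0.1, 0.0], {"product_id": "X100"}),
    Chunk("X200: reset from the Settings menu.", [0.95, 0.05, 0.0], {"product_id": "X200"}),
    Chunk("X100: battery care tips.", [0.1, 0.9, 0.1], {"product_id": "X100"}),
]

# Pretend embedding of "How do I reset my X100?"; it happens to sit closest to the X200 chunk.
query = [0.95, 0.05, 0.0]

results = filtered_search(chunks, query, {"product_id": "X100"}, top_k=1)
print(results[0].text)  # the X100 reset chunk, because the filter excludes X200 first

unfiltered = filtered_search(chunks, query, {}, top_k=1)
print(unfiltered[0].text)  # without the filter, the X200 chunk wins on similarity alone
```

The toy query is deliberately closer to the wrong product's chunk. This shows why filtering before the similarity search matters: it removes near-duplicate content about other products before it can crowd out the right answer.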
Fair enough. Say a customer's prompt was about, I don't know, a delayed shipment. You could combine information about shipping methods from documents in your vector database with information about the customer's recent orders from a Postgres database, and general information about weather impacts on shipping from, maybe, the shipping company's API. Then our LLM's response could contain information from all of those sources, if it was needed to actually answer the question.
Exactly. We should also keep in mind that while most RAG tutorials use vector databases, vector databases are not a must-have, or even the best retrieval and storage method, for every use case. Based on your requirements, you may consider other retrieval methods like relational databases, keyword search, hybrid search, graph databases, and any search API you already have in your systems.
Once you've retrieved relevant data, you can help the LLM use that data more efficiently through a process called reranking. Reranking can get complicated, and it probably deserves its own video, but we'll try to give you the specifics and a couple of ways it can be useful here. When you add reranking to your RAG application, you're adding a step between retrieving the chunks and sending those chunks to the LLM with the prompt. In the reranking step, you use an algorithm of some kind to score the chunks by how relevant or useful they are to the user's prompt. You can then use those scores to reorder the chunks and choose only the best ones to send to the LLM. So let's say your ranking algorithm returns a score for each chunk. You can then program your RAG system to send the LLM only the chunks with a score of at least n, as an example.
Your reranking algorithm can take many things into account as it determines which pieces of data are most useful. Maybe you want more recent information to be considered more relevant. Maybe you track user feedback and want that data taken into account when deciding what data is relevant. Or maybe you found through experimenting that particular sources, like official documentation, often produce higher-quality answers, and you want those sources to be considered more relevant when deciding which context to include with your prompt. And of course you can use AI and data science here: there are a variety of algorithms that can score how relevant a given chunk of data is for a given prompt, and you can potentially combine several reranking techniques to get the best reranking for your use case. I think the most important thing to remember about reranking is that it gives you another chance to make sure the context you send to the LLM is as relevant and helpful as possible for answering the question.
That's a lot of ideas for how to tune your RAG app, but we've still got a few more that people can try. Most RAG systems make one call to the LLM per user prompt, but you may be able to improve accuracy by making multiple calls to the LLM. As we mentioned in a previous video, it can help to have the LLM optimize the user's prompt: this can remove spelling mistakes and unnecessary words, and maybe replace words with more common synonyms. You can also ask the LLM to summarize all the data chunks you retrieve from your data store, which may also improve the quality or accuracy of the responses. And speaking of accuracy, once you've generated an answer, you can also ask the LLM to evaluate the accuracy and relevance of its own results. I know it seems like the LLM should always respond that the text it just generated was correct, but that's not how it works in practice.
All right, are we done?
Now we're done. There's a lot you can change about the basic RAG architecture to improve the quality of the results, and which techniques work for you depends on your use case, your data, and the other standard constraints, like budget and latency, that we've discussed before. If you'd like to try some of these techniques, we've included links to codelabs on semantic search and on methods like graph RAG in the description below. This is Aja and Jason, signing off. Happy prompting!
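The reranking step described in the transcript (score each retrieved chunk, reorder the chunks, and send only those above a threshold to the LLM) can be sketched in Python. This is an illustrative scoring scheme, not the one used in the video. It assumes each chunk already carries a similarity score from retrieval, a last-updated date, and a source label. The 0.7/0.3 weights, the 0.5 threshold, the 365-day half-life, and the `official_docs` boost are hypothetical values that stand in for the "recency" and "trusted source" signals the speakers mention, and you would tune them for your own data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RetrievedChunk:
    text: str
    similarity: float  # score from the retrieval step, assumed to be in [0, 1]
    updated: date
    source: str


# Hypothetical tuning knobs: adjust them by experimenting with your own data.
SOURCE_BOOST = {"official_docs": 0.15, "forum": 0.0}
RECENCY_HALF_LIFE_DAYS = 365
THRESHOLD = 0.5


def rerank_score(chunk: RetrievedChunk, today: date) -> float:
    age_days = (today - chunk.updated).days
    # Recency factor halves every RECENCY_HALF_LIFE_DAYS, so newer chunks score higher.
    recency = 0.5 ** (age_days / RECENCY_HALF_LIFE_DAYS)
    return 0.7 * chunk.similarity + 0.3 * recency + SOURCE_BOOST.get(chunk.source, 0.0)


def rerank(chunks: list[RetrievedChunk], today: date,
           threshold: float = THRESHOLD) -> list[RetrievedChunk]:
    """Reorder chunks by rerank score and keep only those at or above the threshold."""
    scored = sorted(((rerank_score(c, today), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]


# Hypothetical retrieval results for a prompt about a delayed shipment.
today = date(2024, 6, 1)
chunks = [
    RetrievedChunk("Official shipping policy", 0.60, date(2024, 5, 1), "official_docs"),
    RetrievedChunk("Old forum post about shipping", 0.65, date(2020, 6, 1), "forum"),
    RetrievedChunk("Returns FAQ", 0.20, date(2023, 6, 1), "forum"),
    RetrievedChunk("Recent forum answer about delays", 0.55, date(2024, 3, 1), "forum"),
]

kept = rerank(chunks, today)
for c in kept:
    print(f"{rerank_score(c, today):.3f}  {c.text}")
```

In this toy data, the old forum post has the highest raw similarity but is dropped because it is four years old and comes from an unboosted source. The official policy ranks first. Swapping in a learned relevance model for `rerank_score` would not change the surrounding reorder-and-threshold logic.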
Original Description
Advanced RAG Techniques → https://goo.gle/4dQTxQP
Combining Semantic & Keyword Search → https://goo.gle/3NuYQuz
Task Type Embedding → https://goo.gle/3AfAlOS
Unlock the full potential of Retrieval Augmented Generation (RAG) with advanced techniques for enhancing the quality of responses from large language models (LLMs). Watch along as Aja and Jason from Google delve into methods for optimizing LLM interactions, including data preprocessing, diverse retrieval methods, and leveraging multiple LLM calls for enhanced accuracy and relevance.
Chapters:
0:00 - Welcome to advanced RAG
0:29 - Context matters
1:03 - Pre-processing & storage
3:53 - Retrieval
4:47 - Reranking
6:38 - Calling the LLM
7:32 - Conclusion
Watch more Real Terms for AI → https://goo.gle/AIwordsExplained
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #GenerativeAI
Speakers: Jason Davenport, Aja Hammerly
Products Mentioned: Cloud - AI and Machine Learning - AI building blocks