Advanced RAG techniques for developers

Google Cloud Tech · Beginner · 🧠 Large Language Models · 1y ago

Key Takeaways

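The video covers advanced RAG techniques for improving the quality of responses from large language models: enriching chunks with metadata and hypothetical questions, using relational or graph databases alongside (or instead of) vector databases, combining multiple data sources at retrieval time, reranking, and making multiple calls to the LLM. The description links to related material on combining semantic and keyword search and on task type embeddings.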

Full Transcript

Welcome back to Real Terms for AI, where we break down modern AI concepts for professional developers. We did one episode on RAG, and it's a powerful architecture for improving the quality of responses from an LLM. But sometimes basic RAG just isn't enough to reach the quality you want. Luckily, when that happens, there are many advanced techniques based on RAG that you can try.

One of the most important things to remember is that the context you provide to the LLM matters a lot. Context, in this case, is the material you retrieve and augment your prompt with. If the context isn't highly relevant to the user's prompt or question, the LLM may generate a response that isn't relevant to the user's prompt or question. And if there isn't enough context, the LLM may generate an incorrect response: it might hallucinate, or it may not generate a response at all. Because context is so critical to response quality, many advanced RAG techniques focus on improving the context included with the user's prompt.

All right, so to make it easier to follow along, today we'll cover these techniques based on the stage of the RAG information flow they are used at, and we're going to start with pre-processing. In basic RAG, we divided our knowledge base into chunks and stored the chunks in a vector database. So what are some of our options that may improve accuracy?

Well, our first option is that we could store some metadata with each chunk, like the main topic, the category the chunk fits in, a specific product that the chunk is relevant to, or whatever makes sense in your specific use case. You can manually input this data at ingestion time based on what you know about the data sources you are ingesting. Perhaps all the chunks from a specific product's user manual should have that product ID in the metadata, or you label the information with country tags for the countries that the specific chunk is actually relevant to. You can also ask an LLM to expand its own understanding of existing chunks by generating new metadata, for example by providing an LLM a list of potential labels and asking which ones to apply to a specific chunk via classification. When you go to retrieve your chunks, you can then use the metadata to filter your vector database before finding similar chunks.

That was a lot. Can you give me an example?

Okay, let's say that a user is asking about a specific product. If your chunks have metadata for the product, maybe a product ID, you can filter to just the data about that specific product before you do the similarity search, to ensure you only return information relevant to the product in the prompt.

Another technique takes advantage of the fact that questions or prompts and answers or responses often use different words in a different order. In this technique, when you pre-process your data, you ask the LLM to generate a hypothetical question or prompt that could be answered by the specific chunk of data you are processing. Then you store this question along with the data. When you need to find information relevant to a user's prompt, you search for similarity in the hypothetical questions instead of, or in addition to, looking for similarity in the raw chunks.

That's pretty cool.

A final option we'll talk about here is that you can also implement, at the pre-processing phase, a way to change how you store the data. RAG doesn't have to use a vector database; you can use traditional relational databases or even a graph database if it fits your data. If the data you're using for context is structured, something like a relational database may be better suited for it than a vector database. You can also store the same data in different ways. For example, you can store data twice using two different chunk sizes and pull from both when generating the information to accompany the user's prompt to the LLM. And if you have data in multiple data stores, you can combine them at the retrieval stage of RAG.

I think we need another example.
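Two of the pre-processing ideas described above, filtering on chunk metadata before the similarity search and matching the user's prompt against LLM-generated hypothetical questions, can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not a production retriever: `embed` is a toy bag-of-words stand-in for a real embedding model, and `Chunk`, `retrieve`, and the product IDs are hypothetical. In a real system an LLM would write the hypothetical questions at ingestion time, and many vector databases can apply the metadata filter inside the query itself.

```python
import math
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    cleaned = text.lower().replace("?", " ").replace(".", " ").replace(",", " ")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Chunk:
    text: str
    metadata: dict                   # set at ingestion time, e.g. {"product_id": "X100"}
    hypothetical_question: str = ""  # in practice generated by an LLM during pre-processing


def retrieve(query: str, chunks: list[Chunk], filters: dict, k: int = 3) -> list[Chunk]:
    # Step 1: metadata filter. Keep only chunks whose metadata matches every filter value.
    candidates = [c for c in chunks
                  if all(c.metadata.get(key) == value for key, value in filters.items())]
    query_vec = embed(query)

    # Step 2: similarity search over the survivors. Each chunk is scored by the better
    # match of its raw text and its stored hypothetical question.
    def score(chunk: Chunk) -> float:
        return max(cosine(query_vec, embed(chunk.text)),
                   cosine(query_vec, embed(chunk.hypothetical_question)))

    return sorted(candidates, key=score, reverse=True)[:k]


chunks = [
    Chunk("Hold the power button for ten seconds to reset the device.",
          {"product_id": "X100"}, "How do I reset my X100?"),
    Chunk("Hold the power button for ten seconds to reset the device.",
          {"product_id": "Z900"}, "How do I reset my Z900?"),
    Chunk("The battery lasts about eight hours on a full charge.",
          {"product_id": "X100"}, "How long does the X100 battery last?"),
]

top = retrieve("how do i reset it", chunks, {"product_id": "X100"}, k=1)
print(top[0].metadata["product_id"], "|", top[0].hypothetical_question)
# prints: X100 | How do I reset my X100?
```

In this toy data the raw chunk text shares only the word "reset" with the query, while the stored hypothetical question matches it much more closely, which is the mismatch between question wording and answer wording that the technique relies on. The Z900 chunk never reaches the similarity step because the metadata filter removes it first.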
Fair enough. If a customer's prompt was about, I don't know, a delayed shipment, you could combine information about shipping methods from documents in your vector database with information about the customer's recent orders from a Postgres database, and general information about weather impacts on shipping from, maybe, the shipping company's API. Then our LLM's response could contain information from all of those sources, if it was needed to actually answer the question.

Exactly. We also should keep in mind that while most RAG tutorials use vector databases, vector databases are not a must-have, or even the best retrieval and storage method, for every use case. Based on your requirements, you may consider other retrieval methods like relational databases, keyword search, hybrid search, graph databases, and any search API you already have in your systems.

Once you've retrieved relevant data, you can help the LLM use that data more efficiently through a process called reranking. Reranking can get complicated, and it probably deserves its own video, but we'll try to give you the specifics and a couple of ways it can be useful here. When you add reranking to your RAG application, you're adding a step between retrieving the chunks and sending those chunks to the LLM with the prompt. In the reranking step, you use an algorithm of some kind to score the chunks by which ones are most relevant or useful to the user's prompt. You can then use those scores to reorder the chunks and choose only the best ones to send to the LLM. So let's say your ranking algorithm returns a score for each of the chunks; you can then program your RAG system to only send the chunks whose score meets some minimum threshold to the LLM.

Your reranking algorithm can take many things into account as it determines which pieces of data are most useful. For example, maybe you want more recent information to be considered more relevant. Or maybe you track user feedback and you want that data taken into account when deciding what data is relevant. Or maybe you found through experimenting that particular sources, like official documentation, often produce higher-quality answers, and you want those sources to be considered more relevant when deciding which context to include with your prompt. And of course, you can use AI and data science here: there are a variety of algorithms that can score how relevant a given chunk of data is for a given prompt, and you can potentially combine several reranking techniques to get the best reranking for your use case. I think the most important thing to remember about reranking is that it gives you another chance to ensure the context you're sending to the LLM is as relevant and helpful as it can be for answering the question.

That's a lot of ideas for how to tune your RAG app.

We've still got a few more ideas that people can try. Most RAG systems make one call to the LLM per user prompt, but you may be able to improve accuracy by making multiple calls to the LLM. As we mentioned in a previous video, it can help to have the LLM optimize the user's prompt. This can remove spelling mistakes and unnecessary words, and maybe replace words with more common synonyms. You can also ask the LLM to summarize all the data chunks you retrieve from your data store, which may also improve the quality or accuracy of the responses. And speaking of accuracy, you can also ask the LLM to evaluate the accuracy and relevance of its own results once you've generated an answer. I know it seems like the LLM should always respond that the text it just generated was correct, but that's not how it works in practice.

All right, are we done?

Now we're done. There are a lot of different things you can change about the basic RAG architecture to improve the quality of the results, and which ones work for you depends on your use case, your data, and the other standard constraints, like budget and latency, that we've discussed before. If you'd like to try some of these techniques, we've included links to codelabs on semantic search and on using methods like graph RAG in the description below. This is Aja and Jason, signing off. Happy prompting!
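The retrieval discussion lists keyword search and hybrid search as alternatives to pure vector search. One common way to build a hybrid retriever is reciprocal rank fusion (RRF), which the video does not cover: each retriever returns its own ranked list, and a document earns `1 / (k + rank)` from every list it appears in. The sketch below assumes the keyword and vector retrievers have already run; the document IDs are hypothetical, and `k = 60` is a commonly used default rather than a value from the video.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in (ranks start at 1),
    so documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the same delayed-shipment query from two retrievers.
keyword_hits = ["shipping-policy", "returns-faq", "carrier-list"]      # e.g. full-text / BM25
vector_hits = ["weather-delays", "shipping-policy", "order-tracking"]  # e.g. embedding similarity

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
# prints: ['shipping-policy', 'weather-delays', 'returns-faq', 'carrier-list', 'order-tracking']
```

Because RRF only looks at ranks, it avoids having to put keyword-search scores and cosine similarities on a common scale, which is the main reason it is a popular first choice for combining semantic and keyword search.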

Original Description

Advanced RAG Techniques → https://goo.gle/4dQTxQP
Combining Semantic & Keyword Search → https://goo.gle/3NuYQuz
Task Type Embedding → https://goo.gle/3AfAlOS

Unlock the full potential of Retrieval Augmented Generation (RAG) with advanced techniques for enhancing the quality of responses from large language models (LLMs). Watch along as Aja and Jason from Google delve into methods for optimizing LLM interactions, including data preprocessing, diverse retrieval methods, and leveraging multiple LLM calls for enhanced accuracy and relevance.

Chapters:
0:00 - Welcome to advanced RAG
0:29 - Context matters
1:03 - Pre-processing & storage
3:53 - Retrieval
4:47 - Reranking
6:38 - Calling the LLM
7:32 - Conclusion

Watch more Real Terms for AI → https://goo.gle/AIwordsExplained
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#GoogleCloud #GenerativeAI

Speakers: Jason Davenport, Aja Hammerly
Products Mentioned: Cloud - AI and Machine Learning - AI building blocks

This video teaches developers advanced RAG techniques for improving the quality of responses from large language models. It covers pre-processing and storage choices, retrieval from multiple sources, reranking, and using multiple LLM calls, and its description links to material on combining semantic and keyword search and on task type embeddings. By applying these techniques, developers can improve the relevance and accuracy of LLM responses and optimize user prompts.

Key Takeaways
  1. Store metadata with each chunk and use it to filter before the similarity search
  2. Ask an LLM to generate hypothetical questions for each chunk
  3. Consider relational or graph databases, not only vector databases
  4. Combine data from multiple sources at the retrieval stage
  5. Use reranking to score and reorder chunks
  6. Make multiple calls to the LLM for improved accuracy
  7. Optimize user prompts with the LLM
  8. Summarize retrieved data chunks with the LLM
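Takeaways 6 to 8 describe making more than one LLM call per user prompt. The sketch below chains four calls: optimize the prompt, summarize the retrieved chunks, answer, and ask the model to evaluate its own answer. It is a minimal illustration under assumptions: `llm` is a hypothetical callable you would connect to your actual model client, `retrieve` is whatever retriever you already have, and the prompt wording is illustrative only. The usage example runs offline with a fake LLM that returns canned replies.

```python
from typing import Callable

# Hypothetical type: any function that sends a prompt to an LLM and returns its reply text.
LLM = Callable[[str], str]


def answer_with_multiple_calls(user_prompt: str,
                               retrieve: Callable[[str], list[str]],
                               llm: LLM) -> dict:
    # Call 1: optimize the user's prompt (spelling, filler words, common synonyms).
    rewritten = llm("Rewrite this question clearly and concisely, fixing spelling "
                    f"mistakes and using common words:\n{user_prompt}")
    chunks = retrieve(rewritten)

    # Call 2: summarize the retrieved chunks into compact context.
    summary = llm("Summarize the facts in these passages that help answer the question.\n"
                  f"Question: {rewritten}\nPassages:\n" + "\n---\n".join(chunks))

    # Call 3: generate the answer from the summarized context.
    answer = llm(f"Using only this context:\n{summary}\n\nAnswer the question: {rewritten}")

    # Call 4: ask the model to evaluate the accuracy and relevance of its own answer.
    verdict = llm(f"Context:\n{summary}\n\nAnswer:\n{answer}\n\n"
                  "Is this answer accurate and relevant to the context? Reply YES or NO.")
    return {"rewritten": rewritten,
            "answer": answer,
            "verified": verdict.strip().upper().startswith("YES")}


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Offline stand-in that returns canned replies based on how the prompt starts."""
    calls.append(prompt)
    if prompt.startswith("Rewrite"):
        return "How do I reset the X100?"
    if prompt.startswith(("Summarize", "Using only")):
        return "Hold the power button for ten seconds."
    return "YES"


result = answer_with_multiple_calls(
    "how do i rest my x100",
    retrieve=lambda q: ["Hold the power button for ten seconds to reset the device."],
    llm=fake_llm,
)
print(len(calls), result["verified"])  # prints: 4 True
```

Each extra call adds latency and cost, so the budget and latency constraints mentioned in the video decide which of these steps are worth keeping for a given application.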
💡 Reranking can be used to score and reorder chunks so the LLM receives the most useful context, taking into account factors such as user feedback, recency of information, and source quality.
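One simple way to act on those factors is a weighted score computed after retrieval, followed by a cutoff that decides which chunks are sent to the LLM. In this sketch the retriever's similarity is assumed to be in [0, 1], and the weights, the one-year recency half-life, the per-source boosts, the feedback signal, and the 0.5 threshold are all hypothetical values you would tune by experimenting on your own data. Learned rerankers (for example cross-encoder models) are another common option and could supply the `similarity` input instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    chunk_id: str
    similarity: float      # retriever's relevance score, assumed to be in [0, 1]
    published: date        # date the source was published or last updated
    source: str            # where the chunk came from, e.g. "official_docs"
    feedback: float = 0.0  # hypothetical net user-feedback signal in [-1, 1]


# Hypothetical tuning knobs: adjust them based on experiments with your own data.
SOURCE_BOOST = {"official_docs": 0.2, "forum": 0.0}
RECENCY_HALF_LIFE_DAYS = 365


def rerank_score(c: Candidate, today: date) -> float:
    age_days = (today - c.published).days
    recency = 0.5 ** (age_days / RECENCY_HALF_LIFE_DAYS)  # 1.0 today, 0.5 after one half-life
    return (0.6 * c.similarity
            + 0.2 * recency
            + SOURCE_BOOST.get(c.source, 0.0)
            + 0.1 * c.feedback)


def rerank(candidates: list[Candidate], today: date, threshold: float = 0.5) -> list[Candidate]:
    """Reorder candidates by combined score, best first, and drop those below the threshold."""
    ranked = sorted(candidates, key=lambda c: rerank_score(c, today), reverse=True)
    return [c for c in ranked if rerank_score(c, today) >= threshold]


today = date(2024, 6, 1)
candidates = [
    Candidate("forum-post", 0.90, date(2021, 6, 1), "forum"),
    Candidate("docs-page", 0.70, date(2024, 5, 1), "official_docs"),
    Candidate("old-blog", 0.40, date(2019, 1, 1), "blog"),
]
kept = rerank(candidates, today)
print([c.chunk_id for c in kept])  # prints: ['docs-page', 'forum-post']
```

Here the forum post has the highest raw similarity, but the recent official documentation page moves ahead of it once recency and the source boost are counted, and the old, weakly matching blog post falls below the threshold and is never sent to the LLM.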
