Advanced RAG techniques for developers

Google Cloud Tech · Beginner · 🧠 Large Language Models · 1y ago

Skills: Advanced RAG 90% · Vector Stores 60% · RAG Evaluation 50%

Key Takeaways

The video demonstrates advanced RAG techniques for enhancing the quality of responses from large language models, including combining semantic and keyword search, task type embedding, using traditional relational databases or graph databases, reranking, and making multiple calls to the language model.

Full Transcript

Welcome back to Real Terms for AI, where we break down modern AI concepts for professional developers. We did one episode on RAG, and it's a powerful architecture for improving the quality of responses from an LLM. But sometimes basic RAG just isn't enough to reach the quality you want. Luckily, when that happens, there are many advanced techniques based on RAG that you can try.

One of the most important things to remember is that the context you provide to the LLM matters a lot. Context in this case is the material you retrieve and augment your prompt with. If the context isn't highly relevant to the user's prompt or question, the LLM may generate a response that isn't relevant to the user's prompt or question. And if there isn't enough context, the LLM may generate an incorrect response, it might hallucinate, or it may not generate a response at all. Because context is so critical to response quality, many advanced RAG techniques focus on improving the context included with the user's prompt.

All right, so to make it easier to follow along, today we'll cover these techniques based on the stage of the RAG information flow they are used at, and we're going to start with pre-processing. In basic RAG, we divided our knowledge base into chunks and stored the chunks in a vector database. So what are some of our options that may improve accuracy?

Well, our first option is that we could store some metadata with each chunk, like the main topic, the category the chunk fits in, a specific product that the chunk is relevant to, or whatever makes sense in your specific use case. You can manually input this data at ingestion time based on what you know about the data sources you are ingesting. Perhaps all the chunks from a specific product's user manual should have that product ID in the metadata, or you could label the information with tags for the countries that the specific chunk is actually relevant to. You can also ask an LLM to expand its own understanding of existing chunks by generating new metadata.
For example, you can provide an LLM with a list of potential labels and ask which ones to apply to a specific chunk via classification. When you go to retrieve your chunks, you can then use the metadata to filter your vector database before finding similar chunks.

That was a lot. Can you give me an example?

OK, let's say that you know a user is asking about a specific product. If your chunks have the metadata for the product, or maybe a product ID, you can filter to just the data about that specific product before you do the similarity search, to ensure you only return the information relevant to the specific product for the prompt.

Another technique takes advantage of the fact that questions or prompts and answers or responses often use different words in a different order. In this technique, when you pre-process your data, you ask the LLM to generate a hypothetical question or prompt that could be answered by the specific chunk of data that you are processing. Then you store this question along with the data. When you need to find information relevant to a user's prompt, you search for similarity in the hypothetical questions instead of, or in addition to, looking for similarity in the raw chunks.

That's pretty cool.

A final option we'll talk about here is that you can also implement, at the pre-processing phase, a way to change how you store the data. RAG doesn't have to use a vector database. You can use traditional relational databases or even a graph database as well, if it fits your data. If the data you're using for context is structured, something like a relational database may be better suited for that than something like a vector database. And you can also store the same data in different ways. For example, you can store data twice using two different chunk sizes and pull from both when generating the information to accompany the user's prompt to the LLM. And if you have data in multiple data stores, you can combine them at the retrieval stage of RAG.

I think we need another example.
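As a rough illustration of the product-ID filtering idea described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Chunk` class, the `filtered_search` helper, the two-dimensional toy embeddings, and the product IDs are hypothetical, and a real system would rely on an embedding model and the filtering features of its own data store.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    embedding: list[float]  # toy vector; a real system would use an embedding model
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(chunks, query_embedding, filters, top_k=2):
    """Keep only chunks whose metadata matches every filter, then rank by similarity."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda c: cosine(c.embedding, query_embedding), reverse=True)
    return candidates[:top_k]


# Hypothetical chunks tagged with a product ID at ingestion time.
CHUNKS = [
    Chunk("Reset the X100 by holding the power button.", [0.9, 0.1], {"product_id": "X100"}),
    Chunk("The X100 battery lasts ten hours.", [0.2, 0.8], {"product_id": "X100"}),
    Chunk("Reset the Z9 from the settings menu.", [0.95, 0.05], {"product_id": "Z9"}),
]

# The user is known to be asking about the X100, so filter before the similarity search.
results = filtered_search(CHUNKS, query_embedding=[1.0, 0.0], filters={"product_id": "X100"})
for chunk in results:
    print(chunk.text)
```

In this toy data the Z9 chunk is the closest match to the query vector, but the filter removes it before the similarity search runs, so only X100 chunks can be returned.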
Fair enough. If a customer's prompt was about, I don't know, a delayed shipment, you could combine information about shipping methods from documents in your vector database with information about the customer's recent orders from a Postgres database and general information about weather impacts on shipping from maybe the shipping company's API. And then our LLM's response would potentially contain information from all of those sources, if it was needed to actually answer the question.

Exactly. We also should keep in mind that while most RAG tutorials use vector databases, vector databases are not a must-have, or even the best retrieval and storage method, for every use case. Based on requirements, you may consider using other retrieval methods like relational databases, keyword search, hybrid search, graph databases, and any search API you already have in your systems.

Once you've retrieved relevant data, you can help the LLM use that data more efficiently through a process called reranking. Reranking can get complicated, and it probably deserves its own video, but we'll try to give you the specifics and a couple of ways that it can be useful here. When you add reranking to your RAG application, you're adding a step between retrieving the chunks and sending those chunks to the LLM with the prompt. In the reranking step, you use an algorithm of some kind to score the chunks by which ones are most relevant or useful to the user's prompt. You can then use those scores to reorder the chunks and choose only the best ones to send to the LLM. So let's say your ranking algorithm may return a score for each of the chunks, and then you can program your RAG system to only send those chunks with a score of at least n, as an example, to the LLM.

And your reranking algorithm can take many things into account as it determines which pieces of data are most useful. For example, maybe you want more recent information to be considered more relevant. Or maybe you track user feedback and you want that data taken into account when deciding what data is relevant. Or maybe you've found through experimenting that particular sources, like official documentation, often produce higher quality answers, and you want those sources to be considered more relevant when deciding which context to include with your prompt. And of course, you can use AI and data science here. There are a variety of algorithms that can score how relevant a given chunk of data is for a given prompt, and potentially you can combine several reranking techniques to get the best reranking for your use case.

I think the most important thing to think about with reranking is that it gives you another chance to ensure the context you're sending to the LLM is the most relevant and the most helpful it can be for answering the question.

That's a lot of ideas for how to tune your RAG app.

We've still got a few more ideas that people can try. Most RAG systems make one call to the LLM per user prompt, but you may be able to improve the accuracy by making multiple calls to the LLM. As we mentioned in a previous video, it can help to have the LLM optimize the user's prompt. This can remove spelling mistakes and unnecessary words, and maybe replace words with more common synonyms. You can also ask the LLM to summarize all the data chunks you retrieve from your data store, which may also improve the quality or accuracy of the responses. And speaking of accuracy, you can also ask the LLM to evaluate the accuracy and relevance of its own results once you've generated an answer. I know it seems like the LLM should always respond that the text it just generated was correct, but that's not how it works in practice.

All right, are we done now?

We're done. There's a lot of different things you can change about the basic RAG architecture to improve the quality of the results, and which one works for you depends on your use case, your data, and the other standard constraints, like budget and latency, that we've discussed before.

All right, and if you'd like to try some of these techniques, we've included links to codelabs on semantic search and using methods like GraphRAG in the description below. This is Aja and Jason signing off, and happy prompting.
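The multiple-call idea from the transcript (optimize the prompt, summarize the retrieved chunks, generate, then self-evaluate) could be wired together roughly as follows. This is a sketch under stated assumptions, not the speakers' implementation: the `LLM` callable interface, the prompt wording, and the `fake_llm` and `fake_retrieve` stand-ins are all hypothetical and exist only so the example runs without a model or a data store.

```python
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def answer_with_multiple_calls(
    llm: LLM, retrieve: Callable[[str], list[str]], user_prompt: str
) -> dict:
    # Call 1: have the model clean up the user's prompt before retrieval.
    cleaned = llm(f"Rewrite this question clearly and fix any spelling mistakes:\n{user_prompt}")
    chunks = retrieve(cleaned)

    # Call 2: condense the retrieved chunks into a shorter context.
    summary = llm("Summarize these passages:\n" + "\n".join(chunks))

    # Call 3: generate the answer from the summarized context.
    answer = llm(f"Answer using only this context:\n{summary}\n\nQuestion: {cleaned}")

    # Call 4: ask the model to judge its own answer.
    verdict = llm(
        "Evaluate whether the answer is accurate and relevant. Reply YES or NO.\n"
        f"Context: {summary}\nQuestion: {cleaned}\nAnswer: {answer}"
    )
    accepted = verdict.strip().upper().startswith("YES")
    return {"prompt": cleaned, "answer": answer, "accepted": accepted}


# Stand-ins (assumptions) that return canned text instead of calling a real model.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("Rewrite"):
        return "Where is my order?"
    if prompt.startswith("Summarize"):
        return "The order shipped on Monday."
    if prompt.startswith("Evaluate"):
        return "YES"
    return "Your order shipped on Monday."


def fake_retrieve(query: str) -> list[str]:
    return ["Order status: shipped Monday.", "Carrier: ground."]


result = answer_with_multiple_calls(fake_llm, fake_retrieve, "wher is my ordr??")
print(result)
```

Each extra call adds latency and cost, which is why the budget and latency constraints mentioned in the transcript matter when deciding how many of these steps to keep.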

Original Description

Advanced RAG Techniques → https://goo.gle/4dQTxQP
Combining Semantic & Keyword Search → https://goo.gle/3NuYQuz
Task Type Embedding → https://goo.gle/3AfAlOS

Unlock the full potential of Retrieval Augmented Generation (RAG) with advanced techniques for enhancing the quality of responses from large language models (LLMs). Watch along as Googlers Aja and Jason delve into methods for optimizing LLM interactions, including data preprocessing, diverse retrieval methods, and leveraging multiple LLM calls for enhanced accuracy and relevance.

Chapters:
0:00 - Welcome to advanced RAG
0:29 - Context matters
1:03 - Pre-processing & storage
3:53 - Retrieval
4:47 - Reranking
6:38 - Calling the LLM
7:32 - Conclusion

Watch more Real Terms for AI → https://goo.gle/AIwordsExplained
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#GoogleCloud #GenerativeAI

Speakers: Jason Davenport, Aja Hammerly
Products Mentioned: Cloud - AI and Machine Learning - AI building blocks

This video teaches advanced RAG techniques for developers to enhance the quality of responses from large language models. It covers combining semantic and keyword search, task type embedding, and modifying RAG architecture to improve results quality. By applying these techniques, developers can improve response quality, enhance LLM accuracy, and optimize user prompts.

Key Takeaways

Store metadata with each chunk
Ask LLM to generate hypothetical questions
Use traditional relational databases or graph databases
Combine data from multiple sources at retrieval stage
Use reranking to score and reorder chunks
Make multiple calls to the LLM for improved accuracy
Optimize user prompts with the LLM
Summarize data chunks with the LLM
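To make the hypothetical-question takeaway concrete, here is a small Python sketch. It is only an illustration under loud assumptions: `fake_llm_question` is a canned stand-in for a real LLM call, and `similarity` uses Jaccard word overlap in place of embedding similarity so the example stays self-contained. The chunk texts and questions are made up.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set for a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap, standing in for embedding similarity (an assumption)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def fake_llm_question(chunk: str) -> str:
    """Hypothetical stub: a real system would prompt an LLM to write this question."""
    canned = {
        "Hold the power button for ten seconds to restart the device.":
            "How do I reset my device?",
        "Standard delivery takes three to five business days.":
            "How long does shipping take?",
    }
    return canned[chunk]


def build_index(chunks: list[str]) -> list[dict]:
    # Pre-processing: store the generated question alongside each chunk.
    return [{"chunk": c, "question": fake_llm_question(c)} for c in chunks]


def search(index: list[dict], user_prompt: str) -> str:
    # Retrieval: match the prompt against the stored questions, not the raw chunk text.
    best = max(index, key=lambda entry: similarity(user_prompt, entry["question"]))
    return best["chunk"]


index = build_index([
    "Hold the power button for ten seconds to restart the device.",
    "Standard delivery takes three to five business days.",
])
user_prompt = "how do I reset the device"
answer = search(index, user_prompt)
print(answer)
```

The prompt shares far more words with the stored question than with the raw chunk, which is the effect this technique relies on. A real system could also search the raw chunks in addition to the questions, as the transcript suggests.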

💡 Reranking can be used to score and reorder chunks so the LLM uses retrieved data more efficiently, taking into account user feedback, recency of information, and source quality.
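A reranking step along these lines might look like the sketch below. The scoring formula, the bonus weights, the 30-day recency window, and the `min_score` cutoff are all assumptions chosen for illustration; the video does not prescribe a specific algorithm, and the sample chunks are made up.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    text: str
    similarity: float  # score from the retrieval step, assumed to be in [0, 1]
    age_days: int
    source: str


# Assumed weights: favor official documentation over forum posts.
SOURCE_BOOST = {"official_docs": 0.2, "forum": 0.0}


def rerank_score(chunk: Retrieved) -> float:
    """Combine retrieval similarity with recency and source-quality bonuses."""
    recency_bonus = 0.1 if chunk.age_days <= 30 else 0.0
    return chunk.similarity + recency_bonus + SOURCE_BOOST.get(chunk.source, 0.0)


def rerank(chunks: list[Retrieved], min_score: float, top_k: int = 3) -> list[Retrieved]:
    """Reorder by score, drop chunks below the threshold, and keep the best top_k."""
    scored = sorted(((rerank_score(c), c) for c in chunks), key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= min_score][:top_k]


retrieved = [
    Retrieved("Old forum post", similarity=0.80, age_days=400, source="forum"),
    Retrieved("Official guide", similarity=0.70, age_days=10, source="official_docs"),
    Retrieved("Unrelated note", similarity=0.30, age_days=5, source="forum"),
]

context = rerank(retrieved, min_score=0.75)
for chunk in context:
    print(chunk.text)
```

Here the official guide overtakes the forum post despite a lower raw similarity, and the unrelated note falls below the cutoff, so it is never sent to the LLM. User feedback could be folded in as another term in `rerank_score`.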

Chapters (7)

0:00 Welcome to advanced RAG

0:29 Context matters

1:03 Pre-processing & storage

3:53 Retrieval

4:47 Reranking

6:38 Calling the LLM

7:32 Conclusion
