Early days of RAG and LlamaIndex - Jerry Liu

Alejandro AO · Beginner · 🔍 RAG & Vector Search · 2y ago

Skills: LLM Engineering 90% · LLM Foundations 80% · Prompt Craft 60%

Key Takeaways

The video discusses the history and evolution of Retrieval Augmented Generation (RAG) and its application in AI, specifically with LlamaIndex, highlighting the potential of using Large Language Models (LLMs) for data transformation and decision-making.

Full Transcript

All right, so about that history of RAG: modern RAG was actually kind of born in 2022 then, right?
Um, yeah, I mean, I forgot the exact date of the original retrieval augmented generation paper, which is not by me. Obviously it was by others. And so this is either in 2021 or 2022 or even 2020. And it basically proposed this overall idea of, oh, you know, you take in some set of documents and then you embed them, like put them through an embedding model, and then put them into some storage system that's able to serve the relevant documents through retrieval, right? And so that's why it's called retrieval augmented generation. You want to do a retrieval pass over some storage system before you actually put it into the LLM prompt. So that resurfaced basically as more and more people started building LLM apps. People kind of started discovering that, oh, hey, this thing is a cool idea.
And my initial version was not doing that. It wasn't using embeddings because at the time, I don't know why. I think it might have just been due to, like, a design... my goal at the time when I first started was not necessarily to make this useful. It was to just do something cool. And so to me, doing something cool was like, oh, what if we just didn't have embeddings? Or I thought about it briefly, but what I really wanted to do is just have the LLM figure it out completely on its own, right? And I still think that would be quite an interesting concept. Instead of relying on a separate model, just have a language model, similar to a human, completely figure out how to reason, organize things, and then also traverse them via text. Um, yeah.
And that kind of reflects in the current state of LlamaIndex, right?
Because, I mean, as far as I can see, it's kind of the central part of LlamaIndex that you use language models during the ingestion process as well, not only in the generation process.
Yeah, so the default RAG paradigm really only uses the LLM at the very end. So you have ingestion, and ingestion doesn't need LLMs. You just take in, you know, some data, parse it, and then you chunk it using an algorithm. And of course you use an embedding model to put it into some vector store. And then the retrieval process doesn't use an LLM, because at its simplest it's just a top-k embedding lookup. So, you know, you look up stuff by embedding similarity. And so in a standard RAG pipeline, the place where LLMs actually come in is at the very end. And it's only responsible for synthesizing an answer from a piece of unstructured text.
And to be totally honest, even at the start when we were just implementing this, I thought it was a little basic and it didn't really make use of LLMs to their full potential, right? Because LLMs are not just for generation and simple reasoning. They can actually help you make decisions. They can actually help you with a greater layer of understanding and decision making. And so if you really wanted to make these systems more interesting, you could use LLMs at the beginning. So for instance, during the data ingestion phase or, you know, during query time. Instead of just using it at the very end for generation, use it for query understanding, use it for processing, like evaluating the quality of your retrieved context. And then, for instance, not only just retrieving from a vector store, but actually using a variety of different tools. And so on the ingestion side, there are places that you can use LLMs.
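The standard pipeline described above (chunk, embed, store, top-k lookup, then a single LLM call at the end) can be sketched in a few lines. This is a minimal illustration under loud assumptions, not LlamaIndex's implementation: `chunk`, `embed`, `build_index`, `retrieve`, and `build_prompt` are hypothetical names, the "embedding" is a toy bag-of-words count vector standing in for a real embedding model, and the final step only builds the prompt that would be sent to an LLM.

```python
import math
from collections import Counter


def chunk(text, size=12):
    """Split text into fixed-size word windows (stand-in for a real chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: bag-of-words counts. A real pipeline calls an embedding model."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents):
    """Ingestion: chunk each document and store (chunk, vector) pairs. No LLM."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index, query, k=2):
    """Retrieval: top-k lookup by embedding similarity. Still no LLM."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Generation: the only step that involves the LLM; here we just build its prompt."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


docs = [
    "LlamaIndex helps developers connect language models to their own data.",
    "Bananas are rich in potassium and grow in tropical climates.",
]
question = "How do language models connect to data?"
index = build_index(docs)
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Note that the language model never appears before `build_prompt`, which is the limitation the speaker points out: ingestion and retrieval are purely algorithmic in this default setup.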
And so this overall concept is pretty interesting, which is: the whole point of ingestion is to process data for your LLM app. And so that's kind of like ETL for LLMs, right? But you can also use LLMs for ETL because, you know, LLMs have an inherent capability of understanding unstructured data and transforming it. And that part I think is interesting. So for instance, let's say, for each unstructured document, you wanted to extract a summary, the table of contents, or, you know, a set of topics or tags for each document. Basically, you can figure out a clever way to prompt the LLM by feeding in a bunch of data from the document to first extract a set of structured annotations or tags. And this represents a data transformation, basically, because you're trying to feed in some unstructured input data and transform that into structured data. And then you can attach those tags on top of the unstructured data as well.
And so this is just an example of metadata extraction that's also powered by LLMs. And this is something that uses LLMs, but is also useful for any sort of downstream application you want to build. Because if you're trying to build a RAG system over this, having metadata tags is oftentimes very useful. It gives you better retrieval results, better generation quality, and all these types of things. And so I think that interplay between LLMs and data transformation is very interesting, because you can use it in the middle, but it also helps for any sort of application you want to build later on.
Yeah, those are some pretty advanced RAG techniques over there. And it kind of goes back a little bit to, I mean, it's a little bit adjacent to your original idea of creating this kind of system, right?
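The metadata extraction idea described above (prompt an LLM with document text, get back structured annotations, attach them to the document) might look roughly like the sketch below. Everything named here is an assumption for illustration: `PROMPT_TEMPLATE`, `extract_metadata`, `annotate`, and `fake_llm` are hypothetical, and `fake_llm` returns a canned reply so the example runs without any model. In practice `llm` would be whatever client function sends a prompt to a real model and returns its text.

```python
import json

# Hypothetical prompt; the wording is an assumption, not taken from any library.
PROMPT_TEMPLATE = (
    "Read the document below and reply with a JSON object containing "
    '"summary" (one sentence) and "tags" (a list of short topic strings).\n\n'
    "Document:\n{text}"
)


def extract_metadata(text, llm):
    """Ask an LLM for structured annotations and parse its JSON reply.

    `llm` is any callable mapping a prompt string to a response string.
    """
    raw = llm(PROMPT_TEMPLATE.format(text=text))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Models do not always return valid JSON objects; fall back to empty metadata.
        return {"summary": "", "tags": []}
    tags = data.get("tags", [])
    return {
        "summary": str(data.get("summary", "")),
        "tags": [str(t) for t in tags] if isinstance(tags, list) else [],
    }


def annotate(documents, llm):
    """Attach the extracted metadata to each unstructured document."""
    return [{"text": d, "metadata": extract_metadata(d, llm)} for d in documents]


def fake_llm(prompt):
    """Canned response so the sketch runs offline; not a real model."""
    return '{"summary": "An overview of RAG pipelines.", "tags": ["rag", "retrieval"]}'


annotated = annotate(["RAG retrieves documents before prompting the LLM."], fake_llm)
print(annotated[0]["metadata"])
```

The resulting tags can then be stored alongside each chunk, for example to filter or re-rank retrieval results, which is the downstream benefit mentioned in the conversation.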
Yeah, I think, you know, I thought about this at the beginning of the project, but the project was not really close to realizing that vision at the time. But if you think about the overall picture of where I think LLM-powered software will evolve, it's basically that there's a new type of data stack that's emerging, a new set of operations within that data stack to basically power AI software. And we want to basically provide the right tooling to help developers build that data stack. And so this is helping them figure out how you, you know, move data from one place to another specifically for LLMs to use. And that could include LLMs in the middle as well. And this also includes the orchestration piece on top of that data. How do you figure out how to get LLMs to interact with the data through these different types of interfaces?
Awesome.
Hello everyone. Thank you for watching the clip. If you enjoyed the clip, you can click the link in the description or somewhere right here on the screen to watch the full conversation for free. If the podcast is not up yet, you can always subscribe to my Patreon to get early access and support the channel and this podcast. Or alternatively, you can subscribe here on YouTube, click the bell icon, and you will be notified when the full episode comes out next week. Thank you very much for watching and I will see you next time.

Original Description

Jerry Liu discusses the history of RAG (retrieval augmented generation) and the use of LlamaIndex in AI applications. Learn about the evolution of RAG and how LLMs can be used for data transformation and decision-making.
LINKS:
🦙 Check out LlamaIndex: https://docs.llamaindex.ai/en/stable/
❤️ For early access to the podcast, you can either: - Support on Patreon: https://www.patreon.com/posts/podcast-early-105418000?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link
✉️ Get the Newsletter: https://link.alejandro-ao.com/AIIguB
-----------------------------
#aiagents #machinelearning #LLM #LlamaIndex #RAG #OpenAI #aipodcast #podcast



Viewers can learn about the basics of LLMs, RAG, and how to apply them to real-world problems. The video also covers advanced topics such as using LLMs for metadata extraction and data transformation.

Key Takeaways

Understand the basics of RAG and LLMs
Learn about the evolution of RAG and its application in AI
Apply LLMs to data transformation and decision-making
Use LLMs for metadata extraction and data transformation
Design and implement RAG pipelines
Craft effective prompts for LLMs

💡 The video highlights the potential of using LLMs for data transformation and decision-making, and how RAG can be used to improve the performance of LLMs in these tasks.
