LLMs for Advanced Question-Answering over Tabular/CSV/SQL Data (Building Advanced RAG, Part 2)

LlamaIndex · Beginner · ✍️ Prompt Engineering · 2y ago

Key Takeaways

This video demonstrates how to build an advanced question-answering system over tabular data using LLMs, including the use of pandas operations, SQL queries, and RAG concepts. The system is composed of a query pipeline that takes in user queries and data frames, generates pandas operations, and synthesizes responses.
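
The transcript below notes that LlamaIndex executes LLM-generated pandas code "in a relatively safe manner": generated code may import only from approved modules and may access only public methods. The sketch below illustrates that idea with Python's standard `ast` module. It is not LlamaIndex's pandas instruction parser. The allow-list `ALLOWED_MODULES` and the helpers `check_generated_code` and `run_expression` are hypothetical, and `run_expression` handles only a single expression. Checks like these reduce risk, but they are not a real sandbox.

```python
import ast

# Hypothetical allow-list; a real deployment would choose its own.
ALLOWED_MODULES = {"math", "statistics", "numpy", "pandas"}


def check_generated_code(code):
    """Reject imports outside the allow-list and any private or dunder access."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        for module in modules:
            if module not in ALLOWED_MODULES:
                raise ValueError(f"import of {module!r} is not allowed")
        # Only public attributes: blocks df._data, obj.__class__, and similar.
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError(f"private attribute {node.attr!r} is not allowed")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise ValueError(f"dunder name {node.id!r} is not allowed")
    return tree


def run_expression(expr, variables):
    """Check a single expression, then evaluate it with no builtins available."""
    check_generated_code(expr)
    compiled = compile(ast.parse(expr, mode="eval"), "<llm-output>", "eval")
    return eval(compiled, {"__builtins__": {}}, dict(variables))


print(run_expression("ages.count(22)", {"ages": [22, 38, 26]}))
```

With pandas installed, you could pass `{"df": df}` as `variables` and evaluate a generated expression such as `df["Age"].corr(df["Survived"])`. Those column names are an assumption about the Titanic CSV. Multi-statement code would need `exec` plus the same pre-check.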

Full Transcript

Hey everyone, Jerry here from LlamaIndex. In this video we'll teach you how to build query pipelines over structured or tabular data.

When we build for the use case of question answering over tabular data, the tabular QA stack looks quite different from the traditional RAG stack, which primarily works over unstructured data. RAG typically looks like this: you have an input query, you do top-k retrieval of text chunks from a vector database, and you stuff those text chunks into the prompt to do response synthesis.

A few stacks have emerged for querying over tabular data, and the data typically takes the form of either pandas data frames, a SQL database, or a collection of SQL databases. Take a pandas data frame; we call this text-to-pandas. You take in an input query and convert it to a set of pandas operations through an LLM prompt. In other words, you ask the LLM to generate the operations, assuming it has been trained on the pandas documentation and library and can therefore generate pandas code. You then execute these operations against a data frame (in LlamaIndex the implementation does this in a relatively safe manner to avoid security vulnerabilities). After you get back the results, you do response synthesis by plugging in this context.

You have a similar stack for text-to-SQL. You take in a query and use a prompt to convert it to SQL, with the query, the schema of the tables, and some sample rows as input. You execute this against a SQL database and then do response synthesis. This is a very rough picture of what it looks like. The main difference from the RAG stack is that you have to write symbolic operations to execute against an existing database. We'll show you how to build both a basic and an advanced version of QA over tabular data.

If we start with text-to-pandas, a basic workflow follows these steps:

1. Take in the user query.
2. Feed the query and the data frame information into an LLM prompt to generate pandas operations.
3. Parse those operations and run eval on them against the data frame. This is typically a somewhat risky operation, but we'll make it work.
4. Once you get back the relevant results, synthesize a response with the query to get the final answer.

Text-to-SQL looks similar. You feed a query to a text-to-SQL prompt, get back SQL, parse it, execute it, and then get back a response. This is a very basic text-to-SQL stack. If you're building text-to-SQL in the enterprise, chances are you've already built something more advanced than this. One purpose of this video is to show you how to build custom workflows that express not only the basic text-to-SQL pipeline but also more advanced components.

So what are some of these more advanced components? One step beyond the basic text-to-SQL stack is to add table retrieval to the process. In many settings your SQL database holds an entire collection of tables, and this often overflows the text-to-SQL prompt, because that prompt takes in all the tables in your database and tries to generate a SQL query. If you have hundreds or thousands of tables, you first need to do query-time retrieval of the relevant tables. Then you put those tables into the prompt, generate the SQL, and get back the answer. This process looks very similar to RAG, because for retrieval we just do standard embedding search: put the tables into a vector index and get back table objects that you can then put into the text-to-SQL prompt.

An additional step beyond that, which we found helps performance, is row-based retrieval. Once you've found the relevant set of tables, it often helps to also include some sample rows from those tables in the prompt. Dumping all the rows in a table will overflow the prompt. Dumping, for instance, just the first five or the last five rows has its own problem: it's context-unaware. Given a user query, you want the retrieval system to fetch the data that this query would likely touch, to give the LLM a sense of, for instance, the capitalization of different values or the format of the dates. Having these as few-shot examples helps text-to-SQL generation operate more robustly, so that it's less susceptible to different types of errors. For instance, if you have a country column, should the query say WHERE country = 'Japan' with a capital J, or WHERE country = 'japan' with a lowercase j? That depends on having some sample values from the table to look at first.

We'll be able to visualize this with our query pipeline syntax and also send it to Arize Phoenix for observability. We have a lot of different observability partners; this is just one example to showcase how we integrate by logging all the traces to the downstream observability provider.

So let's go through a notebook that shows you both text-to-pandas and text-to-SQL, including different stages of text-to-SQL. You can find these guides in the docs. We have a doc on query pipelines over pandas data frames, and we also have an out-of-the-box pandas query engine you can use if you don't want to learn the individual steps. We also have a query pipeline guide for advanced text-to-SQL, which we'll go through to build your own advanced text-to-SQL stack. And of course there's an out-of-the-box text-to-SQL engine you can use if you don't want to worry about the internals.

Great, with that said, let's transition to our notebook. We'll show you how to build both text-to-pandas and text-to-SQL, starting with text-to-pandas. In this example we build a pipeline over pandas data frames. It's a relatively simple example that builds a pipeline from scratch. The pipeline generates structured operations over a pandas data frame to satisfy a user query, executes those operations against the data frame, and synthesizes the response.

First we define our imports. As you can see, we import our query pipeline module, some additional helper components, and something called a pandas instruction parser. Given a data frame and a generated instruction string, this module parses the string into operations and runs them in a safe and secure manner. The dataset we're using is a very basic one: the classic Titanic CSV dataset that you can find on Kaggle. I'm pretty sure it's the most popular dataset on Kaggle. We load it in as a data frame. It's just information about Titanic passengers, their attributes, and whether or not they survived.

Now let's define the query pipeline, starting with the set of modules. The first step is a pandas prompt that infers the pandas instructions from the user query. You can see it right here. The prompt says you're working with a pandas data frame in Python, gives the name of the data frame, and shows what the first five rows look like; we insert all of that as a template string. It also says to follow these instructions, inputs the query, and then awaits the output. The instruction string here is hardcoded, but you can customize it too. Here we just tell the LLM to generate executable Python code that can be run via eval, and GPT-3.5 and GPT-4 are relatively adept at this. Note that we're not doing row-based retrieval here: we directly dump the first five rows of the data frame into the prompt. That's okay; we'll deal with that later.

Besides the pandas prompt, we also have a response synthesis prompt. Given the original instructions, the output of executing the pandas operations, and the input query, it generates a textual response as the final answer. So we have the pandas prompt, the response synthesis prompt, the LLM, and the pandas output parser. The pandas output parser is designed to safely execute Python code. It includes a set of safety checks: given an input string, it parses the string into Python and checks that the Python is relatively safe to execute. It can only import from a set of predefined, approved modules, so you can't import just anything, and you can't access anything other than public methods.

Now let's build the query pipeline. The flow should be pretty intuitive: you take in an input, feed it through the pandas prompt and then the LLM, parse the result, and feed it into the response synthesis prompt. Here we define the modules (the pandas prompt, the output parser, and response synthesis), and here we define the links. It turns out the first half can be represented as a sequential chain from input to prompt to LLM to output parser, where the LLM output is the pandas instructions. The response synthesis prompt takes in several inputs: the original query, the LLM output (the pandas instructions), and the pandas output. So we link the pandas output parser to response synthesis.

Let's execute that. Hold on for a bit and I'll show you what this looks like: I'm going to copy and paste the visualization code. You have an input (let me drag this around). The input goes to the pandas prompt, then to the LLM, which generates the instructions. Those get parsed, and everything feeds into the response synthesis prompt, which gives you back the output.

Now let's run this. If we ask "What is the correlation between survival and age?", we go through all the different modules. We reach the part where we generate the pandas instructions, feed the result into the response synthesis prompt, and get back the final answer, which you can see here. So that is text-to-pandas.

Now let's go over two query pipelines for advanced text-to-SQL. We'll skip the basic one, since we already showed it in the slides, and go straight to two components. The first is a text-to-SQL pipeline with query-time table retrieval, which dynamically retrieves relevant tables into the text-to-SQL prompt. The second adds query-time sample row retrieval: if you embed and index each row of each table, then when you ask a question you can retrieve rows from the relevant tables and put them as few-shot examples into the text-to-SQL prompt.

The dataset we're using is the WikiTableQuestions dataset. It came out in 2015, but it's still a popular dataset for table evaluation. None of the tables are that big, but there are many CSV files across many domains: this one is about towers, this one is about cars, this one is about population. The authors scraped a bunch of tables from Wikipedia and aggregated them into a dataset. It's obviously not a complete production setting, but it has a lot of tables, so it really stress-tests what happens when you run text-to-SQL against a large number of tables.

Let's download and unzip the file and load all the files in as data frames. Luckily it's not super big. We load everything from the folder called 200-csv, which is a subset of the dataset rather than the whole thing. We also create a WikiTableQuestions table-info directory, which lets us store metadata for each CSV alongside the CSVs we already downloaded. The goal of this section is to extract a table name and a summary for each table. These help us index the tables for query-time retrieval and help our text-to-SQL pipeline.

So we define a Pydantic class called TableInfo, which holds information about a structured table. It has two fields: the table name and the table summary. Because these CSVs have no existing metadata, we have an LLM infer it. The table name must use underscores and no spaces, just for convenience, and the table summary is a short, concise summary. The prompt says "Give me a summary of the table with the following JSON format". Given information about a table, the goal is to output a structured Pydantic object. We do this with a module called a Pydantic program. Here we use the form based on LLMs and prompts; there's also a form that uses OpenAI function calling directly. You plug in a few components (the output class, the LLM you want to use, and the prompt template string), and given an input, the program tries to extract that output. The input we expect is the actual table string, and the expected output is the table info.

Once we've defined this program, we go through every table and extract a summary. For this video we've already saved all the table infos, since otherwise this takes a little while. You can see the logic here: if the table info already exists, we just load it; otherwise we run the program and use the LLM to extract the summary.

Since we're testing text-to-SQL and we have all these CSVs, the next step is to put this data in a SQL database. We use SQLAlchemy to connect to a simple in-memory SQLite instance and represent each CSV as a table. There's a fair amount of code here (we'll link the Colab notebook if you want to dig through it). Roughly speaking, it creates one table in the SQL database per CSV, and each table is named with the table name we extracted into its table info.

Next we set up an observability provider so we can look at the traces. As in the previous video, we'll use Arize Phoenix, though you can use many of our other observability integrations. We'll showcase this in a bit; as you can see, the trace view is currently blank.

So let's first implement text-to-SQL with query-time table retrieval and express it as a query pipeline. We'll set up an end-to-end text-to-SQL pipeline with table retrieval and define the core modules. One of the first things we do is define what we call an object index and retriever to store and index the table schemas. An object index is a simple wrapper on top of our popular indexes within LlamaIndex, such as the vector index, which is backed by a vector store. All an object index does is translate what's stored in the index into actual objects: when you retrieve, it returns entire objects instead of just text. This matters because we want to store the table schemas and return them as what we call SQL table schema objects. When you get back such an object, you can plug it directly into your SQL database later on.

In this block of code we define the object index, the retriever, and the SQL database. We import SQLDatabase from LlamaIndex and create the SQL database from the SQLAlchemy engine. To define an object index and retriever, we need a mapping on top of the SQL database. It maps each underlying node to the expected SQL table schema object, so it can translate back and forth between node and object. These are the objects we want to index, the SQL table schemas. We call ObjectIndex.from_objects with the table schema objects, the table node mapping, and the index class to use under the hood, a simple VectorStoreIndex. To get a retriever, we call as_retriever on the object index; it fetches the top three table schemas every time we run it.

Now that we've defined the object retriever, the next step is a component that translates the set of retrieved schemas into a string, because the goal is to stuff all these schemas into a prompt. This is a simple function: it takes in a list of SQL table schemas and outputs a newline-separated string, with each table schema represented by this piece of text. Here we introduce something we didn't cover in the previous video: the function component. It's a very simple component that lets you pass in an arbitrary function, which then becomes a component you can plug into the query pipeline. It's one of the most flexible ways to compose a query pipeline: define functions and string them together in the DAG.

With the table retrieval piece and the mapping to context defined, we can piece together the full text-to-SQL prompt and the output parser. Let me first run this. We import the text-to-SQL prompt from the default prompts in LlamaIndex. We skipped showing it in the notebook, but you can see it in the output over here: given an input question, first create a syntactically correct query to run, then look at the results, followed by an example format (question here, SQL query to run, result of the SQL query, final answer here). The template variables include the table schemas and the query string, and then we await the output SQL query. The table schema variable is exactly where the retrieved table context goes, and the query string comes from the input.

The only other component we add runs after the text-to-SQL prompt: we need to translate the response text into a SQL query. The formatting can be messy; the LLM sometimes outputs this entire block, starting from the SQL query line. So we have some text-parsing logic that looks at the response and specifically fetches the SQL statement that immediately follows that line. This is another function component, called the SQL parser component. Finally, we have a response synthesis prompt. Given the input query, the SQL statement, and the SQL response, it generates the final textual response.

With the components in place, let's define the query pipeline. It looks like a decent number of components, but it should be pretty intuitive. First we have the input, via the input component. We then run table retrieval and table output parsing to extract the table context string from the relevant tables. Next come the text-to-SQL prompt, into which we load the table string, and the LLM. We then parse the SQL and use it to query the SQL database, and after getting back the result we do response synthesis. We express all of this as links. I could explain the links one by one, but it's nicer to visualize what happens once we run this.

This is a more visual view of all the relationships I described. You have the input. The next steps are table retrieval and table output parsing. Their output, plus the input, goes into the text-to-SQL prompt, which goes through the LLM. The SQL gets parsed out right here and goes to the SQL database to fetch the results. Those results go into the response synthesis prompt along with the SQL and the input, which gives you back a final answer. So you can express this entire workflow as a DAG, and that's exactly what we did.

Now let's run some queries. We'll ask "What was the year that The Notorious B.I.G. was signed to Bad Boy?" This information lives in a single table out of the few dozen tables we loaded into the SQLite database, so let's see whether this pipeline can find it. The input runs through table retrieval (hopefully we find the right table), then the text-to-SQL prompt, the LLM, the output parser, and so on. We get back the response: The Notorious B.I.G. was signed to Bad Boy in 1993.

If we click into the Phoenix trace view, you can see this entire trace in action. Click into the query "What was the year that The Notorious B.I.G. was signed to Bad Boy?" The first retrieval call goes against the table retriever, and it gets back two tables, one relevant and one not. The first schema is Bad Boy artists, which is exactly what we're looking for: it contains the year signed, the column we want, as well as the artist. The second table is Renaissance discography, which is not relevant. That's okay; it only appears because we set top-k to two. This is what the text-to-SQL prompt looks like. Where it says "Only use tables listed below", we injected the table context, and given the question we output the SQL query. The next retrieve call just runs the SQL statement against the DB and gives you back the relevant context; this retriever is really just running the SQL statement and returning a set of context nodes. Finally, the last LLM call is response synthesis: given an input question, synthesize a response from the query results. The query asks what year The Notorious B.I.G. was signed to Bad Boy, you have the SQL statement and the SQL response, and finally you have 1993.

So that was step one: an advanced text-to-SQL pipeline with table retrieval baked in. That's already a decent start, but the last part of this video takes it one step further with the second advanced capability: text-to-SQL with both query-time table retrieval and query-time row retrieval.

Here's one issue with the previous example. Suppose the user asks for something like "Notorious BIG" without the periods, but the artist is stored as "The Notorious B.I.G.". Then the generated SELECT statement won't return any matches. If you go back into the Phoenix trace view, you see the generated SQL statement is WHERE Act = 'The Notorious B.I.G.'. If you ask the question without the periods, the generated SQL also drops the periods, and you get back nothing. So how do we prevent that and make this more robust? That's where few-shot row examples come in. We alleviate the problem by fetching a small number of example rows per table. Instead of just taking the top few rows, we embed and index the rows and retrieve the ones relevant to the query. That way you get back rows specifically about The Notorious B.I.G., not rows about random artists.

Let's build this. We extend the query pipeline by redefining it and adding more components. For tracing purposes, we define the query pipeline at the beginning; we can always add modules to it later. The main reason is that we can then pass the query pipeline's callback manager to all downstream modules, which gives us a comprehensive trace of the entire system. Note that the service context object we're defining here will be deprecated in the upcoming v0.10 release. Afterwards you won't need this part at all; you'll just pass the query pipeline's callback manager to all downstream modules. For the sake of this video, we'll keep it for now.

To prep for this, we store each table not only in the SQL database but also as a vector index: we embed and index every row of each table. You can see that code over here. This function indexes all the tables, and this vector index dictionary maps each table name to the vector index for that table. Each row is represented as a string of text. The goal is an embedding representation of each row that you can use for dense retrieval, so you can put relevant rows into the text-to-SQL prompt. For each table, this logic builds and persists the index if it doesn't exist yet, or loads it from existing storage otherwise. Let's run this; you can see it's indexing rows in all of these tables.

As an example, let's take the Bad Boy artists retriever. We've indexed all the rows in this table, so let's see what we get if we just put in an artist name. We put in "P. Diddy", and you see the artist is stored in the table as just "Diddy", with the year 1993. This is good. If we can do this for any query we want to run, we can input values that don't syntactically match the stored values and still get back the actual examples, and therefore craft a better SQL statement.

To plug this component in, we just define an expanded table parser. The table parser takes in the set of table schema objects and translates them into context. We expand this function so that it not only converts the table schema itself to a string, as in the previous example, but also runs retrieval on that table's index to look up some relevant rows and inserts those rows into the context. So this is the table info for the given table, which is just the table context. We also look up that table's vector index to return relevant rows and inject them here, under "Here are some relevant example rows". Each table now contributes both a context string and a row string. We wrap this in a function component as before: given the set of retrieved table schemas and the query, it returns the table and row context. We run that.

Now let's define this expanded query pipeline. I won't spend as much time here, because it's literally the same pipeline as before. The main difference is that the table parser component now has a different function underneath; the nodes and links are the same. If we run and visualize this, it's exactly what we showed before: input, retrieval, parsing, text-to-SQL, LLM, SQL retrieval, response synthesis.

Now let's run some queries. We ask "What was the year that The Notorious BIG was signed to Bad Boy?", this time without the periods. We run through the entire pipeline, and at the text-to-SQL generation step (we'll see this in more detail in the trace view) it generates the correct SQL statement. It inserts the valid stored value instead of just using the input we provided, and we get back the right result.

In the trace view, click into this query, and you now see three retrieval calls. The first is table retrieval: given the input, we return the relevant tables from the SQL database. Here we get the schemas for Bad Boy artists and for football team records, which is a totally irrelevant table, but again, that doesn't really matter. The next two retrieval calls go into each table and return relevant rows for that table. This is exactly where we see that when we search for "Notorious BIG", we get back two example rows. One is Diddy, which is irrelevant, but the other shows the correct entry, with 1993 and 5. The second of these row-retrieval calls is irrelevant, because it fetches rows from an irrelevant table. Then we call text-to-SQL. This is the expanded text-to-SQL prompt, where you see not only the table schemas but also some relevant example rows, and given all this information the LLM generates the SQL query. The remaining steps are standard: SQL retrieval against the DB, and finally response synthesis.

That's all the sections we wanted to show you today. I hope you enjoyed this video. Thanks, and leave your comments in the comment section below!

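The SQL parser component described in the transcript pulls the SQL statement out of the raw LLM response. Below is a minimal standard-library sketch. It assumes the prompt asks for a "SQLQuery: ... SQLResult: ..." layout (the exact marker strings are an assumption here) and that the LLM may wrap the SQL in a Markdown code fence. `parse_sql_from_response` is a hypothetical name, not a LlamaIndex API.

```python
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the snippet fence-safe


def parse_sql_from_response(response):
    """Return the SQL that follows 'SQLQuery:' and precedes 'SQLResult:'."""
    start = response.find("SQLQuery:")
    if start != -1:
        response = response[start + len("SQLQuery:"):]
    end = response.find("SQLResult:")
    if end != -1:
        response = response[:end]
    response = response.strip()
    # Remove a surrounding Markdown code fence (with optional sql tag) if present.
    if response.startswith(FENCE):
        response = response[len(FENCE):]
        if response.lower().startswith("sql"):
            response = response[3:]
    if response.endswith(FENCE):
        response = response[: -len(FENCE)]
    return response.strip()


raw = (
    "SQLQuery: SELECT year_signed FROM bad_boy_artists "
    "WHERE artist = 'The Notorious B.I.G.'\n"
    "SQLResult: [(1993,)]\nAnswer: 1993"
)
print(parse_sql_from_response(raw))
```

If neither marker is present, the function just returns the stripped text. The parser therefore degrades gracefully when the LLM answers with bare SQL.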
Original Description

In the second video of this series we show you how to compose a simple-to-advanced query pipeline over tabular data. This includes using LLMs to infer both Pandas operations and SQL queries. It also includes pulling in RAG concepts for advanced capabilities, such as few-shot table and row selection over multiple tables. LlamaIndex Query Pipelines make it possible to express these complex pipeline DAGs in a concise, readable, and visual manner. It's very easy to add few-shot examples and to link prompts, LLMs, custom functions, retrievers, and more.

Colab notebook used in this video: https://colab.research.google.com/drive/1fRkgSn2PSlXSMgLk32beldVnLMLtI1Pc?usp=sharing

This presentation was taken from our documentation guides - check them out 👇
Text-to-SQL: https://docs.llamaindex.ai/en/stable/examples/pipeline/query_pipeline_sql.html
Text-to-Pandas: https://docs.llamaindex.ai/en/stable/examples/pipeline/query_pipeline_pandas.html

Timeline:
00:00-06:18 - Intro
06:18-12:13 - Text-to-Pandas (Basic)
12:13-27:05 - Query-Time Table Retrieval for Advanced Text-to-SQL
27:05 - Query-Time Row Retrieval for Advanced Text-to-SQL
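
The description's point about expressing pipelines as DAGs of prompts, LLMs, and custom functions can be illustrated without LlamaIndex. This toy sketch is not the LlamaIndex QueryPipeline API. `MiniPipeline`, `fake_llm`, and `run_instructions` are hypothetical; the LLM is a stub that returns a fixed expression, and a small dict stands in for a data frame. The links mirror the text-to-pandas DAG from the video: a sequential chain from input to prompt to LLM to parser, with response synthesis receiving the query, the generated instructions, and their result. Execution order comes from `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class MiniPipeline:
    """Toy DAG runner: modules are plain functions, and links map an upstream
    module's output to a keyword argument of a downstream module."""

    def __init__(self):
        self.modules = {}  # name -> function
        self.links = {}    # dest name -> {param name: source name}

    def add_module(self, name, fn):
        self.modules[name] = fn

    def add_link(self, source, dest, param):
        self.links.setdefault(dest, {})[param] = source

    def run(self, query, output):
        # Each module depends on the modules that feed its arguments.
        graph = {name: set(self.links.get(name, {}).values()) for name in self.modules}
        results = {"input": query}  # "input" is the implicit root node
        for name in TopologicalSorter(graph).static_order():
            if name == "input":
                continue
            kwargs = {p: results[src] for p, src in self.links.get(name, {}).items()}
            results[name] = self.modules[name](**kwargs)
        return results[output]


DATA = {"age": [20, 30, 40]}  # hypothetical stand-in for a data frame


def fake_llm(prompt):
    # Stub: a real pipeline would send `prompt` to an LLM here.
    return "sum(data['age']) / len(data['age'])"


def run_instructions(code):
    # Stub executor with a tiny builtin allow-list (not a real sandbox).
    return eval(code, {"__builtins__": {"sum": sum, "len": len}}, {"data": DATA})


pipeline = MiniPipeline()
pipeline.add_module("pandas_prompt", lambda query: f"Write one Python expression over data to answer: {query}")
pipeline.add_module("llm", fake_llm)
pipeline.add_module("parser", run_instructions)
pipeline.add_module("synthesis", lambda query, instructions, result: f"{query} -> {instructions} = {result}")
# Sequential chain: input -> prompt -> LLM -> parser.
pipeline.add_link("input", "pandas_prompt", "query")
pipeline.add_link("pandas_prompt", "llm", "prompt")
pipeline.add_link("llm", "parser", "code")
# Response synthesis sees the query, the generated instructions, and their result.
pipeline.add_link("input", "synthesis", "query")
pipeline.add_link("llm", "synthesis", "instructions")
pipeline.add_link("parser", "synthesis", "result")

print(pipeline.run("What is the average age?", output="synthesis"))
```

Declaring links as "source output feeds destination parameter" is what lets a node like response synthesis take several upstream inputs, which a purely sequential chain cannot express.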

Playlist

Uploads from LlamaIndex · LlamaIndex · 48 of 60

1 LlamaIndex Virtual Meetup (May 4th, 2023)
2 LlamaIndex + MongoDB Workshop/Fireside Chat
3 Discover LlamaIndex: Ask Complex Queries over Multiple Documents
4 Discover LlamaIndex: Document Management
5 Discover LlamaIndex: Joint Text to SQL and Semantic Search
6 Discover LlamaIndex: JSON Query Engine
7 LlamaIndex Webinar: Active Retrieval Augmented Generation
8 LlamaIndex Webinar: Demonstrate-Search-Predict (DSP) with Omar Khattab
9 LlamaIndex Sessions: Practical challenges of building a Legal Chatbot over your PDFs
10 LlamaIndex Webinar: Graph Databases, Knowledge Graphs, and RAG with Wey (NebulaGraph)
11 LlamaIndex Webinar: Community Project Showcase (07/07/2023)
12 LlamaIndex Webinar: LLMs for Investment Research (with Didier Lopes, co-founder/CEO at OpenBB)
13 Discover LlamaIndex: Bottoms-Up Development With LLMs (Part 1, LLMs and Prompts)
14 Discover LlamaIndex: Bottoms-Up Development With LLMs (Part 2, Documents and Metadata)
15 Discover LlamaIndex: Key Components to build QA Systems
16 Discover LlamaIndex: Bottoms-Up Development with LLMs (Part 3, Evaluation)
17 LlamaIndex Webinar: From Prompt to Schema Engineering with Pydantic (with @jxnlco)
18 Discover LlamaIndex: Bottoms-Up Development with LLMs (Part 4, Embeddings)
19 Discover LlamaIndex: Custom Retrievers + Hybrid Search
20 LlamaIndex Webinar: Document Metadata and Local Models for Better, Faster Retrieval
21 LlamaIndex Webinar: Build Personalized AI Characters with RealChar
22 LlamaIndex Webinar: Make RAG Production-Ready
23 LlamaIndex Workshop: Building RAG with Knowledge Graphs
24 Discover LlamaIndex: Introduction to Data Agents for Developers
25 LlamaIndex Webinar: Finetuning + RAG
26 Discover LlamaIndex: SEC Insights, End-to-End Guide
27 Discover LlamaIndex: Custom Tools for Data Agents
28 LlamaIndex Sessions: Building a Lending Criteria Chatbot in Production
29 Discover LlamaIndex: Bottoms-Up Development with LLMs (Part 5, Retrievers + Node Postprocessors)
30 LlamaIndex Webinar: How to Win a LLM Hackathon
31 LlamaIndex Webinar: LLM Challenges in Production (w/ Mayo Oshin, AI Jason, Dylan from Starmorph)
32 LlamaIndex Webinar: Agents Showcase!
33 LlamaIndex Webinar: Learn about DSPy
34 LlamaIndex Webinar: Time-based retrieval for RAG (with Timescale)
35 LlamaIndex Webinar: Build/Break/Test LLM Apps Showcase (co-hosted with TrueEra, Pinecone)
36 LlamaIndex Workshop: Evaluation-Driven Development (EDD)
37 LlamaIndex Webinar: Building LLM Apps for Production, Part 1 (co-hosted with Anyscale)
38 LlamaIndex Webinar: Learn about Fine-tuning + RAG (w/ Victoria Lin, author of RA-DIT)
39 LlamaIndex Webinar: What's next for AI after OpenAI Dev Day?
40 Introducing create-llama
41 LlamaIndex Webinar: PrivateGPT - Production RAG with Local Models
42 Multi-modal Retrieval Augmented Generation with LlamaIndex
43 LlamaIndex Webinar: LLaVa Deep Dive
44 A deep dive into Retrieval-Augmented Generation with Llamaindex
45 LlamaIndex Workshop: Multimodal + Advanced RAG Workhop with Gemini
46 LlamaIndex Webinar: Efficient Parallel Function Calling Agents with LLMCompiler
47 Introduction to Query Pipelines (Building Advanced RAG, Part 1)
48 LLMs for Advanced Question-Answering over Tabular/CSV/SQL Data (Building Advanced RAG, Part 2)
49 LlamaIndex Webinar: Advanced Tabular Data Understanding with LLMs
50 Ollama X LlamaIndex Multi-Modal
51 Build Agents from Scratch (Building Advanced RAG, Part 3)
52 LlamaIndex Webinar: Build No-Code RAG with Flowise
53 LlamaIndex Sessions: Practical Tips and Tricks for Productionizing RAG (feat. Sisil @ Jasper)
54 Introduction to LlamaIndex v0.10
55 Build SELF-DISCOVER from Scratch with LlamaIndex
56 Introducing LlamaCloud (and LlamaParse)
57 LlamaIndex Sessions: 12 RAG Pain Points and Solutions
58 LlamaIndex Webinar: RAG Beyond Basic Chatbots
59 A Comprehensive Cookbook for Claude 3
60 LlamaIndex Webinar: RAPTOR - Tree-Structured Indexing and Retrieval

This video teaches how to build an advanced question-answering system over tabular data using LLMs, including the use of pandas operations, SQL queries, and RAG concepts. For pandas, the query pipeline takes in a user query and a data frame, generates pandas operations, executes them, and synthesizes a response; for SQL, it converts the query into SQL using table schemas and sample rows, executes it, and synthesizes a response. The video also covers RAG concepts for advanced text-to-SQL, namely query-time table retrieval and row retrieval over multiple tables, backed by vector stores for efficient retrieval.

Key Takeaways
  1. Define the query pipeline
  2. Use a pandas prompt to generate pandas operations
  3. Run the query pipeline against the data frame
  4. Synthesize the response using a response synthesis prompt
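  5. Implement RAG concepts for advanced capabilities, such as query-time table and row retrieval
  6. Include relevant example rows in the text-to-SQL prompt so the generated SQL uses valid stored values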
  7. Use vector stores for efficient retrieval
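💡 Adding RAG concepts such as query-time table and row retrieval can improve the accuracy of question-answering systems over tabular data: in the video, retrieved example rows let the LLM insert the valid value into the generated SQL instead of the user's raw input.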
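The four text-to-pandas steps in the list (define the pipeline, prompt for code, run it against the data, synthesize a response) can be sketched end to end without third-party packages. Assumptions, stated plainly: a list of dicts stands in for the pandas DataFrame; `fake_llm` is a hypothetical stand-in that returns a fixed expression instead of calling a model; and `safe_eval` is an illustrative AST allow-list, not LlamaIndex's own parser. It reflects the transcript's point that generated code should be executed in a restricted way rather than passed straight to `eval`.

```python
import ast

# Hypothetical toy data; a list of dicts stands in for a pandas DataFrame.
ROWS = [
    {"city": "City A", "population": 120},
    {"city": "City B", "population": 450},
    {"city": "City C", "population": 300},
]

SAFE_FUNCS = {"max": max, "min": min, "sum": sum, "len": len, "sorted": sorted}
ALLOWED_NODES = (
    ast.Expression, ast.Call, ast.Name, ast.Constant, ast.Subscript, ast.Tuple,
    ast.GeneratorExp, ast.comprehension, ast.Compare, ast.BoolOp, ast.BinOp,
    ast.UnaryOp, ast.expr_context, ast.cmpop, ast.operator, ast.boolop, ast.unaryop,
)


def build_codegen_prompt(rows: list, query: str, n: int = 2) -> str:
    # Step 2: like df.head() in a pandas prompt, show column names and a few rows.
    head = "\n".join(str(r) for r in rows[:n])
    return (f"`rows` has columns: {', '.join(rows[0])}\nFirst rows:\n{head}\n"
            f"Write one Python expression over `rows` that answers: {query}\nExpression:")


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for the LLM call; always returns the same expression.
    return "max(r['population'] for r in rows)"


def safe_eval(expr: str, rows: list) -> object:
    # Step 3: run generated code only if every AST node is on the allow-list.
    # No attribute access, no underscore names, and calls only to SAFE_FUNCS.
    # This reduces risk but is not a complete sandbox.
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError(f"disallowed name: {node.id}")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in SAFE_FUNCS
        ):
            raise ValueError("only whitelisted functions may be called")
    env = {"__builtins__": {}, **SAFE_FUNCS, "rows": rows}
    return eval(compile(tree, "<llm-code>", "eval"), env)


def answer(rows: list, query: str) -> str:
    # Step 1: the "pipeline" here is simply these calls wired in order.
    code = fake_llm(build_codegen_prompt(rows, query))
    result = safe_eval(code, rows)
    # Step 4: a real pipeline sends this synthesis prompt to the LLM; here it is returned.
    return f"Query: {query}\nCode: {code}\nResult: {result}\nResponse:"


print(answer(ROWS, "What is the highest population?"))
```

Rejecting attribute access outright blocks common escape routes such as `rows.__class__`, at the cost of also rejecting harmless method calls; a real pandas setup would need a more permissive (and more carefully reviewed) policy.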
