LlamaIndex Webinar: Learn about DSPy
Key Takeaways
This video webinar introduces DSPy, a framework for LLMs that emphasizes programming over prompting, and demonstrates its capabilities in composing modules, optimizing pipelines, and fine-tuning language models. The webinar covers various topics, including retrieval-augmented generation, prompt engineering, and self-improvement, and showcases the use of DSPy in building complex systems and optimizing model performance.
Full Transcript
hey everyone uh my name is Jerry welcome back to another episode of The Llama index webinar series uh today we're joined by Omar and Thomas uh from the dpy project uh and the title of the slide slides are compiling declarative language model calls into self-improving pipelines um it's a cool framework that that um has been under development for for a few months now and and there's a lot of new Concepts in there that you don't really find in other Frameworks especially the stuff around like Auto optimization of of the pipeline through the compiler as well as like removing the need for prompt engineering and the importance of few shot examples so I'm personally super excited to learn a little bit more about this and and so we'll do about like 20 25 minutes of of Q&A um and that or not Q&A uh slides and then we'll do about like 20 25 minutes of Q&A um so if you guys have any questions please feel free to drop it in the chat uh and then all if I'll probably ask some questions as well towards the end uh cool without further Ado passing it over to uh Omar and Thomas perfect okay well thank thank you so much everyone H thank you for coming out today thank you so much to Omar kabi the Creator and leader of BP um the advisor Chris pots and mat zarya and thank you to the wonderful dsb team um what I'll be doing at first is just kind of setting up um the LM ecosystem at large so that Omar can then introduce DSP and how it solves those challenges so you know one of the most common questions we get is where does dsv fit into the landscape of tools so let's zoom out we have all these emerging theories and techniques on how to improve LMS you you know you have adaption which includes different prompty techniques like fuse shot where we enable in learning where we provide demonstrations in the prompt to steer the model you have reasoning so includes chain Chain of Thought where we enable complex reasoning capabilities through S intermediate reasoning steps you have augmentation so that's including so providing the LM information is a piece of text that could generate the corrector useful response and there's also you know specific designs into how you can retrieve this information obviously we have we're preferential to Omar's Co bear model but uh you know obviously there's a lot of different other retrieval methodologies um and so you DSP really focus on coordinating your language model and your reval model so we first translate string based promp techniques including complex and task dependent ones like China thought and react into declarative modules that carry natural language type signatures So currently these tasks are solved using domain specific pipelines so some examples of this are flare so the study introduced flare is it's a retrieval augment generation method method that acely determines when and what to retrieve during text generation using predicted upcoming content as a query to gather relevant information um and you have then SQL offers a blend of hand prompted LMS combined with a SQL database for prot Texas SQL you have the RAR system that automatically provides attribution for language model outputs and corrects unsupport content enhancing kind of the trustworthiness of the system and each of these pipelines uh exemplifies dedication to task specific design and exchange which exchange flexibility with performance and so if you're an engineer building out these pipelines or you're a researcher you want to set up yourself up for Success later down the road uh in case something changes with your foundation model your retrieval model user Behavior Etc so while pipelines offer great potential crafting them effectively is quite challenging firstly you know figuring out how to seamlessly connect and optimize every element of a pipeline can be elusive moreover prompts used in these pipelines can be very delicate and don't easily adapt across different scenarios and if that wasn't enough fine-tuning them can be an ingrate process adding another layer of complexity so as you can see while pipelines that really tailored to a specific problem are promising they come with their own hurdles so this SC screenshot is from D SQL for a paper for converting natural language to SQL and you can see these massive handwritten prompts and right now many NLP researchers are writing demonstrations and often these long free form prompts by hand so you know it's you know a good a good way to to kind of give yourself a nice chuckle is to read from really brilliant paper the paper uh looking at kind of the prompt and it's sort of an extensive uh prompt that you know we don't need to go into today but the point is is that um currently researchers are kind of getting around these issues trying to Leverage The Power of L damage models um by kind of creating by creating these really long winded textual prompts and with that Omar please take it away thank you Thomas uh let me set up my screen sharing as well right cool so um I guess this is time to introduce you to the dspi Paradigm and how we tackle the challenges of prompt engineering these days and our main sort of um um statement is that we'd like you to program and not prompt language models so our goal is to help you shift Focus from uh tweaking the language model or you know uh hacking on prompts to focusing on good uh architecture design for your system so the kinds of pipelines that um uh Thomas was introducing and we do this by by basically uh bringing in three key Concepts so these are signatures and we'll discuss this in a little bit these are going to abstract the role of prompts and also a fine-tune language modeles basically whatever you condition the language model on in order to teach it a subtask in your pipeline the second uh sort of class of Concepts we introduce is that of modules signature based modules in particular and these will generalize all the prompting techniques you see coming out in the literature um and will allow them to be sort of uh naturally composed into multi-stage systems and the key thing about modules is that these will um be internally parameterized such that they can learn from data and we'll see what that means exactly in a little bit um the third concept we introduced is that of teleprompters so these are optimizers for um your entire program that will uh essentially look at your data and your modules and will push the behavior of the system incrementally or iteratively towards any arbitrary metric you'd like to optimize for um so to sort of make this more concrete things like Chain of Thought react agents um and other sort of advanced things there like program of thought other other things you you've heard of um these in dpy are uh modules that you can build pretty easily and the ones listed here are things we have built in um and what the these are going to do is that you are going to declare them with a signature so a signature is simply going to say I'd like this module um could be any of these to um essentially exhibit the behavior the input output behavior um that semantically matches the those roles so for example you could have a Chain of Thought um um element that takes questions in and generates answers and instead of sort of thinking about how you would condition a language model on a long prompt um or few shot examples or very specific instructions um in order to do that or how you can find a fine-tuned model or maybe construct it yourself to do that you're simply going to say I'd like the behavior of taking in a question and outputting an answer um to be U sort of assigned to a Chain of Thought module that I'll then use in my system and that I'm going to rely um on the spies compiler in particular a specific teleprompter like one of those at the bottom here um which will take my system sort of take the context that's working in so maybe your you know your data and whatnot and is going to work to optimize these components towards you know Notions of self-improvement so let's kind of um uh see this concretely but before that if you're familiar with neuron networks you can easily sort of assume anything is the way it would be in by torch if the language model was your device and nine times out of 10 by Design uh your your guesses will be right and this is sort of one of the reasons I think people pick up DSP pretty quickly so um if you're familiar with neuron networks that's going to help if not it's not an issue uh but you can basically think of modules in dpy as your layers um so you know um in in something like byor you might have layers like linear layers or convolution layers um Dropout and other sorts of Transformations these are exactly what modules here are going to play the role of um but they're instead working at the level of um high level Transformations that the language model is conducting signatures are essentially simply specifying the uh kind of the function Behavior the the input output behavior of a module so in pyour this might be like the dimension of your um you know of your layer or the number of channels and whatnot and teleprompters simply optimizers so you know Adam SGD or popular ones there here we'd have things that bootstrap few shot examples or um optimize with um you know hyper parameter optimizers and whatnot so let's get concrete uh let's say you want to solve a math word problem you could do all kinds of things in dsy um things that user retrieval and and things that don't so maybe you want to work with this gsmk data set pretty popular in the in the literature so it has questions like this uh math word problems and the answer is uh you know a number and you know it's usually of the sort that requires at least a little bit of bookkeeping you know it's not intuitively a number you can immediately extract from the answer but they are great school math so not it's not too advanced or anything um so you might say I'd like to have a system that could solve these kinds of questions you know it's a fairly arbitrary task you could plug in something else here um so your first attempt at this you're starting you know in a clean systematic um kind of engineering style so you're starting simple so you say maybe I want to build a module that's going to take the question go to the language model simply um interact with it once and ask it to commit to an answer so the way this would look in dpy is that you can define a module called vanilla and all this is going to do is it's going to um instantiate the dspi predict module and it's going to tell it the signature we want so I'd like it to take a question and give me an answer um and then you can test this uh but maybe we can come up with more interesting designs so a different design we could have is to basically say um I'd like a system that can do a chain of thought so I'd like it to you know think step by step before it commits to an answer and in Dy this is also built in so you could just say I'd like a chain of thought that can take a question and output an answer notice that the role of this signature here is important if you are to replace for example question and make this search query it's going to generate search queries for you um so it's a it's basically working on interpreting um you know the uh semantic um elements of the signature to determine the behavior that you're trying to describe um so this is 2023 and it's October in fact so maybe you'd like to build something more sophisticated maybe you'd like your system to do Chain of Thought giving the question but maybe it should not commit to one answer maybe it should sort of uh prototype three different answers and given that uh you know you could take a majority vote but maybe you could do something more sophisticated maybe you could ask the model to look at all three rationals and their answers and essentially compare them um before committing to a final answer with another Chain of Thought So in dsy it would be this block of code and what I want to emphasize is that in each of these three programs this is the entire specification of the program there are no prompts anywhere you know we didn't write the prompts for this internally you're not going to write the prompts for this internally this is the complete specification of your behavior so let's go through this step by step um what we are going to do is we're going to define a class in Python um this is very similar to the way you would define neural network in P torch by the way um and you're basically going to say it's going to this is a DSP module with with that you're going to Define two key u methods the first is initialization so here you're simply going to declare the uh components of the module that you are interested in so these are basically the boxes that we have so we'd like to have a change of thought um component that's going to take a question and output an answer and um this is sort of the general behavior that we are going to expect from it but we also want it to Output multiple things so we can give it you know the number of outputs um another uh component here or another module is the multi-chain comparison and ultimately it's still working in the space where you know we have a an input question that it wants to answer um but it also can take um sort of these intermediate things by Design and so it will take you know whatever number of attempts is the second uh element that you define is your forward function and this is sort of inspired by the way things work in P torch where you can basically write any uh code you like you know we're not going to give you weird sort of handicapping abstractions for chains just write python code you could have loops you could have exceptions you could have um you know uh you could call other functions you could do whatever you want just make sure you use these modules that you declared in the places where you want to interact with the language model um and with that uh you know you can just instantiate it by saying I'd like a version with three atep and this is um simply going to declare the architecture you have in the right hand notice that here we didn't say I'd like this to use open AI models or I'd like this to do prompting or I'd like this to do fine-tuning all of these are low-level details that can be figured out after you've designed your network and in many cases can be figured out pretty uh effectively in an automatic way so let's actually kind of work on compiling one of these um so once you build a network in and or a program in DSP what you do is you compile it uh this is kind of like training a model but uh when working with these programs uh that interact with language models so the general signature or the general sort of interface to a compiler is that there's this teleprompter that you pick so there's this Optimizer um you give it your dsy program and you give it some number of examples generally with very sparse labels so you know you could you could have a small number of examples many of them without any labels and maybe a few labels on the final output that you expect from your system depends on your metric really um and um generally also admits a lot of unlabeled examples depending on your use case so you give it your program and a few examples and you select a specific teleprompter so we have a lot of these uh for different kind of applications they're all general purpose though so you could use any of them um and what you get out of that is sort of a specific implementation of your dsy program that's generally optimized for you know uh whatever metric you'd like to uh to to Target generally along axis of quality or um or cost cost so we just defined three programs and I promise you that you know this was the set the entire definition of these programs now we need to actually be able to execute them so uh um we could take our Chain of Thought program or really any of them and we need a training set of examples so maybe we have some of these questions and they're answers and notice that our answers here are simply going to be numbers no one is going to bother to write annotations for sort of all of the intermediate stages of our pipeline so in particular we're not writing a Chain of Thought by hand because like how do you do that well for the model you don't know what exactly you know what what what works for it uh what works for llama is not what works for GPT 3.5 and whatnot so you write your training set could be a handful of examples could be large also if you want to sort if you have a lot of data and you want to optimize sort of more aggressively um you declare your your module um essentially your program you pick a metric and you simply ask it to compile and what you get out of that is a bootstrap program where every module in this case there's only one module but where every module has sort of been mapped into an implementation in accordance with the teleprompter that you have so here this is a bootstrap F shot um um uh teleprompter so what this is going to do is it's going to work on on um fot prompting um so under the hood if you're working with llama uh two 13 billion chat you know itself bootstraps to this uh set of uh you know chains chains of thought and you know there's a bunch of other demonstrations here skipped for space um but internally it's able to sort of find uh or really create uh this this prompt that works really well under the hood and we'll see U how that works uh in a little bit so there's a lot of other teleprompter if you have labels that you'd like to use you could use a labeled F shot teleprompter you can bring your own labels if you have them not generally the way we like to do things but you know um you could um ask the model to sort of bootstrap or sort of kind of self-create um um and potentially iteratively improve a few short examples uh with random search you could do it also with uh kind of fancier search algorithms we have uh supports for optuna um you could also Nest these things so you could bootstrap based on the bootstrap program and this is kind of an interesting form of distillation where you're not distilling a model to another but you're distilling a whole program which can sort of be composed of many modules um into another through U you know through a particular teleprompter um you could also have higher level um sort of higher order manipulations of your program where here this thing says Hey I just boostrapped a lot of programs in particular I could take the top seven um I'd like to have an ensemble program so this is going to function as a kind of a a similar solver that has the same overall signature that we have but I'd like it to basically run all seven and pick a majority vote you know there's a lot of other things you could you know that fit this pattern so things like you know K&N uh manipulations where you say I'd like my program to actually bootstrap um demonstrations based on things that are similar to the question I'm trying to answer now a lot of other um sort of uh possibilities there so with this same program the vanilla program the first one we wrote we have results here with GPT 3.5 and llama 2 13 billion chat and and what you see is that exact same oneliner um compiled with the you know these oneliner or two ler prompts uh sorry compilers um you you can see that you can take a model like Lama chat which absolutely sucks at at this task if it's done zero shot um especially if you know if there aren't human um um Chain of Thought you know that you're giving to the model so if you're just prompting it with the with the numbers um to a formidable you know 39% score um with this very basic program that's not even doing uh a Chain of Thought um obviously if you add a chain of thought you could you could get really good results you know out of GPT 3.5 and llama 2 um and if you add this reflection module that took us like five or six lines I forget um and you simply use one of these compilers you can push llama 2 to close to 50% and you know GPT 3.5 close to 90 uh percent um on this set so um this is this was sort of an overview of us building three um you know programs of varying complexity the last one was you know somewhat somewhat complex um and compiling it into really excellent sort of um systems by um allowing them to self- improve internally through a built in teleprompter in dsy now you could move on to uh you know a number of other tasks we could take on multihop questions a task that I like because it sort of helps uh look at various pieces of this so these are questions like you know which award did Gary zuk Cav's first book receive um and there are two subtasks here really one is finding passages from Wikipedia that can help um sort of answer this task and the and the other is actually generating the answer um so you could build all kinds of models here but let's say we want to build a program that looks like this something that can take the original question ask the language model to generate a search query so a question might be something like hey how many stories are in the castle David Gregory inherited and this is a question that generally stumps language models if you go to GPT 3.5 GPT 4 they'll either abstain or they'll give you something silly like David Gregory inherited Castle Gregory which is which is kind of a madeup thing um so we'd like the first step to be hey language model please generate a search query for us so what can CLE that David Gregory inherit something that the retriever could take something small enough that the retriever could take and actually do a really good job at just identifying that this guy inherited this particular kardi Castle then you can go back to the language model and essentially issue another query um and you know the language model might say something like how many stories are in this Castle you're talking about canori castle again you know it's it's it can be something or it should be something simple enough for the retriever to to immediately find it um such that the language model could holistically take all of this and answer the question so we're seeing a lot of these workflows and Lama index has a lot of these workflows where there is kind of back and forth between um um or multiple H you could say between the retrieval and the language model and our goal is to make you make building arbitrary things of various level levels of complexity of this sort as easy as just programming the workflow um that you see on the left so if you are to do this in DSP you're going to build a class and it only has two methods um so in intialization we just need to declare the comp components so this thing has in fact um three unique components the red stuff like generating search queries the orange stuff which is the retriever um and the blue one which is going to generate the answer so let's actually just do that so we can have a Chain of Thought module or you could have any other type of module here um that's going to implement this signature so this is a signature that's going to take a context and a question and it's going to generate a query um notice that you know this is different from taking a question and outputting an answer it's a different signature um then you have the retriever um and let's say it's going to retrieve three paragraphs at a time and then we can have the generate the answer generation so this is going to take the context that we've gathered so far and a question and generate you know the final answer answer in the forward function here we can chain them and again you know you're not restricted to chaining in sort of fragile ways or sort of rigid ways um you could write a loop um you could have the loop sort of depend on the particular question that you're trying to solve for Simplicity let's say we're going to have two hops um as the diagram on the left shows so we're going to use our modules so first we're going to generate a query we'll give it the components we have we're going to take the search query out of it we're going to take the stuff and append it to the context that we're building so we this is going to retrieve some passages we're just going to throw them into the context this this is all sort of imperative code so you could easily sort of inject the print statement here or whatever to inspect what's going on um and you could simply U generate uh the answer at the end and return that as the output of your system because this is essentially a DSL or like a domain spe specific language uh in Python you are basically free to do all the kinds of things that you would want to do in a in a in a pipeline like this like tracing or inspection or you know putting in a debugger on or um using exceptions Loops Rec cursion um it's basically free form python code as long as you follow us a few um high level things so yeah just some color coding about the stuff on the left and how they map on the on the right so under the hood though all all all the all what we have here is that we just we're just saying I'd like a Chain of Thought Behavior with a signature of context question becomes s um we're not saying how that should be done but the compiler given your choice of teleprompter your choice of metric and other things uh is going to have to map that into something that can actually be executed given your language model so U on the left here I have assuming your language model is set to be llama 2 chat um the 13 billion variant so that's something you could run locally potentially um also works with with 7 billion parameter models um so so it it would compile this first step um into a prompt like what you have here on the left um and you know interestingly it would do that without any labels from your end on you know what makes for a good search quity or what makes for a good Chain of Thought for generating a search quy or what makes for good passages with hard negatives for generating the sort of thing this is all internally bootstrapped in accordance with whatever metrics you're trying to optimize for this is optimized purely for having uh generated the correct answer with no other sort of um um elements so what does a teleprompter do I mean it sounds like it's kind of the at this point it it does come across potentially as this you know main element of of doing magic here um so the the teleprompter generally does three things um it needs to generate candidates right so you've seen we've generated these prompts um and so they have you know different elements or quote unquote parameters that can be learned the key one really is um demonstrations of the behavior that you'd like to see there um so if we have something that generates search queries you know we' like examples of us generating search queries and maybe I'll I'll try to to wrap up after this so um generating good examples of those using the same language model so this is you know if you're using llama it's llama Bo strapping itself if you're using GPT uh it's GPT doing the same um although you could also do sort of cross model um essentially uh teacher student setups pretty easily um the second is parameter optimization like okay now you have all these potential candidates how do you select a few that actually work well and the third is higher order pipeline optimizations so hey I have this program that's pretty linear but I'd like something that can do backtracking like if if some assertions fail or something fails you know um I'd like it to basically go back and retry things in an automatic way um and you know the the the DSP Paradigm you know allows for teleprompters to do this sort of thing and we're exploring a lot of that so the question is like how does this work in practice um and to to uh basically test that we have here some results with um a few programs I'm not discussing them here but you could take a question answering vanilla thing that relies in the model itself you could build a Chain of Thought rag in one in a like two line thing in the aspy um and what you can see is that you know the the the few shot version uh of these is pretty decent but if you ask the model to bootstrap its own examples you know you could potentially get some really uh decent gains uh out of them um agents are another class of things that's highly sort of supported so you can start with something like react zero shot which is how it's usually done in like you know langing Shain and other places because it doesn't know about your tools doesn't know about your task um if you have human labels you can add them in and it does help you know a decent bit but if you simply use the compiler you can generally do quite a bit better than um you know the human labels will will allow you to do and this is kind of all general purpose here we don't need to hard code um prompts for these for these um approaches you use this sort of easy to code multihop thing very basic design uh you able to to push these you know way higher um in including with you know a model as as small and local as llama 2 chat you know and can certainly do even better than that interestingly you're not limited to prompting so something as Tiny as T5 large you could probably run it you know on your phone pretty quickly um with 800 unlabeled questions and a few labels 200 labels in this case um you can get get it to score 39% answer exact match and 46% passage recall which is actually really good like this is um you know it's around 10 points better than what you get out of kar V2 by itself um and this is mainly leveraging unlabeled questions so um I'll wrap up here um I have some other slides that we can use for questions and whatnot but the takeaway is is don't mess with the weights of your system don't mess with the prompts by hand Define new modules that Express the behavior that you want or Define new teleprompters that optimize things in the way that you want um don't expect I guess you guys don't need to hear that but many other folks do don't expect one layer to do all the work you know think about multi-stage pipelines and how you can add depth in your in your in your system so that you can sort of solve more complex tasks through these you know um um pipelines and programs and you know uh if you do research or if you care about reproducibility or cost and whatnot this stuff allow you to use models like D5 large or um flan or you know llama 7 billion or or or 13 billion and get really high quality systems um by simply you know through this notion of of of compiling um and so um this is this stuff is open source you know feel free to check it out I'll stop here and we can start taking questions um where we can also kind of use some of the slides as needed so I'll I'll turn back to you Jer great thanks Omar and Thomas for the presentation um this is great the super interesting idea and i' love to dive in a little bit more and for those of you who don't know we actually heered Omar like a few months ago actually on on DSP at the time and so now I think the main change is the why I was going to ask why why you added a why at the end but like it seems like one of the main um additions is this idea of like compiling optimization which I'm really excited to dive into U maybe to start off with like a hollow question is what are your general takes on like prompt engineering like it seems like you've said like loud and clear that you know in the first point like you don't you should abstract away they need to manually tune your prompts um what are your thoughts on whether like prompt engineering should uh be a thing maybe part of it maybe all of it um right and and how would this like evolve over time uh great question uh you might expect me to say it should die and no one should ever do it but the answer really is there's a lot of interesting kind of developments research and I don't mean research only in an academic setting all the kinds of things you guys are doing and lots of other cool people are doing um it's it's important to discover new capabilities of models but once we understand U what the capability looks like I think it should be generalized into a module that can be reused in a well-defined way so here Chain of Thought someone had to come up with that you know the you know the team that discovered that had to had to come up with it or program of thought or other things and I think a lot of that comes through you know free form not not all of it for sure but some of it comes through interacting with the models in a fairly free form way and uh you know some of that is hard to replace otherwise once you understand that particular pattern of behavior though it's pretty straightforward to just say okay here's what a general purpose task independent module should look like in in practice and I can show you guys chain of how Chain of Thought sort of is implemented in dspi it's actually really simple um and um you know under the hood that thing should be parameterized such that on any new task that you have um you simply kind of you can simply compile it optimize it for your data instead of the usual thing the usual thing is we all like Chain of Thought So every time I'm starting a new sort of pipeline I'll sit down and write a few chains of thought for that particular step um and if you want to sort of be meticulous you could say like for that particular step for this particular ele because what works here is know what works there too well and this quickly I think becomes unmanageable so you have to compromise somewhere and we're saying you don't need to compromise just build the general purpose module forget about it you hit a case where you need more uh developments do your due diligence implement the simple thing it's extremely easy to do this in DSP like Chain of Thought is a line right or program of thought is a line react is a line uh compile it for your thing and that gives you at the very least a really strong Baseline based on that you can write your other prompt and see if you're doing better uh and if you are make it a module people can then use it for new things and you know you can take things from there so hopefully this sort of helps characterize this another element here is um what's never going to go away I I believe is good system design so you know we what we did here is that we said like what's the problem like well we have questions we like to generate search queries we have those interactions so that good system design is is really key and all we're saying is if you think at the higher level of abstraction where language models um can simply s ort of be you know um these devices that you can interact with with with high LEL signatures and with modules then you can start exploring a much larger space of design where if things fail um you know still need to you know uh look at look at the traces you still need to understand things but your first intuition is usually should I add some internal logic here or should I kind of you know add treat some exception at some routers thinking at a high level of architectural design as opposed to like should I maybe f the last word in the prompt and I kind of hope it works on a couple of examples and pretend maybe that's okay um so that's that's the general picture here makes sense um the so yeah I guess going along that point like I I think one of the core components is assuming you can abstract this away in a module a core component is optimization piece and just talking a little bit more about how that works like uh when you when you think about like compiling a program uh and you try to optimize like the the bits what are the main bits that you can optimize just very concretely is it adding like futron examples I see a bit in there about like fine-tuning a small LM as well uh what what exactly exists like at least today and then also what do you see kind of existing yes that's a that's a great question so um the real sort of optimization logic exists in two places one is uh the predict module which is kind of like the key core prediction module and that defines what sort of and and anything else that you would add that would play the similar role that defines what parameters you'd like to optimize at the moment the key parameter and I think this will forever be the key parameter um is demonstrations so this is a slightly more General thing than fot example demonstrations are basically good examples of input and output to the module that you have within the constraints of the signature that that you that you're working with so if you're generating search queries it's hey here's the question I I saw as an input here's the Chain of Thought I generated here is the search quy I I produced and maybe here are the passages I conditioned on in the meanwhile whatever other elements right whatever other elements um and the idea here is that um you can sort of create a lot of these in an iterative fashion by simply using the logic or the scaffolding um of your of your program so that's the key parameter other parameters we have people sort of U explore and um sort of we have additional teleprompters for there are the instructions there's been a a bunch of recent papers on sort of tun the instructions for models usually what's happening there is people are looking at like a prompt as the logic um and so they're usually trying to change the last few words mainly before like the prompt model to get the answer so generally basically generalizations of Chain of Thought but what we're working at at the level here is one where you generally uh have a more complex system with multiple stages which is what you have in Lama index and in kind of many practical settings um and you know the interactions between these things are not things for which you have a lot of labels you don't know a priori what kind of search quties you want to generate so you can't really optimize this sort of by um you know more standard techniques and so you um you know teleprompters would allow you to um to do that if you define instructions as a as a as one of the parameters there um another thing is um this is kind of a special case of thinking about the instruction is uh the exact signature to um U prompt mapping so you said you want to generate cities but maybe the language model knows a better you know keyword to use for these for these types that you have that's another thing that could be automated um although demonstrations generally take most of the burden um out of that um you could also have hyper parameters in in your logic like how many passages do I retrieve or other fact or like what thresholds should I use U these are all things that are not you know our Focus right now but are pretty trivial to add because the infrastructure around the automatic optimization is there so you basically have the scolding um for this stuff but I'd like to emphasize that you know demonstrations are the key part because they allow you to find tun models they allow you The Prompt models allow you to get really exellent results pretty pretty quickly um so yeah got it uh thanks for the thanks for the answers um and then next piece here is like what are the how do you like Define the metrics that you optimize for do you predefine this do you let the users Define this like is this some concept of like accuracy cost latency like like um do users Define these objectives or do you do it yourself I think it's it's it's completely user uh defined but you know we have a lot of buil-in things that you could just use if you're doing something fairly standard like answer correctness is a common thing especially in academic benchmarks they're kind of reusable things but in practice um what we've seen and what we've built a few times is um usually your metric is going to be another DSP program so um we were doing this task where we had to do data mutation like we had a specific data format and we had we wanted to generate something completely different and it wasn't something we can deterministically check so we wrote a different dsy program that does the checking and then we compiled that on some on a couple of labels you know we just wrote a few things like you know here you should say yes here you should say no we compiled that that was pretty easy to compile because the output of the evaluator is deterministic it's pretty you know it's pretty easy to evaluate um so the metric there was easy that g gives us a program that we can use as a metric and then we compile the bigger thing with the uh DSi metric so it's completely open-ended what you're given in your metric is the um kind of the output prediction object from your program could be like your answer your query your other things and you're given a trace of all the stages like you know I saw the following Chain of Thought here's what I output it here's the input here's the output and you're given whatever um labels exist on your end maybe that's empty maybe you don't have no labels um or maybe you have the right answer for example and you like to check that the answer is correct is correct maybe you have the right passages and you want to check that you know you're getting the right passages and teleprompters will work we work with any of these to um basically Hill Climb uh on the space of parameters that you have until the you know it it plateaus and it can't optimize your metrics um you know even further fairly standard fairly standard optimization nothing special there so so I want these lines of like time and cost uh one of the questions from the audience is how expensive is this compilation process both maybe in terms of time uh and then like Benchmark like ballpark 3.5 turbo like a dollars like like just how how how much would that cost you yeah great question as cheap or as expensive as you'd like it to be so we've done a lot of compilations that take 20 seconds um and cost are virtually free especially if you're doing this with a local model teaching itself I mean nothing is free but you know costs that are sort of pre paid in some sense um and you know takes a few minutes um we've done things that where you're you know you're going to find tune the local model and for that you're going to um ask llama to annotate you know a couple thousand examples um so and then you're gon to train a model on that so it might take an hour right so that's the usual range I think the longest we've done are things that might take like five or six hours where we're training like flan x large on thousands of you know generated you know internally generated data and what it takes time is you got to go to llama and you know bootstrap all this these examples and whatnot um but you know we you can run if you're doing this with GPT 3.5 and you're compiling on a you know a couple dozen examples couple hundred examples in the in in a large case you never you never need more than that um you know you can expect to pay uh $10 is a good is a good range maybe less uh5 to $10 depending on particular details that you can configure so you know you have all the control over the settings but basically the main setting here is how many trials do you want to conduct like do you want to try 100 things which is a lot you just want to try a couple like four or five and um you generally get diminishing returns so if you're you'd like to get something very quickly four or five trials avoids kind of worst case scenarios and gives you really excellent performance most of the time I see and and just to maybe just to double check my understanding is this something where like the user specifies the different things that you can do like to find tuna language model to to add like fut shot prompts like a bunch of different options or oh sorry is this something the user what oh is this something that the user figures out on their o like manually specifies like B that's by um hey go and find to an llm or go and and um like figure out like the best F shot prompts or is something that you automatically decide for them um there are many different teleprompters so there's a teleprompter called which set fine tune and there you specify well I want to F tune into what like I'd like to fine tune into open AI GPT 3.5 fine tuning or maybe T5 large t5x large um and the idea is that once you've written your program what I think a lot of us are realizing in this space is like hey maybe I should find I mean should I find tune or should I do rag and the answer is like why should you choose I mean it's the same program just say find tune with drag and you know you just literally say that it's one liner you got to find tune um maybe you would like to do prompting instead so um is it something that can be trivially automated by having like a teleprompter that tries different things and gives it back to you yes but I don't think that's the right mode because that's such an important decision that you want to be at least saying yes I want to find tune or yes I'd like to prompt but the steps of fine tuning are not something you're going to do you know you just say here's a program here is my sort of uh metric do a good job right um yeah that that makes a lot of sense uh so going back to the question and the audience um I think one question uh is how how much like does does DS so I think in the experiment like you demonstrate a lot of like performance gains um can like this idea of compilation actually reduce like latency or time compared to a like vanilla QA pipeline that you could Implement in lond Dax SL training other Frameworks great question uh we are not focused on like systems level optimizations although there are people that stford who might have interesting things there to add to the to the framework um but that's not sort of the core Focus um the main way we can cut costs here is two things um we can compile simple pipelines to really good quality like you could do just basic Rag and you know without multiple steps and when it's bootstrapped when it's compiled it does better than the I think it does better in some cases than the multihop thing you know without all of the compilation obviously if you compile it could do even better um but the other element of this is that we can usually get you get things working really well with very small models so the pipeline itself has all the steps we're not making any steps faster it's just that you don't have to do like GPT 4 for for almost anything really because if you compile it with llama you know or 13 billion or distill that you know into T5 you can get really excellent performance as long as you understand your task like meaning if your if your T is Task is very opaque like you know you just have an input and you have no clue what to do with it um obviously no free lunch right but if you have a pipeline that you understand well you can get really excellent performance from very small models so that's the main way latency improvements come in I see I feel like given that you just have let the user Define a bunch of metrics like just from a general optimization problem if you let them Define like I don't know like leny or or time as like a metric to optimize for it you could hypothetically just go in and pick some combination of techniques that so it's easy to optimize for it's easy to optimize for proxies to that like I'd like my prompt to be under a thousand tokens uh assuming that affects your cost which it does um that that's easy to do um actual latency is hard because usually these things are highly paralyzed like when you say compile it doesn't it's not it's not going to do one thing at a time it's going to launch a bunch of different things at once so it's going to be actually fairly non-trivial to think about how to get like those profiling elements of latency KRA um how how much does uh this is a question from the audience um like uh how do you like do you have plans to integrate the teleprompt like the compilation with kind of like other evaluation Frameworks things going on um like if the idea is just so that users can Define like arbitrary metrics is there a way for them to kind of integrate with other you know there's a lot of like eval toolkits these days for for LMS um and to Som like pipe the result and create that integration something like what um sorry like there's a lot of eal toolkits these days for LM so is there a way to kind of like um integrate dpy with some sort of like Downstream evaluation provider so that that would provide signal back into what to optimize for so um we DSP is pretty modular and small it's not meant to be you know uh an overarching thing that you have to use alone it's generally pretty it's it's plugin play uh as long as you write your core pipeline logic there can do whatever you want for for instrumentation or for bigger things um and if you go through the frequently asked questions on the page we actually explicitly say like we I it would be cool for people to build Frameworks on top of this our mental model for this is pytorch you know there's lots of Frameworks built on top of py torch T either you know for for lower level things like by torch lightninging and whatnot but I think uh you know I usually think of like hugging face Transformers where you know you have this huge ecosystem um you can do instrumentation with things like um you know um also tensor board and other stuff like that um as long as as long as I think this is something good for all of us um as long as your inputs or outputs are clean you know we're we're gonna give you strings at the end of the day like the output of just a data structure so as long as the other tool that you're working with has good their faces it's going to be pretty trivial to integrate it we're not making any like weird decisions that make Integrations hard tracing is easy Integrations are easy are we going to have anything built in per se um not really on the road map for the for the near future uh in inside insid
Original Description
In this webinar, we're excited to host Omar Khattab and Thomas Joshi to present DSPy - a framework for LLMs that emphasizes programming over prompting.
Instead of hand-crafting prompts for LLMs, with DSPy you can compose modules together in a declarative, Pythonic syntax (instead of manually writing prompts). You can then feed it to a compiler which will auto-tune the prompt (or even fine-tune the LM) to optimize the task for your use case!
Lots of cool concepts in here, many of which haven't been explored in LlamaIndex yet.
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from LlamaIndex · LlamaIndex · 33 of 60
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
▶
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
LlamaIndex Virtual Meetup (May 4th, 2023)
LlamaIndex
LlamaIndex + MongoDB Workshop/Fireside Chat
LlamaIndex
Discover LlamaIndex: Ask Complex Queries over Multiple Documents
LlamaIndex
Discover LlamaIndex: Document Management
LlamaIndex
Discover LlamaIndex: Joint Text to SQL and Semantic Search
LlamaIndex
Discover LlamaIndex: JSON Query Engine
LlamaIndex
LlamaIndex Webinar: Active Retrieval Augmented Generation
LlamaIndex
LlamaIndex Webinar: Demonstrate-Search-Predict (DSP) with Omar Khattab
LlamaIndex
LlamaIndex Sessions: Practical challenges of building a Legal Chatbot over your PDFs
LlamaIndex
LlamaIndex Webinar: Graph Databases, Knowledge Graphs, and RAG with Wey (NebulaGraph)
LlamaIndex
LlamaIndex Webinar: Community Project Showcase (07/07/2023)
LlamaIndex
LlamaIndex Webinar: LLMs for Investment Research (with Didier Lopes, co-founder/CEO at OpenBB)
LlamaIndex
Discover LlamaIndex: Bottoms-Up Development With LLMs (Part 1, LLMs and Prompts)
LlamaIndex
Discover LlamaIndex: Bottoms-Up Development With LLMs (Part 2, Documents and Metadata)
LlamaIndex
Discover LlamaIndex: Key Components to build QA Systems
LlamaIndex
Discover LlamaIndex: Bottoms-Up Development with LLMs (Part 3, Evaluation)
LlamaIndex
LlamaIndex Webinar: From Prompt to Schema Engineering with Pydantic (with @jxnlco)
LlamaIndex
Discover LlamaIndex: Bottoms-Up Development with LLMs (Part 4, Embeddings)
LlamaIndex
Discover LlamaIndex: Custom Retrievers + Hybrid Search
LlamaIndex
LlamaIndex Webinar: Document Metadata and Local Models for Better, Faster Retrieval
LlamaIndex
LlamaIndex Webinar: Build Personalized AI Characters with RealChar
LlamaIndex
LlamaIndex Webinar: Make RAG Production-Ready
LlamaIndex
LlamaIndex Workshop: Building RAG with Knowledge Graphs
LlamaIndex
Discover LlamaIndex: Introduction to Data Agents for Developers
LlamaIndex
LlamaIndex Webinar: Finetuning + RAG
LlamaIndex
Discover LlamaIndex: SEC Insights, End-to-End Guide
LlamaIndex
Discover LlamaIndex: Custom Tools for Data Agents
LlamaIndex
LlamaIndex Sessions: Building a Lending Criteria Chatbot in Production
LlamaIndex
Discover LlamaIndex: Bottoms-Up Development with LLMs (Part 5, Retrievers + Node Postprocessors)
LlamaIndex
LlamaIndex Webinar: How to Win a LLM Hackathon
LlamaIndex
LlamaIndex Webinar: LLM Challenges in Production (w/ Mayo Oshin, AI Jason, Dylan from Starmorph)
LlamaIndex
LlamaIndex Webinar: Agents Showcase!
LlamaIndex
LlamaIndex Webinar: Learn about DSPy
LlamaIndex
LlamaIndex Webinar: Time-based retrieval for RAG (with Timescale)
LlamaIndex
LlamaIndex Webinar: Build/Break/Test LLM Apps Showcase (co-hosted with TrueEra, Pinecone)
LlamaIndex
LlamaIndex Workshop: Evaluation-Driven Development (EDD)
LlamaIndex
LlamaIndex Webinar: Building LLM Apps for Production, Part 1 (co-hosted with Anyscale)
LlamaIndex
LlamaIndex Webinar: Learn about Fine-tuning + RAG (w/ Victoria Lin, author of RA-DIT)
LlamaIndex
LlamaIndex Webinar: What's next for AI after OpenAI Dev Day?
LlamaIndex
Introducing create-llama
LlamaIndex
LlamaIndex Webinar: PrivateGPT - Production RAG with Local Models
LlamaIndex
Multi-modal Retrieval Augmented Generation with LlamaIndex
LlamaIndex
LlamaIndex Webinar: LLaVa Deep Dive
LlamaIndex
A deep dive into Retrieval-Augmented Generation with Llamaindex
LlamaIndex
LlamaIndex Workshop: Multimodal + Advanced RAG Workhop with Gemini
LlamaIndex
LlamaIndex Webinar: Efficient Parallel Function Calling Agents with LLMCompiler
LlamaIndex
Introduction to Query Pipelines (Building Advanced RAG, Part 1)
LlamaIndex
LLMs for Advanced Question-Answering over Tabular/CSV/SQL Data (Building Advanced RAG, Part 2)
LlamaIndex
LlamaIndex Webinar: Advanced Tabular Data Understanding with LLMs
LlamaIndex
Ollama X LlamaIndex Multi-Modal
LlamaIndex
Build Agents from Scratch (Building Advanced RAG, Part 3)
LlamaIndex
LlamaIndex Webinar: Build No-Code RAG with Flowise
LlamaIndex
LlamaIndex Sessions: Practical Tips and Tricks for Productionizing RAG (feat. Sisil @ Jasper)
LlamaIndex
Introduction to LlamaIndex v0.10
LlamaIndex
Build SELF-DISCOVER from Scratch with LlamaIndex
LlamaIndex
Introducing LlamaCloud (and LlamaParse)
LlamaIndex
LlamaIndex Sessions: 12 RAG Pain Points and Solutions
LlamaIndex
LlamaIndex Webinar: RAG Beyond Basic Chatbots
LlamaIndex
A Comprehensive Cookbook for Claude 3
LlamaIndex
LlamaIndex Webinar: RAPTOR - Tree-Structured Indexing and Retrieval
LlamaIndex
More on: LLM Foundations
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
Claude AI vs ChatGPT: Which One Is Actually Better in 2026?
Medium · AI
Claude AI vs ChatGPT: Which One Is Actually Better in 2026?
Medium · Programming
IntelliBooks: Classic RAG vs Graph RAG vs Agentic RAG – Choosing the Right AI Retrieval Architecture for Enterprise AI
Dev.to AI
Fluid, natural voice translation with Gemini 3.5 Live Translate
Dev.to AI
🎓
Tutor Explanation
DeepCamp AI