High Agency Pydantic over VC Backed Frameworks — with Jason Liu of Instructor

Latent Space · Beginner ·✍️ Prompt Engineering ·2y ago
Today's guest is Jason Liu, creator of Instructor (https://github.com/jxnl/instructor). You might have seen him on Twitter (https://twitter.com/jxnlco) or at our AI Engineer Summit last year! Instructor was downloaded over 150,000 times last month and helps thousands of developers create structured outputs from LLMs. It's used for data extraction, knowledge graph generation, as well as AI agents planning. We dove into all of those use cases in today's episode!

Show notes: https://www.latent.space/p/instructor

Timestamps:
00:00:00 Introductions
00:02:50 Early experiments with Generative AI at StitchFix
00:09:39 Design philosophy behind the Instructor library
00:13:17 JSON Mode vs Function Calling
00:14:43 Single vs parallel function calling
00:16:28 How many functions is too many?
00:20:40 How to evaluate function calling
00:24:01 What is Instructor good for?
00:26:41 The Evolution from Looping to Workflow in AI Engineering
00:31:58 State of the AI Engineering Stack
00:33:40 Why Instructor isn't VC backed
00:37:08 Advice on Pursuing Open Source Projects and Consulting
00:42:59 The Concept of High Agency and Its Importance
00:51:06 Prompts as Code and the Structure of AI Inputs and Outputs
00:53:06 The Emergence of AI Engineering as a Distinct Field

What You'll Learn

The video discusses using Pydantic for structured outputs and prompt engineering, with Jason Liu, creator of Instructor, sharing his experience and insights on the topic. He compares his lightweight, Pydantic-based approach to VC-backed frameworks and highlights the benefits of using Pydantic for building tooling and creating structured data outputs.
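As a rough illustration of what "structured outputs" means in this episode, here is a minimal sketch using plain Pydantic rather than Instructor's own API. The `Person` and `People` models, the `parse_people` helper, and the sample JSON strings are all hypothetical stand-ins for a schema and for raw model output; the "name plus an age above zero" shape is borrowed from the example Jason describes in the transcript.

```python
import json
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    # Hypothetical schema: each object has a name and a positive age.
    name: str
    age: int = Field(gt=0)  # ages of zero or below fail validation


class People(BaseModel):
    # A single schema that wraps the list of extracted objects.
    people: List[Person]


def parse_people(raw_json: str) -> People:
    """Turn a raw JSON string (standing in for model output) into typed objects."""
    return People(**json.loads(raw_json))


# Sample strings standing in for what a model might return (assumed, not real output).
good = '{"people": [{"name": "Ada", "age": 36}, {"name": "Linus", "age": "54"}]}'
bad = '{"people": [{"name": "Ada", "age": -1}]}'

parsed = parse_people(good)
# The age that arrived as the string "54" is coerced to an int by the schema.
print(parsed.people[1].name, parsed.people[1].age + 1)

try:
    parse_people(bad)
except ValidationError as exc:
    # A caller could feed these errors back to the model and retry.
    print("rejected:", len(exc.errors()), "validation error(s)")
```

The point of the sketch is the one made in the conversation: the schema lives in one place, parsing and validation happen once, and everything downstream works with typed objects instead of strings.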

Full Transcript

Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-residence at Decibel Partners, and I'm joined by my co-host swyx, founder of Smol AI.

Hello, we're back in the remote studio with Jason Liu from Instructor. Welcome, Jason.

Hey there, thanks for having me.

Jason, you are extremely famous, so I don't know what I'm going to do introducing you, but you're one of the Waterloo clan. There's this small cadre of you that is completely dominating machine learning. Actually, can you list the Waterloo alums that you know are just dominating and crushing it right now?

So, John from Rysana is doing his inversion models, right? I know Clive Chan from Waterloo. When I started the data science club, he was one of the guys who were joining in and just hanging out in the room, and now he was at Tesla working with Karpathy, now he's at OpenAI.

He's in my climbing club.

Oh, hell yeah. How is he? I haven't seen him in like six years now.

To get in the social scene in San Francisco, you have to climb. So yeah, I'm sure, both in career and in rocks.

Yeah, I mean, a lot of good problem solving there. But oh man, I feel like now they put me on the spot, I don't know.

It's okay, there was a riff. But anyway, you started the data science club at Waterloo, we can talk about that, but then also spent five years at Stitch Fix as an MLE. You pioneered the use of OpenAI's LLMs to increase stylist efficiency, so you must have been a very, very early user. This was pretty early on.

Yeah, I mean, this was GPT-3. We actually were using transformers at Stitch Fix before the GPT-3 model, so we were just using transformers for recommendation systems. At that time I was very skeptical of transformers. I was like, why do we need all this infrastructure? We can just use matrix factorization. When GPT-2
came out, I fine-tuned my own GPT-2 to write rap lyrics, and I was like, okay, this is cute, I've got to go back to my real job. Who cares if I can write a rap lyric? When GPT-3 Instruct came out, again I was very much like, why are we using a POST request to review every comment a person leaves? We can just use classical models. So I was very against language models for the longest time. And then when ChatGPT came out, I basically just wrote a long apology letter to everyone at the company: hey guys, I was very dismissive of some of this technology, I didn't think it would scale well, and I am wrong, this is incredible. And I immediately transitioned from computer vision and recommendation systems to LLMs. But funny enough, now that we have RAG, we're kind of going back to recommendation systems.

Yeah, speaking of that, I think Alessio is going to bring up...

Yeah, I was going to say, we had Bryan Bischof from Hex on the podcast. Did you overlap at Stitch Fix?

Yeah, he was one of my main users of the recommendation framework that I had built out at Stitch Fix.

Yeah, we talked a lot about RecSys, so that makes sense.

So now I have adopted that line that RAG is RecSys, and if you're trying to reinvent new concepts, you should study RecSys first, because you're going to independently reinvent a lot of concepts. So your system was called Flight. It's a recommendation framework with over 80% adoption, servicing 350 million requests every day. Wasn't there something existing at Stitch Fix? Why did you have to write one from scratch?

No, so I think because at Stitch Fix a lot of the machine learning engineers and data scientists are writing production code, every team's systems were very bespoke. It's like, this team only needs to do real-time recommendations with small data, so they just have a FastAPI app with some pandas code,
this other team has to do a lot more data, so they have some kind of Spark job that does some batch ETL that does a recommendation. And so what happens is each team writes their code differently, and I have to come in and refactor their code. I was like, oh man, I'm refactoring four different code bases four different times. Wouldn't it be better if all the code quality was my fault? All right, let me just write this framework, force everyone else to use it, and now one person can maintain five different systems rather than five teams having their own bespoke systems. So it was really a need of standardizing everything. And once you do that, you can do observability across the entire pipeline and make large, sweeping improvements in this infrastructure. If we notice that something is slow, we can detect it on the operator layer: hey, this team, you guys are doing this operation, it's slowing our latency by like 30%. If you just optimize your Python code here, we can probably make an extra million dollars, so let's jump on a call and figure this out. A lot of it was just doing all this observability work to figure out what the heck is going on and optimize this system, not only from a code perspective, but also by harassing the org and saying, we need to add caching here, we're doing duplicated work here, let's go clean up the systems.

Yeah. One more system that I'm interested in finding out more about is your similarity search system using CLIP, GPT-3 embeddings, and FAISS, where you said over $50 million in annual revenue. So of course they gave all that to you, right?

No, no. I mean, the stock went up and down, but I got a little bit, so I'm pretty happy about that. But that was when we were fine-tuning ResNets to do image classification. A lot of it was: given an image, if we could predict the different attributes we have in the
merchandising, and we can predict the text embeddings of the comments, then we can build an image vector or image embedding that can capture both descriptions of the clothing and sales of the clothing. Then we would use these additional vectors to augment our recommendation system. So the recommendation system really was just around: what are similar items, what are complementary items, what are items that you would wear in a single outfit, and being able to say on a product page, let me show you 15 or 20 more things. And what we found was, hey, when you turn that on, you make a bunch of money.

Yeah. So okay, you didn't actually use GPT-3 embeddings, you fine-tuned your own? Because I'm surprised that GPT-3 worked off the shelf.

Okay, because at this point we would have, you know, 3 million pieces of inventory, over a billion interactions between users and clothes. Any kind of fine-tuning would definitely outperform some off-the-shelf model.

Cool. I'm about to move on from Stitch Fix, but any other fun stories from the Stitch Fix days that you want to cover?

No, I think that's basically it. I mean, the biggest one really was the fact that for just four years I was so bearish on language models and just NLP in general. I was like, none of this really works, why would I spend time focusing on this? I've got to go do the things that make money: recommendations, bounding boxes, image classification. And now I'm prompting an image model, and I'm like, oh man, I was wrong.

Okay, so my specific question would be: I think you have a bit of a drip, and my primary wardrobe is free startup conference t-shirts. Should more technology brothers be using Stitch Fix? What's your fashion advice?

Oh man, I mean, I'm not a user of Stitch Fix, right? I enjoy going out and touching things and
putting things on and trying them on. I think Stitch Fix is a place where you go because you want the work offloaded, whereas I really love the clothing I buy where I have to, like, when I land in Japan, do a 45-minute walk up a giant hill to find this weird denim shop. That's the stuff that really excites me. But I think the bigger thing this captures is the idea that narrative matters a lot to human beings, and I think for the recommendation system that's really hard to capture. It's easy to use AI to sell a $20 shirt, but it's really hard for AI to sell a $500 shirt. But people are buying $500 shirts, you know what I mean? There's definitely something that we can't really capture just yet that we probably will figure out how to in the future.

Right, well, it'll probably output in JSON, which is what we're going to turn to next. So you then went on a sabbatical to South Park Commons in New York, which is unusual because it's usually...

Basically in 2020, really, I was just enjoying working a lot, and so I was building a lot of stuff. This is where we were making the tens of millions of dollars doing stuff. And then I had a hand injury, so I really couldn't code anymore for a year or two. I took half of it as medical leave; for the other half I became more of a tech lead, just making sure the lights were on. Then when I went to New York, I spent some time there and kind of wound down the tech work, did some pottery, did some jiu-jitsu. After GPT came out, I was like, oh, I clearly need to figure out what is going on here, because something feels very magical and I don't understand it. So I spent basically five months just prompting and playing around with stuff, and afterwards it was just my startup friends going, hey Jason, my investors
want us to have an AI strategy, can you help us out? And it just snowballed more and more until I was making this my full-time job.

And you had YouTube University and a journaling app, a bunch of other explorations, but it seems like the most productive or best-known thing that came out of your time there was Instructor.

Yeah.

Well, tell us the origin story.

Yeah, I mean, I think at some point tools like Guardrails and Marvin came out. Those are tools that use XML and Pydantic to get structured data out, but they really were doing things in the prompt, and these were built with the instruct models in mind. And I'd already done that in the past at Stitch Fix. One of the things we did was take a request note and turn that into a JSON object that we would send to our search engine. So if you said, I want skinny jeans that are this size, that would turn into JSON that we would send to our internal search APIs. But it always felt kind of gross. A lot of it is just: you read the JSON, you parse it, you make sure the names are strings and ages are numbers, and you do all this messy stuff. But when function calling came out, it was very much a new way of doing things. Function calling lets you define the schema separate from the data and the instructions. What this meant was you can have a lot more complex schemas and just map them in Pydantic, and you can keep those very separate. And then once you add methods, you can add validators and all that kind of stuff. The one issue I really had with a lot of these libraries, though, was that they were doing a lot of the string formatting themselves, which was fine when it was the instruction-tuned models, where you just have a string. But when you have these new chat models, you have these chat messages, and I
just didn't really feel like not being able to access that as the developer was a good benefit that they would get. So I just said, let me write the most simple SDK around the OpenAI SDK, a simple wrapper on the SDK, just handle the response model a bit, and think of myself more like requests than an actual framework that people can use. So the goal is: hey, this is something that you can use to build your own framework, but let me just do all the boring stuff that nobody really wants to do. People want to build their own frameworks, but people don't want to build the JSON parsing and the retrying and all that other stuff.

Yeah, we had a little bit of this discussion before the show, but that design principle of going for being requests rather than being Django, what inspires you there? Does this come from a lot of prior pain? Are there other open source projects that inspired your philosophy here?

Yeah, I mean, I think it would be requests. It is just the obvious thing you install. If you were going to make HTTP requests in Python, you would obviously import requests. Maybe if you want to do more async work there are, like, future tools, but you don't really even think about installing it. And when you do install it, you don't think of it as, oh, this is a requests app. This is just Python. The bigger question is, a lot of people ask, why isn't requests in the standard library? That's how I want my library to feel. It's like, oh, if you're going to use the LLM SDK, you're obviously going to install Instructor. And then I think the second question would be, how come Instructor doesn't just go into OpenAI, go into Anthropic? If that's the conversation we're having, that's where I feel like I've succeeded. It's so standard, you may as well
just have it in the base libraries.

And the shape of the request has stayed the same, but initially function calling was maybe equal to structured outputs for a lot of people. I think now the models also support JSON mode and some of these things, and, you know, "return JSON or my grandma's going to die," all of that stuff is maybe to the side. How have you seen that evolution? What's the metagame today? Should people just forget about function calling for structured outputs? When is structured output, like JSON mode, the best versus not? We'd love to get any thoughts, given that you do this every day.

Yeah, I would almost say these are different implementations of the real thing we care about, which is the fact that now we have typed responses to language models. And because we have that typed response, my IDE is a little bit happier. I get autocomplete. If I'm using the response wrong, there's a little red squiggly line. Those are the things I care about. In terms of whether or not JSON mode is better, I usually think it's almost worse, unless you want to spend less money on the prompt tokens that the function call represents, primarily because with JSON mode you don't actually specify the schema. So sure, json.loads works, but really I care about a lot more than just the fact that it is JSON. I think function calling gives you a tool to specify the fact that, okay, this is a list of objects that I want, and each object has a name or an age, and I want the age to be above zero, and I want to make sure it's parsed correctly. That's where function calling really shines.

Any thoughts on single versus parallel function calling? When I first started, I did a presentation at our AI in Action Discord channel and obviously showcased Instructor. One of the big things that we had before with single function calling is, when you're trying to extract lists, you have to make these funky properties that
are lists to then actually return all the objects. How do you see the hack being put on the developer's plate versus more of the stuff just getting better in the model? And I know you tweeted recently about Anthropic, for example, where some lists are not lists, or are strings, and there are all of these discrepancies.

I almost would prefer it if it was always a single function. But obviously there are the agent workflows that Instructor doesn't really support that well, but that ought to be done. You could define, I think, maybe 50 or 60 different functions in a single API call, and if it was like get the weather, or turn the lights on, or do something else, it makes a lot of sense to have these parallel function calls. But in terms of an extraction workflow, I definitely think it's probably more helpful to have everything be a single schema, just because you can specify relationships between these entities that you can't in parallel function calling, and you can have a single chain of thought before you generate a list of results. There are small API differences too. For parallel function calling, if you only do one... again, I really care about how the SDK looks. So it's, okay, do I always return a list of functions, or do you just want to have the actual object back out, and you want to have autocomplete over that object?

What's the cap for how many function definitions you can put in where it still works well? Do you have any sense of that?

I mean, for the most part I haven't really had a need to do anything that's more than six or seven different functions. I think in the documentation they support way more, but I don't even know if there are any good evals that have over, like, two dozen function calls. I think if you're running into issues where you have 20 or 50
or 60 function calls, I think you're much better off having those specifications stored in a vector database and then having them be retrieved. So if there are 30 tools, you should basically be ranking them and then using the top-k to do selection a little bit better, rather than just shoving 60 functions into a simple API.

Yeah, well, I think this is relevant now because previously context limits prevented you from having more than, you know, a dozen tools anyway. And now that we have million-token context windows, Claude recently, with their new function calling release, said they can handle over 250 tools, which is insane to me. That's a lot. You're saying you don't think there are many people doing that. I think anyone with an agent-like platform where you have a bunch of connectors would run into that problem. Probably you're right that they should use a vector database and kind of RAG their tools. I know Zapier has a few thousand, like 8,000 or 9,000 connectors, that obviously don't fit anywhere. So yeah, I think that would be it, unless you need some kind of intelligence that chains things together, which is I think what Alessio is coming back to, right? There's this trend about parallel function calling. I don't know what I think about that. Anthropic's version was, I think, they use multiple tools in sequence, but they're not in parallel. I haven't explored this at all, I'm just throwing this open to you as to what you think about it. Do we assume that all function calls could happen in any order?

I think we either can assume that, or we can assume that things need to happen in some kind of sequence as a DAG. But if it's a DAG, really that's just one JSON object that is the entire DAG, rather than going, okay, the order of the functions that
return doesn't matter. That's definitely just not true in practice. If I have a thing that's like: turn the lights on, unplug the power, and then turn the toaster on or something, the order does matter. And it's unclear how well you can describe the importance of that reasoning to a language model yet. I mean, I'm sure you can do it with good enough prompting, but I just haven't had any use cases where the function sequence really matters.

Yeah. To me the most interesting thing is that the models are usually better at picking than your ranking is. I'm incubating a company around system integration, and for example, with one system there are like 780 endpoints. If you actually try to do vector similarity, it's not that good, because the people that wrote the specs didn't have in mind making them semantically apart. They're kind of like, oh, create this, create this, create this. Versus when you give it to a model, like in Opus, and you put them all in, it's quite good at picking which ones you should actually run. I'm curious to see if the model providers actually care about some of those workflows, or if the agent companies are actually going to build very good rankers to fill that gap.

Yeah, my money is on the rankers, because you can do those so easily. You could say, well, given the embeddings of my search query and the embeddings of the description, I can just train XGBoost and make sure that I have very high MRR, which is mean reciprocal rank. The only objective is to make sure that the tools you use are in the top and filtered. That feels super straightforward, and you don't have to actually figure out how to fine-tune a language model to do tool selection anymore. I definitely think that's the case, because for the most part I imagine you either have less than three tools or more than a thousand. I don't know what kind
of company says, oh thank God, we only have like 185 tools and this works perfectly.

That's right. And before we move on from this, it was interesting to me: you retweeted this thing about Anthropic function calling. It was Joshua Brown retweeting some benchmark that's like, oh my God, Anthropic function calling is so good. And then you retweeted it, and then you tweeted later, and it's like, it's actually not that good. What's your flow for how you actually test these things? Because obviously the benchmarks are lying, right? The benchmarks say it's good and you said it's bad, and I trust you more than the benchmark. How do you think about that, and then how do you evolve it over time?

Yeah, it's mostly just client data. I actually have been busy enough with client work that I haven't been able to reproduce public benchmarks, so I can't even share some of the results on Anthropic. But I would just say, in production we have some pretty interesting schemas where it's, like, iteratively building lists, where we're doing updates of lists, in-place updates, so upserts and inserts. And in those situations we're like, oh yeah, we have a bunch of different parsing errors. Numbers are being returned as strings. We're expecting lists of objects, but we're getting strings that are the strings of JSON, so we have to call JSON parse on individual elements. Overall I'm super happy with the Anthropic models compared to the OpenAI models. Sonnet is very cost-effective. Haiku in function calling is actually better. But I think they just have to file down the edges a little bit, where our tests pass, but then when we actually deploy to production, we get, you know, half a percent of traffic having issues, where if you ask for JSON it'll still try to talk to you, or if you use function calling, we'll have
a parse error. So I think these are definitely going to be things that are fixed in the upcoming weeks. But in terms of the reasoning capabilities, man, it's hard to beat a 70% cost reduction, especially when you're building consumer applications. If you're building something for consultants or private equity, you're charging $400, it doesn't really matter if it's a dollar or two dollars. But for consumer apps, it makes products viable. If you can go from GPT-4 to Sonnet, you might actually be able to price it better.

Yeah, I had this chart about the Elo versus the cost of all the models, and you could put trend graphs on each of those things, like higher Elo equals higher cost, except for Haiku. Haiku kind of just broke the lines, or the iso-Elos, if you want to call it that. Cool. Before we go too far into your opinions on the overall ecosystem, I want to make sure that we map out the surface area of Instructor. I would say that most people would be familiar with Instructor from your talks and your tweets and all that. You had the number one talk from the AI Engineer Summit.

Two Lius: Jason Liu and Jerry Liu.

Yeah, yeah, your name has to start with J and end with Liu to do well. But until I actually went through your cookbook, I didn't realize the surface area. How would you categorize the use cases? You have LLM self-critique, you have knowledge graphs in here, you have PII data sanitization. How do you explain to people what the surface area of Instructor is?

Yeah, so this is the part that feels crazy, because really the difference is LLMs give you strings and Instructor gives you data structures. And once you get data structures, again, you can do every LeetCode problem you ever thought of. And so I think
there's a couple of really common applications. The first one, obviously, is extracting structured data. This is just: okay, I want to put in an image of a receipt, and I want to get back out a list of checkout items with a price and a fee and a coupon code or whatever. That's one application. Another application really is around extracting graphs. One of the things we found out about these language models is that not only can you define nodes, they're really good at figuring out what are nodes and what are edges. So we have a bunch of examples where not only do I extract that this happens after that, but also, okay, these two are dependencies of another task. And you can extract complex entities that have relationships. Given a story, for example, you could extract relationships of families across different characters. This can all be done by defining a graph. And then the last really big application is just around query understanding. The idea is that any API call has some schema, and if you can define that schema ahead of time, you can use a language model to resolve a request into a much more complex request, one that an embedding could not do. For example, I have a really popular post called "RAG is more than embeddings." Effectively, if I have a question like, what was the latest thing that happened this week, that embeds to nothing. But really, that query should just be: select all data where the datetime is between today and today minus seven days. What if I said, how did my writing change between this month and last month? Again, embeddings would do nothing. But really, if you could do a group by over the month and a summarize, then you could again do something much more interesting. So this really just calls out the fact that embeddings are kind of the lowest-hanging fruit, and using something like Instructor can really help
produce that data structure, and then you can just use your computer science to reason about this data structure. Maybe you say, okay, I'm going to produce a graph where I want to group by each month and then summarize them jointly. You can do that if you know how to define this data structure.

In that part, you kind of run up against the LangChains of the world. I mean, they still do have the self-querying, I think they used to call it, when we had Harrison on in our episode. How do you see yourself interacting with the other LLM frameworks in the ecosystem?

Yeah, I mean, if they use Instructor, I think that's totally cool. Again, it's just Python. It's like asking, how does Django interact with requests? Well, you just might make a requests.get in a Django app. But no one would say, oh, I went off of Django because I'm using requests now. Ideally, that should be the wrong comparison. In terms of, especially, the agent workflows, I think the real goal for me is to go down the LLM compiler route, which is, instead of doing a ReAct-type reasoning loop, my belief is that we should be using workflows. If we do this, then we always have a request and a complete workflow, and we can fine-tune a model that has a better workflow. Whereas it's hard to think about how you fine-tune a better ReAct loop. Do you want to always train it to have less looping? In which case you want it to get the right answer the first time, in which case it was a workflow to begin with.

Can you define workflow? Because I used to work at a workflow company, but I'm not sure this is well understood by everybody.

I'm thinking workflow in terms of the Prefect, Zapier workflow. I want to build a DAG, I want you to tell me what the nodes and edges are, and then maybe
the edges are also put in with AI. But the idea is that I want to be able to present you the entire plan and then ask you to fix things as I execute it, rather than going, hey, I couldn't parse the JSON, so I'm going to try again, I couldn't parse the JSON, I'm going to try again, and next thing you know you've spent like $2 on OpenAI credits. Whereas with the plan, you can just see: oh, the edge between node X and Y does not run, let me just iteratively try to fix that component, and once it's fixed, go on to the next component. And obviously you can get into a world where, if you have enough examples of the nodes X and Y, maybe you can use a vector database to find good few-shot examples. You can do a lot if you break down the problem into that workflow and executing that workflow, rather than looping and hoping the reasoning is good enough to generate the correct output.

Yeah, I would say I've been hammering on Devin a lot. I got access a couple of weeks ago, and obviously for simple tasks it does well. For the complicated, more than 10 or 20 hour tasks, I can see it...

That's a crazy comparison. We used to talk about three or four loops. Wait, only once it gets to hour-long tasks it's hard?

Yeah, less than an hour, there's nothing.

That's crazy.

I mean, I don't know. Okay, maybe my goalposts have shifted.

That's incredible. I'm like sub-one-minute executions. The fact that you're talking about 10 hours is incredible.

I think it's a spectrum. I think I'm going to say this every single time I bring up Devin: let's not reward them for taking longer to do things. You know what I mean? That's a metric that is easily abusable.

Sure. But all I'm saying is, if you can monotonically increase the success probability over an hour, that's winning to me. Obviously if you run an hour
and you've made no progress like I think when we were in like AutoGPT land there was that one example where it's like I wanted to I wanted it to like buy me a bicycle and overnight I spent $7 on credits and I never found the bicycle yeah yeah right I wonder if I wonder if he'll be able to purchase a bicycle um because it actually can do things in the real world um it just needs to suspend to you for a stuff um but uh the point I was trying to make was that I can see it turning like when it when it gets on I think one of the agents um loopholes or one of the things that that is a real barrier for agents is LLMs really like to get stuck into a lane and and you know what you're talking about what what what I've seen Devin do is it gets stuck in a lane and it would just kind of change plans based on the performance of the the plan itself yeah I it's kind of cool yeah I I feel like we've gone too much in the looping route and I think a lot of more plans and like DAGs and data structures are probably going to come back to help fill in some holes yeah what's like the interface to that you know do you see it's like an existing like Stately like state machine kind of thing that like connects to the LLMs or the traditional like DAG player so like do you think we need something new for like AI DAGs yeah I mean I think that the hard part is going to be describing visually the fact that this DAG can also like change over time when it should still be allowed to be fuzzy right um I think in like math we have like plate diagrams and like Markov chain diagrams and like you know recurrent states and all that like that some of that might come into this like workflow world but to be honest I'm not too sure I think right now the first steps are just how do we take this DAG idea and break it down to modular components that we can like prompt better have few-shot examples for and ultimately like fine-tune against um but in terms of even the UI it's hard to say what will uh 
likely win I think you know people like Prefect and Zapier have a pretty good shot at doing a good job yeah uh so you seem to use Prefect a lot I actually worked at a Prefect competitor at Temporal uh and I'm also very familiar with Dagster um uh what else would you call out as like particularly interesting in the AI engineering stack man I almost use nothing I just use Cursor and like pytest okay I think that's basically it you know a lot of the observability companies the more observability companies I've tried yeah the more I just use Postgres really okay Postgres for observability but the issue really is the fact that these observability companies aren't actually doing observability for the system it's just doing the LLM thing like I still end up using like Datadog right or like you know Sentry to do like latency and so I just have those systems handle it and then the like prompt in prompt out latency token costs I just put that in like a Postgres table now so you don't need like 20 funded startups uh building LLMOps yeah but I'm also like a like an old-timer guy you know what I mean like I think like because of my background like yeah like the Python stuff I'll write myself but you know I will also just use Vercel happily yeah yeah right because I'm just not familiar with that world of uh you know tooling whereas like I think you know I spent like three good years building observability tools for recommendation systems and I was like oh compared to that like Instructor is just one call I just have to put time start time end and then count the prompt tokens right cuz I'm not doing a very complex looping behavior I'm doing mostly um workflows and extraction yeah I mean while we're on this topic uh we'll just kind of get this out of the way like uh you you famously have decided to not be a venture-backed company you want to do the consulting route like the obvious the obvious route for you know something as successful as Instructor is like oh here's hosted Instructor 
with like all tooling um and you just you just said you had a whole bunch of experience building observability tooling like you have the perfect background to do this and you're not yeah isn't that sick I think that's sick I know I mean I know why because you want to go freedive um but yeah um yeah because I think there's two things right look one it's like if I tell myself I want to build requests requests is not a venture-backed startup right I mean one could argue like whether or not like Postman is but I think for the most part it's like having worked so much I'm kind of like I am more interested in looking at how systems are being applied and just having access to the most interesting data and I think I can do that more through a consulting business where I can come in and go oh you want to build perfect memory you want to build an agent you want to build like automations over construction or like insurance and supply chain or like you want to handle like writing like private equity like mergers and acquisitions reports based off of user interviews like those things are super fun whereas like maintaining the library I think is mostly just kind of like a utility I try to keep up especially because if I'm not venture backed I have no reason to sort of go down the route of like trying to get a thousand integrations like in my mind I just go like okay okay 98% of the people use OpenAI I'll support that and if someone contributes another like platform that's great I'll merge it in but um yeah I mean you only added Anthropic support like this year uh yeah yeah I think a lot of it was just like you couldn't even get an API until like this year right that's true that's true so okay if I added it like last year I was kind of trying to like double the code base to service you know half a percent of all downloads do you think the market share will shift a lot now that Anthropic has like a you know very very competitive offering I think it's still hard 
to get uh API access I don't know if it's it's fully GA now if it's GA if you if you can get uh commercial access really easily I don't I I got commercial after like two weeks to reach yeah there's there's a call here and then anytime you run into rate limits just like ping one of the Anthropic um staff members we can like cut that part out so I don't need to like spread false news but um it's a common question surely just from the price perspective it's going to make a lot of sense like like if you are a business you should totally consider like Sonnet right like the cost savings is just going to justify it if you actually are doing things at volume and yeah I think the SDK is like pretty good uh but to back to the Instructor thing I just don't think it's a billion dollar company and I think if I raise money the first question is going to be like how are you making a billion dollar company and I would just go like man like if I make a million dollars as a consultant I'm super happy I'm like more than ecstatic I can have like a small staff of like three people like it's fun and I think a lot of my happiest founder friends are those who like raised the tiny seed round became profitable they're making like 60 70 like MRR uh 70,000 MRR and they're like we don't even need to raise the seed round like let's just keep it like between me and my co-founder we'll go traveling and it'll be a great time I think it's a lot of fun I I repeat that as a seed investor in the company I I think that's like one of the things that people get wrong sometimes and I see this a lot um they have an insight into like some new tech like say LLMs say AI and they build some open source stuff and it's like I should just raise money and do this and I tell people a lot it's like look you you can make a lot more money doing something else than doing a startup like most people that do a company could make a lot more money just working somewhere else than doing the company itself do you have any advice for folks that are 
maybe in a similar situation they're trying to decide oh should I stay in my like high paid Fang job and just tweet this on the side and and do this on GitHub should I go be a consultant like being a consultant seems like a lot of work it's like you got to talk to all these people you know there there's a lot there's a lot to unpack I think the open source thing is just like well I'm just doing it for like purely for fun and I'm doing it because I think I'm right but part of being right is the fact that it's not a venture back startup like like I think I'm right because this is this is all you need right like you know um so I think a part of is just like part of the philosophy is the fact that this all you need is a very sharp blade to sort of do your work and you don't actually need to build like a big Enterprise so that's that's one thing I think the other thing too that I've kind of been thinking around just because I have a lot of friends that Google that want to leave right now it's like man like what we lack is not money or like money or like skill like what we lack is courage like you just have to do this the hard thing and you have to do it scared anyways right in terms of like whether or not you want to do a Founder I think that's just a matter of like optionality but I definitely recognize that the like expected value of being a Founder is still quite low it is right like I know I know as many founder breakups and as I know friends who ra a SE raised a seed round this year right like that is like the reality and like you know even even in from that perspective it's been tough where it's like oh man like a lot of incubators want you to have co-founders now you spend half the time like fundraising and then trying to like meet co-founders and find co-founders rather than building the thing I was like man like this is a lot of stuff a lot of time spent out doing uh things I'm not really good at I I think I do think there's a rising Trend in Solo founding um 
you know I I am a solo um I think that something like 30% of like I think I forget what the exact stat is something like 30% of starters that make it to like series b or something actually are Solo founder um so I think I feel like this must have co-founder idea mostly comes from YC and most everyone else copies it and then yeah you like plenty your companies break up over co-funder breakups yeah and I bet it would be like I wonder how much of it is the people who don't have that much like and I hope this is not a diss to anybody but it's like you sort of you go through the incubator route because you don't have like the social Equity you you would need to just sort of like send an email to SEO and be like hey I'm I'm I'm going on this ride do you want do you want a ticket on the rocket ship right like that's very hard to sell like if I was to raise money like that's kind of like my message if I was to raise money is like you've seen my Twitter my life is sick I decided to make it much worse by being a Founder because because this is something I have to do so do you want to come along otherwise I want to fund it myself like if I can't say that like I don't need the money cuz like I can I can like handle payroll and like hire an intern and get an assistant like that's all fine but like what I don't want to do it's like I really don't want to go back to meta I I I want to like get two years to like try to find a problem we're solving that feels like a bad time yeah Jason it's like I wear a YSL jacket on stage at AI engineer Summit I don't need your accelerator money I got enough and boots you don't forget the boots that's true really good boots really good boots um but I I think that is a part of it right I think it is just like optionality and also just like I'm a lot older now I think 22-year-old Jason would have been probably too scared and now I'm like too wise but I think it's a matter of like oh if you raise money you have to have a plan of spending it and I'm 
just not that creative with spending that that much money yeah I mean to be clear you just celebrated your 30th birthday happy birthday yeah to Mex next weekend uh you know a lot older is relative to some some of the folks listening seeing on the on the career tips um I think swix had a great post about are you too old to get into AI um I saw one of your tweets in January 23 you applied to like figma notion coher anthropic and all of them rejected you because you didn't have enough llm experience uh I think at that time it would be easy for a lot of people to say oh I kind of missed the boat you know I'm too late not gonna make it you know um any advice for people the feel like that you know yeah I mean like the biggest learning here is actually from a lot of folks in Jiu-Jitsu they're like oh man like am I is it too late to start Jiu-Jitsu like oh I'll go to the I'll join Jiu-Jitsu once I get in more shape right it's like there's a lot of like excuses and then you say oh like why should I start now I'll be like 45 by the time I'm any good like well you'll be 45 anyways like time is passing like if you don't start now you start tomorrow you're just like one more day behind and if you're like if you're worried about being behind like today is like the soonest you can start right and so you got to recognize that like maybe you just don't want it and that's fine too like if you wanted it you would have started like you know I think a lot of these people again probably think things think of things on a too too short time Horizon but again you know you're you're going to be old on anyway so you may as well just start now you know one more thing on I guess the um uh career advice sort of blogging um you always go viral for this post that you wrote on advice to young people and the lies you tell yourself oh yeah yeah yeah you said you were writing it for your sister why like why is that yeah yeah yeah she was like bummed out about like you know going to college and like 
stressing about jobs and I was like what would I really want to hear okay and I just kind of like text the whole thing it's it's crazy it's got like 50,000 views like I'm I'm mind yeah I mean your your average tweet has [Music] more but that thing is like you know a 30 minute read now yeah yeah um so there's lots of stuff here which I agree with you know I'm I'm also of occasionally indulge in the sort of Life reflection phase um there's the the how to be lucky there's the how to how to have higher agency um I feel like the the agency thing is always making a is always a trend in SF or just in Tech circles um how do you define having high agency yeah I mean I'm almost I'm almost like past the high agency phase now now my biggest concern is like okay the agency is just like the norm of the vector what also matters is the direction right it's like how pure is the shot yeah I mean I think agency is just a matter of like having courage and doing the thing that's scary right like you know if if you want to go rock climbing it's like do you decide you want to go rock climbing and then you show up to the gym you rent some shoes and you just fall 40 times or do you go like oh like I'm actually more intelligent let me go research the kind of shoes that I want okay like there's there's flatter shoes and more incline shoes like which one should I get okay let me go order the shoes on Amazon I'll come back in three days like oh it's a little bit too tight maybe it's too aggressive I'm only a beginner let me go CH no I think the higher agent person just like goes and like falls down 20 times right yeah um I think the higher agency person is more focused on like process out uh metrics versus um outcome metrics right like from Pottery like one thing I learned was if you want to be good at Pottery you shouldn't count like the number of cups or bowls you make you should just weigh the amount of clay you use right like the successful person says oh I went through a th000 pounds of 
clay 100 pounds of clay right the less agency like oh I've made six cups and then after I made six cups there's not really what do you what do you do next no just pounds of clay pounds of clay same with the work here right oh you just got to write the tweets like make the commits contribute open source like write the documentation there's no real outcome it's just a process and if you love that process you just get really good at the thing you're doing yeah so just to push back on this because I obviously I I I mostly agree um how would you design performance review Systems because you're you're effectively saying we can count Lin of code for developers right like did you put up no I don't think that would be the actual like I think if you make that an outcome like I can just expand a for Loop right I think okay so for for performance review this is interesting because I've mostly thought of it from the perspective of Science and not engineering like I've been running a lot of engineering standups uh primarily because there's not really that many machine learning folks like the process outcome is like experiments and ideas right like if you think about outcomes what you might want to think about an outcome is oh I want to improve the revenue or whatnot but that's really hard but if you're someone who is going out like okay like this week I want to come up with like three or four experiments I might move the needle okay nothing nothing worked to them they might think oh nothing worked like I suck but to me it's like wow you've closed off all these other possible avenues for like research like you're going to get to the place that you're going to figure out that direction really soon right like there's no way you try 30 different things and none of them work usually like you know 10 of them work five of them work really well two of them work really really well and one thing was like you know nail nail on the head so agency lets you sort of capture the volume of 
experiments and like experience lets you figure out like oh that other half it's not worth doing right like I think experience is going to half these prompting papers don't make any sense just use Chain of Thought and just you know use a for Loop um but that's kind of that's basically it right it's like usually performance for me is around like how many experiments are you running like how how often times are you trying yeah when do you give up on an experiment because as Stitch fix you kind of give up on language models I guess in a way and as as a tool to use and then maybe the got better they got better before you know you were kind of like you were right at the time and then the tool improved I think there are similar uh paths in my engineering career where I try one approach and at the time it doesn't work and then the thing changes but then I kind of soured on that approach and I don't go back to it soon enough um yeah how do you think about that that Loop so usually when when I like when I'm coaching folks and I they say oh these things don't work I'm not going to pursue them in the future like one of the big things like hey the negative result is a result and this is something worth documenting like this isn't Academia like if it's negative you don't just like not publish right but then like what do you actually write down like what you should write down is like here are the conditions this is the inputs and the outputs we tried the experiment on and then one thing that's really valuable is basically writing down under what conditions would I revisit these experiments right it's like these things don't work because of what we had at the time if someone is reading this 2 years from now under what conditions will we try again that's really hard but again that's like another that's like another skill you you kind of learn right it's like you do go back and you do experiments you figure out why it works now I think a lot of it here is just like scaling worked 
yeah right like you could actually like rap lyrics you know like that was because I did not have high enough quality data if we phase shift and say okay you don't even need training data oh great then it might just work mhm yeah different domain do do you have any anything in your list that is like doesn't work now but I want to try it again later something that people should maybe keep in mind you know people always like AGI when you know when are you going to know the AGI is here maybe it's less than that but uh any stuff that you tried recently that didn't work that you think uh will get there I mean I think like the personal assistants and the writing I've shown to myself is just not good enough yet so I hired a writer and I hired a personal assistant so now I'm going to basically like work with these people until I figure out like what I can actually like out of and what are like the reproducible steps but like I think the the experiment for me is like I'm going to go like pay a person like thousands of dollars a month to help me improve my life and then let me sort of get them to help me figure out like what are the components and how do I actually modularize something to get it to work because it's not just like OAuth Gmail calendar and and like Notion it's a little bit more complicated than that but we just don't know what that is yet or those are two sort of systems that I wish GPT-4 or Opus was actually good enough to just write me an essay but most of the essays are still pretty mhm yeah I would say um you know on the personal assistant side Lindy is probably the one I've seen the most um Flo was a speaker at the summit I don't know if you've checked it out or any other sort of agents assistant startup not recently I haven't tried Lindy it was they were like they were not GA last time I was considering it yeah yeah they're not GA a lot of it now it's like oh like really what I want you to do is like take a look at all of my meetings and like write 
like a really good weekly summary email for my clients remind them that I'm like you know thinking of them and like working for them right or it's like I want you to notice that like my Mondays were way like way too packed and like block out more time and also like email the people to do the reschedule and then try to opt in to move them around and then I want you to say oh Jason should have like a 15-minute prep break after four back-to-back those are things that like now I know I can prompt them in but can it do it well before I didn't even know that's what I wanted to prompt for it was like defragging a calendar and adding breaks so I can like eat lunch right yeah that's the AGI test yeah exactly compassion right I think one thing that yeah we didn't touch on it before but I think was interesting uh you had this tweet a while ago about prompts should be code and then there were a lot of companies trying to build prompt engineering tooling kind of trying to turn the prompt into a more structured thing mm what's your thought today like you know now you want to turn the thing into DAGs like should prompts still be code like any any updated ideas no it's the same thing right I think like you know with Instructor it is very much like the output model is defined as a code object that code object is sent to the LLM and in return you get a data structure so the outputs of these models I think should also be code like code objects and the inputs somewhat should be code objects but I think the one thing that Instructor tries to do is separate instruction data and the and the types of the output um and beyond that I really just think that you know most of it should be still like managed pretty closely to the developer like so much of it is changing that if you give control of these systems away too early you end up ultimately wanting them back like many companies I know that reach out are ones where like oh we're going off of the frameworks because now that we know 
what the business outcomes we're trying to optimize for these frameworks don't work yeah because like we do RAG but we want to do RAG to like sell you supplements or to have you like schedule the fitness appointment and like the the prompts are kind of too baked into the systems to really pull them back out and like start doing upselling or something it's really funny but a lot of it ends up being like once you understand the business outcomes uh you care way more about the prompt right actually this is fun so we were trying in our prep for this call we were trying to say like what can you what can you as an independent person say that maybe me and Alessio cannot say or you know someone who works in a company say what do you think is the market share of the frameworks the LangChain and the LlamaIndex the everything else oh massive because not everyone wants to care about the code yeah right it's like I think that's a different question to like what is the business model and are they going to be like massively profitable businesses right like making hundreds of millions of dollars that feels like so straightforward right cuz not everyone is a prompt engineer like there's there's so much productivity to be captured in like like back office automations right it's not because they care about the prompts that they care about managing these things um yeah but those would be sort of low-code experiences you know yeah I I think it I think the the bigger challenge is like okay $100 million probably pretty pretty easy it's just time and effort and they have both like the manpower and the and the money to sort of solve those problems um I think it's just like again if you you go to the VC route then it's like you're talking about billions and that's really the goal that stuff uh for me it's like pretty unclear okay but again that is to say that like I sort of I'm building things for developers who want to use Instructor to build their own tooling but in terms of the amount of 
developers there are in the world versus like downstream consumers of these things or even just like you know like think of how many companies will use like the Adobes and the IBMs right because they want something that's fully managed and they want something that they know will work and if the incremental 10% requires you to hire another team of 20 people you might not want to do it um and I think that kind of organization is really good for uh those are bigger companies and yeah I just want to capture your thoughts on one more thing which is you said you wanted most of the prompts to stay close to the developer um and Hamel Husain wrote this like post which I really love called like FU show me the prompt I think he cites you in one of those uh parts of the blog post and I think DSPy is kind of like the complete antithesis of that uh which is I think is interesting because I I I also hold the strong view that AI is a better prompt engineer than you are um and I don't know how to square that um I think wondering if you have thoughts I think something like DSPy can work because there are like very short-term uh metrics to measure success right it is like did you find the PII or like did you write the multi-hop question the correct way but in these like workflows that I've been managing a lot of it is like are we minimizing like minimizing churn and maximizing retention like that's not it's not really like a like Optuna like training loop right like those things are much harder to capture so we don't actually have those metrics for that right and obviously you can figure out like okay is the summary good but like how do you measure the quality of the summary right it's like that that feedback loop ends up being a lot longer and then again when something changes it's really hard to make sure that it works across these like newer models or again like changes to work for uh the current prompts like when we migrate from like Anthropic to OpenAI like there's just a ton 
of changes that are like infrastructure related not necessarily around the prompt itself any other engineering startups that you think should not exist before we we wrap up no I mean oh my gosh I mean a lot of it again it's just like every time an investor is like what is how does this make a billion dollars like it doesn't I'm going to go back to just like tweeting and holding my breath underwater yeah like I don't really pay attention too much to most of these like most of the stuff I'm doing is is around like the consumer layer right like it's not the consumer layer but like the consumer of like LLM calls I think people just want to move really fast and they're willing to pick these vendors but it's like I don't really know if anything has really like blown me out the water like I only trust myself but that's also a function of like just being an old man like I think you know many companies are definitely very happy with using most of these tools anyways um but I definitely think I I occupy like a very small space in the engineering ecosystem yeah I would say um one of the challenges here you know you talk about dealing in the consumer of LLMs space um I think that's where AI engineering differs from ML engineering and I think a constant disconnect or cognitive dissonance in in this field in in the AI engineers that have sprung up um is that they're not as good as the ML engineers they're not as qualified um I think that you know you are someone who has credibility in the MLE space and you are also uh you know a very very authoritative figure in the AI engineering space and I think so you know I think you've built the the de facto leading library I think yours I think Instructor should be part of the standard lib even though I try to not use it like I also try to figure out um that I I basically also end up rebuilding Instructor right like um that that's that's a lot of the the back and forth that we had over the past two days um but like yeah like uh I I think that's a fundamental 
thing that we're trying to figure out like there's there's a very small supply of MLEs they they're not not like not everyone's going to have that experience that you had um but the the global demand for AI is going to far outstrip the existing MLEs so what do we do do we force everyone to go through this the standard MLE curriculum or do we make a new one I got some takes go I think a lot of these app-layer startups should not be hiring MLEs because they end up churning yeah they want to work at OpenAI they're just like hey guys I joined and you have no data and like all I did this week was like fix some TypeScript build errors and like figure out why we don't have any tests and like what is this framework X and Y like how come like what am I like what are like how do you measure success what are your business outcomes oh no okay let's not focus on that great I'll focus on like these TypeScript build errors and then you're just like what am I doing and then you kind of sort of feel really frustrated and I I I already recognize that because I've made offers to machine learning engineers they've joined and they've left in like two months and and the response is like yeah I think I'm gonna join a research lab so I think it's not even that like I don't even think you should be hiring these MLEs on the other hand what I also see a lot of is the really motivated engineer that's doing more AI engineering is not being allowed to actually like fully pursue the AI engineering so they're the guy who built a demo it got traction now it's working but they're still being pulled back to figure out like why Google Calendar integrations are not working or like how to make sure that like you know the button is loading on the page and so I I'm sort of like in a very interesting position where the companies want to hire an MLE they don't need to hire but they won't let the excited people who've caught the AI engineering bug go do that work more full-time um this is something I'm literally 
wrestling with like this week as I just wrote something about it this is one of the things I'm probably going to be recommending in the future is really thinking about like where is the talent coming from how much of it is internal and do you really need to hire someone who's like writing PyTorch code yeah exactly you most of the time you're not you're going to need someone to write Instructor code and you're just like yeah you're making this like and like I feel goofy all the time just like prompting like oh man like I wish I just had a target data set that I could like train a model against yes and I can just say it's right or wrong yeah so you know uh I guess what Latent Space is what the AI Engineer World's Fair is is that we're trying to create and elevate this this industry of AI engineers where it's legitimate to actually take these motivated software engineers who want to build more in AI and do creative things in AI to actually say you have the blessing like and and this is a legitimate sub-specialty of software engineering yeah yeah I think there's been a mix of that product engineering I think a lot more data science is going to come in versus machine learning engineering because a lot of it now is just quantifying like what does the business actually want as an outcome right the outcome is not RAG app yeah the outcome is like reduced churn or something like that people need to figure out what that actually is and how to measure it yeah yeah all the the data engineering tools still apply uh BI layers semantic layers whatever yeah cool um we we'll have you back again for the World's Fair um we we don't know what we what you're going to talk about uh but I'm sure it's going to be amazing uh you're very the title is written it's just Pydantic is still all you need I I'm worried about having too many all you need titles because that's obviously very um so so yeah you have one of them but I I need to keep a lid on like you know everyone saying their thing is all you need 
um but yeah we'll figure it out hentic is not my thing it's someone else I think that's why it works it's true um cool well it's a real pleasure to have you on uh Al everyone everyone should go follow you on on Twitter and uh check out instructor there's also instructor jsing which um which I'm very happy to see and what else use instructure.com yeah anything else to plug use instructure.com we got a domain name now nice nice awesome cool cool thanks J thanks [Music]

Playlist

Uploads from Latent Space · Latent Space · 27 of 60

1 Ep 18: Petaflops to the People — with George Hotz of tinycorp
2 FlashAttention-2: Making Transformers 800% faster AND exact
3 RWKV: Reinventing RNNs for the Transformer Era
4 Generating your AI Media Empire - with Youssef Rizk of Wondercraft.ai
5 RAG is a hack - with Jerry Liu of LlamaIndex
6 The End of Finetuning — with Jeremy Howard of Fast.ai
7 Why AI Agents Don't Work (yet) - with Kanjun Qiu of Imbue
8 Powering your Copilot for Data - with Artem Keydunov from Cube.dev
9 Beating GPT-4 with Open Source Models - with Michael Royzen of Phind
10 The State of Silicon and the GPU Poors - with Dylan Patel of SemiAnalysis
11 The "Normsky" architecture for AI coding agents — with Beyang Liu + Steve Yegge of SourceGraph
12 The AI-First Graphics Editor - with Suhail Doshi of Playground AI
13 The Accidental AI Canvas - with Steve Ruiz of tldraw
14 The Origin and Future of RLHF: the secret ingredient for ChatGPT - with Nathan Lambert
15 The Four Wars of the AI Stack - Dec 2023 Recap
16 The State of AI in production — with David Hsu of Retool
17 Building an open AI company - with Ce and Vipul of Together AI
18 Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal
19 A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate
20 Open Source AI is AI we can Trust — with Soumith Chintala of Meta AI
21 Making Transformers Sing - with Mikey Shulman of Suno
22 A Comprehensive Overview of Large Language Models - Latent Space Paper Club
23 Why Google failed to make GPT-3 -- with David Luan of Adept
24 Personal AI Meetup - Bee, BasedHardware, LangChain LangFriend, Deepgram EmilyAI
25 Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit
26 Breaking down the OG GPT Paper by Alec Radford
27 High Agency Pydantic over VC Backed Frameworks — with Jason Liu of Instructor
28 This World Does Not Exist — Joscha Bach, Karan Malhotra, Rob Haisfield (WorldSim, WebSim, Liquid AI)
29 LLM Asia Paper Club Survey Round
30 How to train a Million Context LLM — with Mark Huang of Gradient.ai
31 How AI is Eating Finance - with Mike Conover of Brightwave
32 How To Hire AI Engineers (ft. James Brady and Adam Wiggins of Elicit)
33 State of the Art: Training 70B LLMs on 10,000 H100 clusters
34 The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka
35 Training Llama 2, 3 & 4: The Path to Open Source AGI — with Thomas Scialom of Meta AI
36 [LLM Paper Club] Llama 3.1 Paper: The Llama Family of Models
37 Synthetic data + tool use for LLM improvements 🦙
38 RLHF vs SFT to break out of local maxima 📈
39 The Winds of AI Winter (Q2 Four Wars of the AI Stack Recap)
40 Segment Anything 2: Memory + Vision = Object Permanence — with Nikhila Ravi and Joseph Nelson
41 Answer.ai & AI Magic with Jeremy Howard
42 Is finetuning GPT4o worth it?
43 Personal benchmarks vs HumanEval - with Nicholas Carlini of DeepMind
44 Building AGI with OpenAI's Structured Outputs API
45 Q* for model distillation 🍓
46 Finetuning LoRAs on BILLIONS of tokens 🤖
47 Cursor UX team is CRACKED 💻
48 Choosing the BEST OpenAI model 🏆
49 How will OpenAI voice mode change API design?
50 STEALING OpenAI models data 🥷
51 [Paper Club] 🍓 On Reasoning: Q-STaR and Friends!
52 [Paper Club] Writing in the Margins: Chunked Prefill KV Caching for Long Context Retrieval
53 The Ultimate Guide to Prompting - with Sander Schulhoff from LearnPrompting.org
54 llm.c's Origin and the Future of LLM Compilers - Andrej Karpathy at CUDA MODE
55 Prompt Engineer is NOT a job 📝
56 Prompt Mining LLMs for better prompts ⛏️
57 The six pillars of few-shot prompting 🔧
58 Language Agents: From Reasoning to Acting — with Shunyu Yao of OpenAI, Harrison Chase of LangGraph
59 [Paper Club] Who Validates the Validators? Aligning LLM-Judges with Humans (w/ Eugene Yan)
60 Can you separate intelligence and knowledge?

The video teaches the use of High Agency Pydantic for structured outputs and prompt engineering, and provides insights on the benefits and applications of Pydantic in building tooling and creating structured data outputs. It also discusses the importance of retrieval-augmented generation and fine-tuning in language models.

Key Takeaways
  1. Define the schema separately from the data and instructions using Pydantic
  2. Use function calling for extraction workflows
  3. Use a vector database for tool ranking and retrieval
  4. Use Anthropic models for lower cost and better performance
  5. Apply retrieval-augmented generation to improve language model performance
💡 High Agency Pydantic provides a flexible and efficient way to build structured data outputs and implement prompt engineering, and can be used in conjunction with retrieval augmented generation and fine-tuning for improved language model performance.
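
The first two takeaways can be made concrete with a small sketch. To stay dependency-free it uses a standard-library dataclass as a stand-in for a Pydantic model, and it fakes the model's reply rather than calling any API. The names `UserDetail`, `to_json_schema`, and `parse_response` are hypothetical and chosen for illustration; they are not part of Instructor or Pydantic, and the schema dict is a simplified JSON-Schema-style layout, not any provider's exact function-calling format.

```python
import json
from dataclasses import dataclass, fields


# Schema: describes only the shape of the desired output (hypothetical example type).
@dataclass
class UserDetail:
    name: str
    age: int


# Instructions and data live apart from the schema.
INSTRUCTIONS = "Extract the user's details from the text."
DATA = "Jason is 25 years old."

# Mapping from Python types to JSON-Schema-style type names (flat fields only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_json_schema(cls) -> dict:
    """Build a minimal JSON-Schema-style description of a flat dataclass."""
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(cls)},
        "required": [f.name for f in fields(cls)],
    }


def parse_response(cls, raw: str):
    """Parse a JSON string and check each field's type before constructing cls."""
    payload = json.loads(raw)
    for f in fields(cls):
        if not isinstance(payload.get(f.name), f.type):
            raise ValueError(f"field {f.name!r} must be {f.type.__name__}")
    return cls(**payload)


if __name__ == "__main__":
    # The schema is what would be sent alongside INSTRUCTIONS and DATA.
    print(json.dumps(to_json_schema(UserDetail), indent=2))
    # Stand-in for the arguments a model might return from a function call.
    fake_response = '{"name": "Jason", "age": 25}'
    print(parse_response(UserDetail, fake_response))
```

With Pydantic itself, a `BaseModel` subclass would take the place of the dataclass and supply both schema generation and validation, so the two helper functions above would not be needed.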

Chapters (15)

0:00 Introductions
2:50 Early experiments with Generative AI at StitchFix
9:39 Design philosophy behind the Instructor library
13:17 JSON Mode vs Function Calling
14:43 Single vs parallel function calling
16:28 How many functions is too many?
20:40 How to evaluate function calling
24:01 What is Instructor good for?
26:41 The Evolution from Looping to Workflow in AI Engineering
31:58 State of the AI Engineering Stack
33:40 Why Instructor isn't VC backed
37:08 Advice on Pursuing Open Source Projects and Consulting
42:59 The Concept of High Agency and Its Importance
51:06 Prompts as Code and the Structure of AI Inputs and Outputs
53:06 The Emergence of AI Engineering as a Distinct Field