Lessons From A Year Building With LLMs

AI Engineer · Beginner · 🧠 Large Language Models · 1y ago

Skills: LLM Foundations 90%, LLM Engineering 80%, Fine-tuning LLMs 80%, Prompt Craft 70%

Key Takeaways

The video discusses lessons learned from a year of building with Large Language Models (LLMs), covering strategic, operational, and tactical considerations for LLM applications, with a focus on continuous improvement, evaluation, and data literacy. Tools and techniques such as GPT-4, Sonnet 3.5, MLOps, DevOps, and Lean Startup are highlighted.

Full Transcript

[Music]

You're about to experience something of a strange talk, and not just because Bryan and I are strange, but because something kind of strange happened over the last year. A bunch of us were posting things on Twitter, we were writing blog posts complaining about LLMs, and we formed a little group chat, and we were, you know, continuing to complain about LLMs to each other and sharing what we were working on, when we realized we were all about to write the exact same blog post: what we learned in the last year. So we got together and we turned what was initially a couple of short blog posts into a long white paper on O'Reilly, combining our lessons across strategic, operational, and tactical levels of building LLM applications. And the response to that white paper was overwhelmingly positive. We heard from everybody, from people who contribute to Postgres to venture capitalists to tool builders, saying, "We loved what you wrote in that article. I felt that pain too." And we were invited, on the strength of that, to give this keynote address.

And so we faced a kind of funny challenge, which is that part of the appeal of this article was that the six of us all came together to write it. As Scott Condron put it, it was like an Avengers team-up. So we had to figure out a way to deliver one keynote talk from six people. So we pulled the Avengers together for one night only, to deliver some of the most important insights from that 30-page article, to add some of our spicy extra takes that ended up on the cutting room floor, and to respond to the allegations: I'd like to state unequivocally that we are not, in fact, crypto bros who just found out that GPT-4 was the new Web3. We all trained our first neural networks back when you had to write the gradients by hand.

So we split the article up into three pieces, and we split the talk into three pieces. First, you're going to hear from me and Bryan talking about the
strategic considerations for building LLM applications: how do you look to the future, how do you see around corners, how do you make big decisions? Then we're going to hand the clickers and the stage over to Hamel Husain and Jason Liu, who are going to share the operational considerations: how do you put together processes, how do you put together teams, how do you think about workflows around delivering LLM applications? And then they will hand over the clickers and the stage to Shreya Shankar and Eugene Yan, who will talk about the tactical considerations for building LLM applications: what are the specific techniques, tactics, and moves that have stood the test of one year's time for building LLM applications?

All right. So, Bryan, how do you build an LLM application without getting outmaneuvered and wasting everybody's time and money?

Ah, yes, yes. Well, many of you may be thinking that there's really only one way to win in this new, exciting, dynamic, and very scary industry, and that, of course, is to train your own custom model. Pre-training, fine-tuning, a little RLHF here and there. You better start from scratch, buddy. Eh, not quite. The model is actually not your moat. For almost no one in this audience, the model is the moat. You all, as AI engineering devotees, should be building in your zone of genius. You should be leveraging your product expertise, or your existing product, maybe you've got one, and you should be finding your niche and digging into that niche, exploiting it. You should be building what the model providers are not. There's a high likelihood that the model providers have to build a lot of things for all of their customers. Don't waste your calories on building these things. The Sam Altman phrase of steamrolling is appropriate here. And you should be treating the models like any other SaaS product. You should be quickly dropping them when there's a competitor that's clearly better. No offense to GPT-4o, but Sonnet 3.5 is looking pretty sharp.

It's important to keep in mind that a model with high MMLU scores, that's not a product. 87% on Spider SQL, that doesn't automate all data requests, or even 87% of them. You can't sell HumanEval pass at 67; at least my GTM team doesn't know how. An excellent LLM-powered application is an excellent product: it's well designed, it solves a job to be done, and it enhances your user. Why are we so excited about AI? Human enhancement. So what should you build, if not all of these things? Things that generalize to smarter and faster models, things that help you maintain your product's quality bar under uncertainty, and things that help you continuously improve.

Whoa, Bryan, continuous improvement, that's my trigger phrase. The idea of continuous improvement has been brought to the world of LLM applications by this shift in focus that we've all felt since the previous AI Engineer Summit, to focus on evaluation and data. It's nicely synecdochized by this diagram from our co-author Hamel Husain, showing this virtuous cycle of improvement. It has evals and data at the center, but the core reason to create those evals, the core reason to collect that data, is to drive forward this loop of continuous improvement. And despite what your expensive consultants, or many of the LinkedIn influencers posting about LLM apps, might say, this is not actually the first time that engineers have tried to tame a complex system and make it useful and valuable. This same loop of iterative improvement was also at the core of MLOps, the operationalization of machine learning models before LLMs. This figure from our co-author Shreya Shankar's paper had that same loop of iterative improvement, centered also on evaluation and on data collection. MLOps was also not the first time that engineers faced this problem, the problem of complexity, the problem of non-determinism and uncertainty. The DevOps movement that gave MLOps its name also focused on this kind of iterative improvement, and on monitoring information in production to turn into improvements to
products. But, dear reader, DevOps was not the first time that engineers tackled this problem of uncertainty and solved it with iterative improvement. DevOps built on the ideas of the Lean Startup movement from Eric Ries, which was focused not just on building an application, not just on building a machine learning model or an LLM agent, but on building the entire business, and it used this same loop, centered on measurement and data, to drive the improvement and building of a business. This idea itself was not invented in Northern California, despite what some people might say. It has its roots in the Toyota Production System and in the idea of kaizen, or continuous improvement. Genchi genbutsu is one of the core principles from that movement that we can take forward into the development of LLM applications. It means "real things, real places", and at Toyota that meant sending executives out to factory floors, getting their khakis a bit dirty. For LLM applications, the equivalent is looking at your data. That data is the real information about how your LLM application is delivering value to users. There's nothing that is more valuable than that.

Finally, there's lots of people selling tools at this conference, including myself. It's easy to get overly excited about the tools and the construction of this iterative loop of improvement, and to forget where value actually comes from. There's a great, pithy, earthy statement from the Toyota Production System, from Shigeo Shingo, that I really like: value is only created when metal gets bent. So we have to make sure that we don't get lost just building our evals and calculating concept drift, and that we instead continue to get out there and bend metal and create value for our users.

Not going to lie, I might have misunderstood earlier when you said "let's get bent". Okay, so right off the bat we need to spin that data flywheel, Bob. Oh wait, sorry, wrong game show. The point is, we need to get this moving. We need to get this
in front of users and human beings. We need to express the goals for our system, and how do we do that? With evals. Remember, evals are not convenient, weird, bespoke metrics. Evals are objectives; they're what we want our system to do. Any system for capturing this behavior is good enough. I don't have an eval framework to sell you, but what I do have to sell you is this idea that you should be getting out there, you should be getting started. "But wait, Bryan, I'm really nervous. What if this isn't good enough for my customers?" Fear is the mind killer. Put it out there in beta. If it's good enough for these incredible companies like Apple Intelligence, Photoshop, and Hex (that's me), it's good enough for you. You need to collect this data, you need to put something in the wild, you need to start looking at your user interactions, the real user interactions. LLM responses deserve human eyes. You can give it some AI eyes too, but definitely look at it with your human eyes. Binary human feedback is valuable. It's nice to add rich feedback too, that can be interesting, but start with binaries. And finally, user requests will reveal the PMF opportunities that lie below your product substrate. Where is your PMF? Everybody wants to know. It's in your user interactions. What are they asking your chatbot that you haven't yet implemented? That's a really nice direction to skate if that's where the puck's going.

And despite the focus on the user interactions that you can have today, the things that you can ship right now, it's important to also think about the future. The best way to predict the future is to look at the past, find people predicting the present, and copy what they did. In designing many of the components of the personal computing revolution, Alan Kay and others at PARC adopted as a core technique projecting Moore's law out into the future. They built expensive, unmarketable, slow, and buggy systems themselves so they could experience what it was like, and build for that future and create it. We
don't have quite the industrial scaling information that Moore had when he wrote down his predictions, but we do have the beginnings of those same laws. There's been an order-of-magnitude decrease in cost every 12 to 18 months at three distinct levels of capability: the capability of davinci, the original GPT-3 API that excited a lot of us about the idea of building on foundation models; the capabilities of text-davinci-002, the model lineage underlying ChatGPT that brought the rest of the world to excitement about this technology; and the latest and greatest level of capabilities, with GPT-4 and Sonnet. In each case, around 15 months is enough time to drop the cost by an entire order of magnitude. This is faster than Moore's law. And so the appropriate way to plan for the future is to think about what this implies for which applications that are not economical today will be economical at the time that you need to raise your next round.

So in 2023 it cost about $625 an hour to run a video game where all the NPCs were powered by a chatbot. That's pretty expensive. In 1980 it cost about $6 an hour to play Pac-Man, inflation adjusted. That suggests that if we just wait for two orders of magnitude of reduction, or about 30 months from mid-2023, it should be possible to deliver a compelling video game experience with chatbot NPCs at about $6 an hour, and people will probably pay for it. So you can't sell it now, but you could live it, and you can design it, and you can be ready when the time comes. So that's how to think about the future and how to think strategically when building LLM applications. I'd like to call to the stage my co-authors Jason Liu and Hamel Husain to talk about the operational aspects. Let's give them a hand. [Applause]

All right, thank you. All right, so Hamel and I have basically been doing a lot of AI consulting in the past year, right? We've worked with about 20 companies so far, and, you know, we've done everything from pre-seed all the way to public companies, and I'm pretty
bored of giving generic good advice, especially because there's such a range of operators here. And so instead I'm going to invert: my goal today is to tell you how to ruin your business.

First of all, everyone knows that in the gold rush you sell shovels, and so if you want to get gold, you've got to buy shovels too, right? You know, if you want to find more gold, keep buying shovels. Where do I dig? Keep buying shovels. How do I know when to stop digging? The shovel will tell you. And how do I dig one deep hole versus making investments in plenty of shallow holes? Again, the answer is more shovels, clearly, right? And this might be generic, so I'll give you some more specific advice. If your RAG app doesn't work, try a vector database, a different vector database. If the methodology doesn't work, implement a new paper. And maybe if you update the embedding model, you'll finally find product-market fit. Because, truth be told, success does not lie in developing expertise or processes. Try more tools. There's no need to balance between exploring and exploiting the mechanisms that work for you. Change the tools. The processes and the decision-making frameworks don't matter. The right tool will solve everything.

Number two: find a machine learning engineer who can fine-tune, as quickly as possible. A $20,000-per-month OpenAI bill is very expensive, so instead hire someone for a quarter of a million dollars and give them 1% of your company to fight CUDA build errors and figure out server cold starts, right? Because what's the point of growing your company if you're just a wrapper? And if your margins are too low, try fine-tuning. It's much easier than figuring out how to build something worth charging for. I cannot reiterate this enough: it's very important to hire a machine learning engineer as quickly as possible, right? Even if you have no data-generating products. They love fixing Vercel TypeScript build errors. And generally, if you hire a full-stack engineer who's really caught the LLM bug,
they're going to lack real experience. And this is because Python is a dead language, right? Machine learning engineers and research engineers can easily pick up TypeScript, and the ecosystem that exists in Python could be quickly reimplemented in a couple of weekends, right? The people who wrote Python code for the past 10 years doing data analysis, they're going to easily be able to transition their tools. And if anything, it's really easy to teach things like product sense and data literacy to the JavaScript community.

And most important of all, in order to find this kind of magic talent, we need to create a very catch-all job title. Let's use words like ninja and wizard, or data scientist, or prompt engineer, or even the AI engineer. In the past 10 years we've known that this works really well, right? Every time, we know exactly who we want. As long as we cast a very wide net of skills, it doesn't really matter whether or not we know what outcomes we're looking for. Anyways, to dig me out of this hole, I'll have Hamel explain, and, you know, take a deep breath, think out loud, step by step.

Thank you, Jason. [Applause] So that was really good. I mean, let's just step back from the cliff a little bit, and let's kind of linger on the topic of AI engineer. I heard some booing in the audience. And so, I love the term AI engineer. Much props to swyx for popularizing this term; it allows us all to get together and have conversations like this. But I think that there's a misunderstanding of the skills of an AI engineer, what skills you need to be successful, and there's a lot of inflated expectations. As a founder or engineering leader, your talent is the most important lever that you have, and so what I'm going to do is talk about some of the problems, and perhaps some solutions, when it comes to this talent misunderstanding.

So, just a review: what is an AI engineer? This is a diagram that everyone has probably seen. There's a spectrum of skills in the AI space, and there's this
API dividing line in the middle, and to the right of the API dividing line we have the AI engineer. AI engineer skills are focused on things like chains, agents, tooling, and infra, and conspicuously missing from the AI engineer are tools like evals and data. I think a lot of people have taken this diagram too literally, taken it to heart, and said, "Hey, we don't really need to know about evals," for example. The problem is that you can go from zero to one really fast. In fact, you can go from zero to one faster than ever before with all the great tools out there, just by using vibe checks and implementing the tools that we talked about. However, without evals you can't make progress, which quickly leads to stagnation, because if you can't measure what you're doing, you can't make your system better, and you can't go beyond zero to one.

So what can we do about this evals skill set and data literacy? Jason and I have found that you can actually get really good at writing evals and at data literacy with just four to six weeks of deliberate practice. In fact, it's very effective. And we think that these skills, evals and data, should be brought more into the core of AI engineer. It really helps solve this problem, and it's something that we see over and over again.

The next thing I want to talk about is the AI engineer job title itself. Vague job titles can be problematic. What we see over and over again in our consulting is that this kind of catch-all role has very inflated expectations. Anytime anything goes wrong with the AI, people look towards that role to fix it, and sometimes that role doesn't have all the skills they need to move forward. And we've seen this before with the role of data scientist. Titles and names really matter. What I want to emphasize is that I think AI engineer is very aspirational, and you should keep learning, and it's a good thing to strive towards, but you need to have reasonable expectations. And just to bring it back to data
science, we've seen this before in data science as well. About a decade ago, when this role was coined, it was a unicorn that had all these skills: software engineering skills, statistics, math, domain expertise. We found out as an industry that we had to unroll that into many different roles, such as decision scientist, machine learning engineer, data engineer, and so on and so forth. I think similar things may be happening with the role of AI engineer, and it's good to keep that in mind. What we both see in consulting is that it's helpful to be more specific, to be more deliberate about what skills you need and at what time. And depending on your maturity, it's very helpful to specify not only what the skills are but what kinds of products you'll be working on. So these are some job titles from GitHub Copilot that are very specific about the skills you need at that time.

And really, it's important to hire the right talent at the right time on the maturity curve. When you're first starting out, you only need application development, software engineering and/or AI engineering, to go from zero to one. Then you need platform and data engineering to capture that data, and only after that should you hire a machine learning engineer. Do not hire a machine learning engineer without having any data. But again, you can get a lot more mileage out of your AI engineer with deliberate practice on evals and data. We usually find four to six weeks of practice does the job. So, in recap: one of the biggest failure modes is talent. We think that AI engineer is often over-scoped but underspecified, but we can fix that by learning evals. Next, I want to give it over to Shreya Shankar and Eugene Yan to dive into evals and data literacy. [Applause]

Thanks. Thank you, Jason. Thank you, Hamel. Next up, Shreya and I will share with you the tactical aspects of building with LLMs in production, specifically evals, monitoring, and guardrails. So here's Hamel's quote: how important evals are to the team is a differentiator between teams shipping out hot garbage and those building real products. I would agree. Here's an example of Apple's recent LLM, where they shared how they actually collected 750 summaries for push notification and email summarization, because these datasets are representative of their actual use case.

So how do we build evals for our own products? Well, I think the simple thing is to just make it simpler. For example, if you're trying to extract product attributes from a product description, break it down into title, price, and rating, and then you can simply do assertions. Similarly for summarization: instead of trying to eval that amorphous blob of a summary, break it down into dimensions such as factual inconsistency, relevance, and informational density. Once you've done that, assertion-based tests can go a long way. Are we extracting the correct price? Are we extracting the correct title? Or, if you're doing natural-language-to-SQL generation, is it using the expected table? Is it using the expected columns? These are very simple to eval, and this reiterates what Hamel has mentioned about keeping it simple. Lastly, assertions can't do everything; they can only go so far. So consider evaluator models, maybe training a classifier for factual inconsistency or a reward model for relevance. This is easier if your evals are classification and regression based. But that said, I don't know how I feel about LLM-as-a-judge.

What do you mean, you don't like LLM-as-a-judge? I personally am super bullish on LLM-as-a-judge, and I'm curious how many of you are exploring LLM-as-a-judge or have implemented it. No? Yeah, there's a judge right here. You want to stand up? No? Actual judge, LLM judge here. Yeah. Anyways, we're going to go through some points on what to consider when deploying LLM-as-a-judge. First of all, it's a no-brainer: LLM-as-a-judge is the easiest to prototype. You just have
to write a prompt to check for the criteria or metric that you want, and you can even align this towards your own preferences by providing few-shot examples of good and bad for that criteria. On the other hand, fine-tuned models or LLMs, where, you know, you have to collect a lot of data and set up a pipeline to train your evaluator, are not super easy to prototype and have a lot of upfront investment.

Yeah, but that said, LLM-as-a-judge is pretty difficult to align to your specific criteria in the business. Who here has not had any difficulty aligning the LLM-as-a-judge to your criteria? Anyone? Okay, we've got to talk later. I think that if you just have a few hundred to a few thousand samples, it's very easy to fine-tune a simple model that can do it more precisely. Second, if you want to do LLM-as-a-judge and have it fairly precise, you sort of need to use chain of thought, and chain of thought is going to be, I don't know, 5 to 8 seconds long. On the other hand, if you have a simple classifier or reward model, every request is maybe 10 milliseconds long. That's two orders of magnitude lower, and it would improve throughput.

Next, we want to think about technical debt. Okay, when we're implementing our validators in production, whether they run asynchronously or in the critical path, how much effort do we need to put in to keep these up to date? With LLM-as-a-judge, if you don't make sure your few-shot examples are dynamic, or have some way of making sure your judge prompt aligns with your definition of good and bad, then you're toast. The effect is not as pronounced for fine-tuned models, but if you don't continually fine-tune your validators on new production data, then they will also be susceptible to drift. So overall, when do you want to use LLM-as-a-judge? It's honestly a resources question, and a question of where you are in your application development. If you're starting to prototype, you need quick evals with minimal dev effort, and you have a lowish
volume of evals, start with LLM-as-a-judge and invest in the infrastructure to align that over time. If you have more resources, or you know that your product is going to be sticky, go for a fine-tuned model.

Next, I'm going to talk about looking at the data. Eugene mentioned that you should create evals on your custom or bespoke criteria, but how do you know what criteria you want? Simple answer: look at your data. "Great AI researchers", but we changed that to engineers: great AI engineers look at their data. So how do we do this? The first question, actually, before how, is when do you look at it? I know people who never look at their data at all, or people who look at it only initially after deployment. Wrong answer. You want to look at it regularly. I work with a startup that, whenever they ship a new LLM agent, creates a new Slack channel with all of the agent's outputs coming in in real time. After a couple of weeks they transition this to daily batch jobs, and make sure that they're not running into errors that they didn't anticipate.

The second thing is, what specifically are you looking for? You want to find slices of the data that are pretty simple or easy to characterize in some way, for example data that comes from a particular source, or data that has a certain keyword or phrase, or is about a certain topic, right? Simply saying "all of these are bad" but having no way of characterizing them and then improving your pipeline based on that is not going to help.

Finally, something to keep in mind throughout this whole looking-at-your-data experience is that your codebase is very rapidly changing over time: probably your prompts, components of the pipeline, et cetera. So when you're inspecting traces, it's super helpful to be able to know what GitHub commit or what model version or prompt version they correspond to. I think this is one of the very successful things that traditional MLOps tools did, like MLflow, for example: they made it
very easy to trace back, and then hopefully you could replay something. Well, I see the judge shaking his head, but great. And finally, when using LLMs as APIs, pin model versions. LLM APIs are known to exhibit different behavior that is very hard to quantify for certain tasks, so pin GPT-4 1106, pin GPT-4o, whatever it is that you're using.

So Shreya mentioned that we need to look at our data, but how do we look at our data all the time? I think the way to do this is via automated guardrails. Here's Brandolini's law, adapted: the amount of energy needed to catch and fix defects is an order of magnitude larger than that needed to produce them. And that's true. It's really easy to call an LLM API and just get something, but how do we know if it's actually bad? I think it's really important that we have some basic form of guardrails, and some of them are just table stakes: toxicity, personally identifiable information, copyright, and expected language. Now, you may imagine that this is pretty straightforward, but sometimes you don't actually have control over the context. For example, if someone's posting an ad on your English website that's in a different language, and you're asking your LLM to extract the attributes or to summarize it, you may be surprised that some nonzero proportion of the time it actually responds in a different language. Similarly, hallucinations happen more often than we would like. Imagine you're trying to summarize a movie based on the description. You just have a description for the trailer. The summary may actually include spoilers, because the model is trying so hard to be helpful, but that's actually a bad user experience. So sometimes it will include information that's not in the source.

Here's a tip: if we spend a little bit more time building reference-free evals, we can use them as guardrails. Reference-based evals are when we generate some kind of output and we compare it to some ideal sample. This is pretty expensive, and you actually have to collect all these gold samples. On the other
hand, if we have these labels, we can train an evaluator model and just compare the output to the source document. For example, if we're comparing summarizations, we can just check whether the summary entails or contradicts the source document, and now we have a summarization, I mean hallucination, eval. So if we spend some time building reference-free evals once, we can use them to guardrail all new output.

Cool, thanks, Eugene. So we're going to wrap up in the next minute or so with some high-level, bird's-eye-view, 2000-foot-view, whatever you want to call it, takeaways. First off, how many of you remember this figure from this pretty seminal paper in MLOps that came out maybe 10 years ago? 2015, so 9 years ago. Yeah. So I think this paper really communicated the idea that the model is a small part, and when you're productionizing ML systems, there's so much more around the model that you have to maintain over time: data verification, feature engineering, monitoring your infrastructure, et cetera. So you might be wondering, now that we have LLMs, does any of this matter? Yeah, I'm seeing a few nods here. Absolutely. When we have LLMs, all of these tech debt principles still apply, and you can even think of the exact mapping from every single component in here to the LLM equivalent. For example, maybe we don't have feature engineering pipelines, but cast in a new light, it's RAG, right? We're looking at context, we're trying to retrieve what's relevant and engineer that to not distract the LLM too much. We have a ton of experimentation around that. All of this is something that needs to be maintained over time, especially as models change under the hood. Similarly for data validation and verification, right? We have evals, we have guardrails that need to be deployed. It's not just simply "wrap your model or GPT in some software and ship it". No, there's a lot of investment that needs to happen around the model.

All right, so I'd like to end with this quote from Karpathy-sensei: there's
a large class of problems that are really easy to imagine and build demos for, but that it's extremely hard to make products out of. For example, Charles dug up this paper on the first car driven by a neural network. That was 1988. Twenty-five years later, Andrej Karpathy took his first demo drive in a Waymo, in 2013. Ten years later, I hope all of you had a chance to try the Waymo; we got the driverless permit for Waymo in San Francisco. Maybe in a couple more years we'll have it for the whole of California. The point is, going from demo to production takes time. That's all we had. Thank you. Let's build. [Music]
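
The cost arithmetic from the strategy part of the talk can be checked with a few lines of Python. This is only a sketch under the talk's stated assumptions (about $625 an hour in mid-2023, and a tenfold cost drop roughly every 15 months); the function name `projected_cost` is made up for illustration.

```python
def projected_cost(cost_now: float, months: float, months_per_10x: float = 15.0) -> float:
    """Cost after `months`, assuming a 10x drop every `months_per_10x` months."""
    return cost_now / (10 ** (months / months_per_10x))


# Figure quoted in the talk: about $625/hour for chatbot-powered NPCs in mid-2023.
cost_2023 = 625.0

for months in (0, 15, 30):
    print(months, "months:", round(projected_cost(cost_2023, months), 2), "dollars/hour")
```

At 30 months the projection lands at $6.25 an hour, which is the "about $6 an hour" Pac-Man comparison the speaker makes. Changing `months_per_10x` to 12 or 18 gives the range implied by "every 12 to 18 months".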
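
The assertion-based evals described in the tactical part of the talk (is the title extracted, is the price plausible, does the generated SQL use the expected table and columns) might look roughly like the following. This is a hedged sketch, not the speakers' code: the function names, the record fields, and the 0 to 5 rating scale are assumptions, and the SQL check is plain text matching rather than a real SQL parser.

```python
import re
from typing import List


def check_extraction(extracted: dict) -> List[str]:
    """Return the failed assertions for one extracted product record."""
    failures = []
    title = extracted.get("title")
    if not isinstance(title, str) or not title.strip():
        failures.append("title is missing or empty")
    price = extracted.get("price")
    if not isinstance(price, (int, float)) or price <= 0:
        failures.append("price is missing or not positive")
    rating = extracted.get("rating")
    # Assumption: ratings live on a 0-5 scale.
    if not isinstance(rating, (int, float)) or not 0 <= rating <= 5:
        failures.append("rating is missing or outside 0-5")
    return failures


def check_sql(sql: str, expected_table: str, expected_columns: List[str]) -> List[str]:
    """Return the failed assertions for generated SQL (text matching only)."""
    failures = []
    lowered = sql.lower()
    if not re.search(r"\bfrom\s+" + re.escape(expected_table.lower()) + r"\b", lowered):
        failures.append("expected table not used: " + expected_table)
    for column in expected_columns:
        if not re.search(r"\b" + re.escape(column.lower()) + r"\b", lowered):
            failures.append("expected column not used: " + column)
    return failures


print(check_extraction({"title": "Steel mug", "price": 12.5, "rating": 4.5}))
print(check_sql("SELECT name, price FROM products WHERE price > 10", "products", ["name", "price"]))
```

Both calls print an empty list, meaning no assertion failed. Returning a list of failure messages rather than raising on the first problem makes it easier to count failure types across a batch of outputs.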
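
The advice about pinning model versions and knowing which commit, model version, and prompt version a trace corresponds to can be sketched as a small trace record. Everything here is illustrative: `TraceRecord`, `prompt_fingerprint`, and `log_trace` are hypothetical names, the commit string is a placeholder, and `gpt-4-1106-preview` is used only as an example of a dated model identifier instead of a floating alias.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Example of a pinned, dated model name (assumption: use whatever you actually deploy).
PINNED_MODEL = "gpt-4-1106-preview"


@dataclass(frozen=True)
class TraceRecord:
    git_commit: str       # commit of the application code that produced the output
    model_version: str    # pinned model identifier
    prompt_version: str   # fingerprint of the prompt template
    user_input: str
    output: str


def prompt_fingerprint(prompt_template: str) -> str:
    """Short, stable hash so a prompt edit shows up as a new version."""
    return hashlib.sha256(prompt_template.encode("utf-8")).hexdigest()[:12]


def log_trace(record: TraceRecord) -> str:
    """Serialize one trace as a JSON line (write it wherever traces are stored)."""
    return json.dumps(asdict(record), sort_keys=True)


template = "Summarize the following listing in one sentence:\n{listing}"
record = TraceRecord(
    git_commit="abc1234",  # placeholder value for illustration
    model_version=PINNED_MODEL,
    prompt_version=prompt_fingerprint(template),
    user_input="Vintage steel mug, barely used.",
    output="A lightly used vintage steel mug.",
)
print(log_trace(record))
```

With these three fields on every trace, a bad output found while looking at the data can be tied back to the exact code, model, and prompt that produced it.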

Original Description

Special double-feature closing keynote from the 6 authors of the hit O'Reilly article on Applied LLMs. Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Eugene Yan
I build ML systems to serve customers at scale, and write to learn and teach.

About Shreya Shankar
I'm Shreya Shankar. I am a machine learning (ML) engineer and computer scientist in the Bay Area. I am completing my PhD in data management systems for ML, with a human-centered focus. I am fortunate to be advised by Dr. Aditya Parameswaran at UC Berkeley. Go Bears! 🐻 I also consult on ML engineering and production AI strategy for enterprises. Prior to my PhD, I was the first ML engineer at a startup, did research engineering at Google Brain, and engineering at Facebook. Before all of that, I did my BS and MS in computer science at Stanford. Go Trees! 🌲

About Hamel Husain
Hamel Husain started working with language models five years ago when he led the team that created CodeSearchNet, a precursor to GitHub CoPilot. Since then, he has seen many successful and unsuccessful approaches to building LLM products. Hamel is also an active open source maintainer and contributor of a wide range of ML/AI projects. Hamel is currently an independent consultant.

About Jason Liu
Jason is an independent AI consultant, advisor, writer, and educator. His main interests are structured outputs, search and retrieval for RAG as well as understanding how to leverage AI to build scalable and valuable businesses.

About Bryan Bischof
Bryan Bischof is the Head of AI at Hex, where he leads the team of engineers building Magic, the data science and analytics copilot. Bryan has worked all over the data stack leading teams in analytics, machine learning engineering, data platform engineering, and AI engineering. He started the da

Playlist

Uploads from AI Engineer · AI Engineer · 43 of 60

AI Engineer Summit 2023 — DAY 1 Livestream

AI Engineer Summit 2023 — DAY 2 Livestream

Principles for Prompt Engineering - Karina Nguyen (Claude Instant @ Anthropic)

Announcing the AI Engineer Network: Benjamin Dunphy

The 1,000x AI Engineer: Swyx

Building AI For All: Amjad Masad & Michele Catasta

The Age of the Agent: Flo Crivello

See, Hear, Speak, Draw: Logan Kilpatrick & Simón Fishman

Building Context-Aware Reasoning Applications with LangChain and LangSmith: Harrison Chase

Pydantic is all you need: Jason Liu

Building Blocks for LLM Systems & Products: Eugene Yan

The Intelligent Interface: Sam Whitmore & Jason Yuan of New Computer

Climbing the Ladder of Abstraction: Amelia Wattenberger

Supabase Vector: The Postgres Vector database: Paul Copplestone

[Workshop] AI Engineering 101

The Hidden Life of Embeddings: Linus Lee

[Workshop] AI Engineering 201: Inference

The AI Pivot: With Chris White of Prefect & Bryan Bischof of Hex

The AI Evolution: Mario Rodriguez, GitHub

Move Fast Break Nothing: Dedy Kredo

AI Engineering 201: The Rest of the Owl

Building Reactive AI Apps: Matt Welsh

Pragmatic AI with TypeChat: Daniel Rosenwasser

Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan

Retrieval Augmented Generation in the Wild: Anton Troynikov

Building Production-Ready RAG Applications: Jerry Liu

120k players in a week: Lessons from the first viral CLIP app: Joseph Nelson

The Weekend AI Engineer: Hassan El Mghari

Harnessing the Power of LLMs Locally: Mithun Hunsur

Trust, but Verify: Shreya Rajpal

Open Questions for AI Engineering: Simon Willison

Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD

GPT Web App Generator - 10,000 apps created in a month: Matija Sosic

Using AI to Build an Infinite Game: Jeff Schomay

How to Become an AI Engineer from a Fullstack Background - Reid Mayo

The Code AI Maturity Model and What It Means For You: Ado Kukic

AI Engineer World’s Fair 2024 - Keynotes & Multimodality track

From Text to Vision to Voice Exploring Multimodality with Open AI: Romain Huet

The Making of Devin by Cognition AI: Scott Wu

The Future of Knowledge Assistants: Jerry Liu

Llamafile: bringing AI to the masses with fast CPU inference: Stephen Hood and Justine Tunney

Open Challenges for AI Engineering: Simon Willison

Lessons From A Year Building With LLMs

From Software Developer to AI Engineer: Antje Barth

Unlocking Developer Productivity across CPU and GPU with MAX: Chris Lattner

Copilots Everywhere: Thomas Dohmke and Eugene Yan

Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han

Low Level Technicals of LLMs: Daniel Han

Emergence Launch: AI Agents and the future enterprise: Dr. Satya Nitta

How Codeium Breaks Through the Ceiling for Retrieval: Kevin Hou

What's new from Anthropic and what's next: Alex Albert

Using agents to build an agent company: Joao Moura

Decoding the Decoder LLM without de code: Ishan Anand

Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner

Building with Anthropic Claude: Prompt Workshop with Zack Witten

Building Reliable Agentic Systems: Eno Reyes

10x Development: LLMs For the working Programmer - Manuel Odendahl

Disrupting the $15 Trillion Construction Industry with Autonomous Agents: Dr. Sarah Buchner

Hypermode Launch: Kevin Van Gundy

Git push get an AI API: Ryan Fox-Tyler

This video shares lessons learned from a year of building with LLMs, organized around the strategic, operational, and tactical considerations for LLM-powered applications, with emphasis on continuous improvement, evaluation, and data literacy. Viewers can apply these lessons when building and deploying their own LLM applications.

Key Takeaways

Try a vector database
Implement a new paper
Update the embedding model
Hire a machine learning engineer
Fine-tune LLMs
Continuously evaluate and improve LLM performance
Deploy LLMs in production environments

💡 Continuous improvement, evaluation, and data literacy are crucial for building successful LLM-powered applications
