Context Engineering for Engineers

YC Root Access · Beginner ·🧠 Large Language Models ·10mo ago

Key Takeaways

This video teaches context engineering techniques for large language models

Full Transcript

So tonight um I want to share some thoughts about context engineering for engineers this audience. Um by way of context my name is Jeff and I'm the founder of Chroma. Chroma for those of you who don't know u builds a search and retrieval database. Uh numerous illustrious speakers have shouted out Chroma. So thank you very much. But let's get into some meat here. Um, you know, one way that we think about what really is happening inside of an AI system is that it is ultimately just a program. You have your instruction set, the relevant information and tools. You have your user input, that's the part that changes, and then you put it into this magic box, and an output comes out the other end. And yeah, this is just very much a program. And though people may want to sell this to you as uh a techno machine god, um we believe it is ultimately just software. So I will assert uh Jake and I can get into fisticuffs later on this, but I will assert that context engineering is a much better term than prompt engineering or rag. I think there's a lot of buzzwords that fly around in the AI space and every week there's some new AI thought boy with their head explode emoji going crazy over some new technique. you know, they tell you the 18 different kinds of rag that you need to know about, just stop. Mute them. Your life will be better. Um, and I think as evidenced by a lot of the things you've heard tonight, that's very true. Um, what is context engineering? It is quite simply deciding what's in the context window. It's that simple. That includes the prompt. That may include retrieval depending on the use case. Um but context engineering is the right term I believe and I think it's great because context engineering uh I think uh implies the existence of context engineers people whose jobs it is to like make this really good. Maybe even context engineering implies the existence of a context engine. I'm not going to talk about that tonight but you can go home and and think about what that might mean. So really what is a lot of our shared goal? Our shared goal is to build reliable software. Um, this new software has some new abilities and primitives that prior software didn't that can be pretty useful. We believe AI can be useful if you give it the right context and you know these systems ideally are reliable, fast and cheap. Uh, I do believe that in general we should all take a make it work, make it fast, make it cheap type approach and probably today most people are still on stage one. Uh, how do we make it reliable? Okay, so why don't we just use long context? Anthropic just announced a couple days ago their million token context limit model famously uh there was a certain language model lab that released a model that 10 million tokens. This is amazing. And then you've seen these like you know startups raise half a billion dollars for 100 million billion infinite tokens. Nice. Well, unfortunately that doesn't work yet. Um and who knows? Maybe it'll never work. Maybe it'll work next year. We don't know yet. But as uh as the room is full of engineers and builders, we want to know what works today. Um so Chroma put out this technical report um about a month ago. Uh Kelly, who did a lot of the work on this, is in the audience. Shout out Kelly. Yep. Um and uh yeah, this I think this video now has done about 120,000 views on YouTube. So it is doing pretty well. And what we should demonstrated in this technical report is this across simple tasks that you think a human should do pretty well. This is a task to repeat back certain set of words model performance uh as an input of token length uh goes down precipitously. I it is clipping off the bottom here but I'll tell you I believe that the blue dot at the far bottom right hand corner is 10,000 tokens. So actually model performance uh I've heard a couple other numbers referenced tonight 40% 170k across some tasks seems like it's even much much sooner than that. Um now of course the way that usually the lab substantiate these context windows being useful is they'll tell you about needle and a haststack. Needle and haystack is solved across all the different token dimensions. Great. Um but I think what we want to point out to people is needle in a hstack is a very easy task. Um, on the screen I have an example of both a needle and a haystack. Um, and you'll notice that there's number one, the model only has to pay attention to a needle by definition. It doesn't have to pay attention to lots of the context window, only the needle. And then number two, the reasoning power is basically zero. Uh, I will read this out loud. The question is, what was the best writing advice I got from my college classmate? The needle is the best writing advice I got from my college classmate was to write every week. So like you know imagine the reasoning power required to make that match is basically zero. And what we ended up doing was plotting a number of different tasks across these dimensions of on the lefth hand axis uh amount. So the amount of the context window the model has to pay attention to and on the bottom axis the difficulty or the reasoning power required to do this task well. Um and you'll notice needle and a hack is in the bottom left requires you to pay attention to a needle. Zero reasoning power. But our assertion is that most interesting things people are doing with language models today require either more context or more reasoning or both. And actually many uh agent tasks and even summarization are much more difficult. And so then it sort of begs the question well how much of the model can you actually use effectively? So we also uh ran some tests on long meal and demonstrated that very simply if you were to give the model full context versus focus context focus context in this case is Oracle. So it's sort of human curated. Um this is the numbers for performance. So again massive gains in performance by curating context. You should curate your context. So broadly speaking the goal of context engineering is to number one find the relevant information. Number two remove the irrelevant information and then number three optimize the relevant information. And you could argue that for any given turn of the model there's this problem of out of all the information in the universe what information should be in the context window this time. And I I have this model here that I call gather and glean. Yes, it is an alliteration. Yes, I did think about that for probably 30 minutes to get there. And the way that I think makes sense to think about this problem is for those of you who have a machine learning background, this will connect. If not, I'll explain again. Stage one is you want to maximize recall. You want to get all possible relevant tokens or information even at the risk of getting information that's not relevant. And then stage two is maximizing precision. you want to then remove and call out and cut out all of the irrelevant information so you're just left with that pristine set of highly relevant non-distracting information. Um what we're seeing a lot of developers do now on kind of the retrieval side is this very uh this interesting pipeline where the query comes in from the user. You have an LLM create functionally a query plan of like okay based on this query I'm going to use these tools. I'm going to search in these ways. Maybe it creates 10 different search probes or 30 different search probes across structured data SQL queries APIs and tools unstructured data like data in Chroma and it gets a big pool of data. Um and then there's the question of like well how do you glean it down? And I'm going to get to that in a second. So again gathering it is not news to you all but you know it could be structured data unstructured data local file system tools uh other kinds of tools like MCP tools web search your chat conversation history you know all these pools of data may be relevant to the task the model has at hand and then glean um so top k on vector similarity I think you've seen that mentioned before that's usually people's first pass the next sort of approach here people use commonly is reciprocal rank fusion or RRF uh LTR learning to rank as sort of an OG information retrieval technique that's implemented into elastic search and then of course you have dedicated reranking models also common and then increasingly just LLMs believe it or not LLM um this is a great meme uh and what I what I think is quite interesting is actually that more and more developers that I talked to are they're they're calling it cheating at search or brute forcing search instead of trying to get super fancy about it they're just using more lang just use more intelligence like spend more money on tokens. Um you don't have to use state-of-the-art models all the time. You can use small fast cheap models and use a lot of them and use a lot of them in parallel to kind of help you with this curation and gleaning stage. All right. So now I wanted to spend a little bit more time talking about context engineering for agents. And of course what is agents? Well, there's there's a there's a loop happening here. And so you know in a deep research agent for example, you're not just doing this gathering glean task once. you're doing this gathering glean task many times conceptually. Um you're doing it inside of sub agents and the sub agents are getting judged by the orchestrator and they're going back and forth and you know carving up the web and and finding lots of relevant information. Um and so of course this makes the stuff more complicated and more interesting. And you know notably um you now have the addition of agent conversation and history as a major factor in the context window. When you're going back and forth you're generating lots of information. Uh, this was also alluded to a moment ago, but prompt histories can be really, really big. So, for example, this is a GIF. I know it's a little bit blurry, but this is an example from Swebench of like the code and logs generated from like one, you know, a couple turns of Swebench. Like, as has been stated before, you know, what human could possibly parse this and make sense of it is insanely large. And so, we really found this quite interesting learning actually when we were looking at the ability of agents to learn from long context. And we found one thing that was quite notable which was that if you give the agent access to past failure cases, it helps improve agent performance. The agent seems to be able to break out of these like local minimas where it commonly gets trapped and like move forward. But it wasn't really a slam dunk to give the agent access to prior success cases. In fact, in many cases, it seemed like the agent would slip into a local minima and kind of just like pattern match and get lazy like, "Oh, you already gave me the answer. Thank you. I'll just say that back." Um, and so again, there's a lot of I think these are not solved problems. I do not have the answer to this problem for you today. I wish I did. Um I think this is why like it's important to create uh a community around this idea of context engineering so we can all solve these problems together. Um and so you know as has as been stated before compaction is a really important point of leverage. Um understanding you know that gift that I showed you a moment ago that's going on forever and ever and ever. Um how do you distill down for the next turn of the model compassion is so important. And what we find is that like today's approaches don't really work. Um the difference again I apologize for that it's clipped the difference between no summary and the compaction coming out of like open code for example is negligible. So you can basically throw away that compaction entirely and it's only worse than uh using like the sort of built-in compaction tool from open code. Um but if you do a smarter a smarter compaction with a better prompt uh it can be much better. All right well thank you very much for listening. I'm Jeff uh and Mr. German. [Music]

Original Description

Jeff Huber, founder of Chroma, shares why building with large language models isn’t just about prompts or RAG—it’s about context. He explains how deciding what goes into the context window shapes reliability, why performance drops with long inputs, and how careful filtering and compaction can make AI systems faster and more useful. Chapters: 00:00 - Introduction to Context Engineering 00:26 - Understanding AI Systems as Programs 01:29 - The Concept of Context Engineering 02:02 - Building Reliable Software with AI 02:31 - Challenges with Long Contexts 03:07 - Chroma's Technical Report Insights 03:57 - Needle in a Haystack Problem 05:08 - The Importance of Context in AI Tasks 06:05 - Gather and Glean Model 06:44 - Data Gathering Techniques 07:31 - Gleaning and Optimizing Data 08:26 - Content Engineering for Agents 09:35 - Challenges with Agent Performance 10:13 - The Role of Compaction 10:57 - Conclusion and Final Thoughts

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from YC Root Access · YC Root Access · 0 of 60

← Previous Next →

Lecture 1 - How to Start a Startup (Sam Altman, Dustin Moskovitz)

Lecture 1 - How to Start a Startup (Sam Altman, Dustin Moskovitz)

Lecture 2 - Team and Execution (Sam Altman)

Lecture 2 - Team and Execution (Sam Altman)

Lecture 3 - Before the Startup (Paul Graham)

Lecture 3 - Before the Startup (Paul Graham)

Lecture 4 - Building Product, Talking to Users, and Growing (Adora Cheung)

Lecture 4 - Building Product, Talking to Users, and Growing (Adora Cheung)

Lecture 5 - Competition is for Losers (Peter Thiel)

Lecture 5 - Competition is for Losers (Peter Thiel)

Lecture 6 - Growth (Alex Schultz)

Lecture 6 - Growth (Alex Schultz)

Lecture 7 - How to Build Products Users Love (Kevin Hale)

Lecture 7 - How to Build Products Users Love (Kevin Hale)

Lecture 8 - How to Get Started, Doing Things that Don't Scale, Press

Lecture 8 - How to Get Started, Doing Things that Don't Scale, Press

Lecture 9 - How to Raise Money (Marc Andreessen, Ron Conway, Parker Conrad)

Lecture 9 - How to Raise Money (Marc Andreessen, Ron Conway, Parker Conrad)

Lecture 10 - Culture (Brian Chesky, Alfred Lin)

Lecture 10 - Culture (Brian Chesky, Alfred Lin)

Lecture 11 - Hiring and Culture, Part 2 (Patrick and John Collison, Ben Silbermann)

Lecture 11 - Hiring and Culture, Part 2 (Patrick and John Collison, Ben Silbermann)

Lecture 12 - Building for the Enterprise (Aaron Levie)

Lecture 12 - Building for the Enterprise (Aaron Levie)

Lecture 13 - How to be a Great Founder (Reid Hoffman)

Lecture 13 - How to be a Great Founder (Reid Hoffman)

Lecture 14 - How to Operate (Keith Rabois)

Lecture 14 - How to Operate (Keith Rabois)

Lecture 15 - How to Manage (Ben Horowitz)

Lecture 15 - How to Manage (Ben Horowitz)

Lecture 16 - How to Run a User Interview (Emmett Shear)

Lecture 16 - How to Run a User Interview (Emmett Shear)

Lecture 17 - How to Design Hardware Products (Hosain Rahman)

Lecture 17 - How to Design Hardware Products (Hosain Rahman)

Lecture 18 - Legal and Accounting Basics for Startups (Kirsty Nathoo, Carolynn Levy)

Lecture 18 - Legal and Accounting Basics for Startups (Kirsty Nathoo, Carolynn Levy)

Lecture 19 - Sales and Marketing; How to Talk to Investors (Tyler Bosmeny; YC Partners)

Lecture 19 - Sales and Marketing; How to Talk to Investors (Tyler Bosmeny; YC Partners)

Lecture 20 - Later-stage Advice (Sam Altman)

Lecture 20 - Later-stage Advice (Sam Altman)

YC's Summer 2022 Startup Job Expo - Pitches from 30 YC founders & find your next startup

YC's Summer 2022 Startup Job Expo - Pitches from 30 YC founders & find your next startup

AMA with YC: Job Searching During an Economic Downturn (Event Summary)

AMA with YC: Job Searching During an Economic Downturn (Event Summary)

YC Startup Job Hunt Bootcamp, September 14, 2022

YC Startup Job Hunt Bootcamp, September 14, 2022

YC Startup Talks: Understanding Equity with Jordan Gonen, CEO & Co-founder of Compound

YC Startup Talks: Understanding Equity with Jordan Gonen, CEO & Co-founder of Compound

YC Tech Talks: Climate Tech with Charge Robotics (S21), Wright Electric (W17) and Impossible Mining

YC Tech Talks: Climate Tech with Charge Robotics (S21), Wright Electric (W17) and Impossible Mining

YC Women in Tech: Breaking Into Product

YC Women in Tech: Breaking Into Product

YC Ultimate Job Guide: Startup Stages

YC Ultimate Job Guide: Startup Stages

Becoming a founding engineer at a YC startup

Becoming a founding engineer at a YC startup

3 tips for finding a job on YC's Work at a Startup

3 tips for finding a job on YC's Work at a Startup

YC Tech Talks: Defi and Scalability with Nemil at Coinbase (S12)

YC Tech Talks: Defi and Scalability with Nemil at Coinbase (S12)

YC Tech Talks: Designing Game Characters with Deep Learning, from Cory Li at Spellbrush (W18)

YC Tech Talks: Designing Game Characters with Deep Learning, from Cory Li at Spellbrush (W18)

YC Tech Talks: Designing from Day One: Artists as Founders with Multiverse (S20)

YC Tech Talks: Designing from Day One: Artists as Founders with Multiverse (S20)

YC Tech Talks: MMOs in the Instagram Era: Highrise (S18)

YC Tech Talks: MMOs in the Instagram Era: Highrise (S18)

Becoming a founding engineer at a YC startup - Finley short

Becoming a founding engineer at a YC startup - Finley short

Why become a product engineer? -- with Volley (YC W18) & Luminai (YC S20)

Why become a product engineer? -- with Volley (YC W18) & Luminai (YC S20)

Y Combinator Go-To-Market Jobs Expo, 2022

Y Combinator Go-To-Market Jobs Expo, 2022

Fireside Chat with Tanay Tandon of Athelas

Fireside Chat with Tanay Tandon of Athelas

Fireside Chat with Ivana Djuretic of Asher Bio

Fireside Chat with Ivana Djuretic of Asher Bio

The Past and Future of YC Bio

The Past and Future of YC Bio

What VCs Look for When Investing in Bio and Healthcare

What VCs Look for When Investing in Bio and Healthcare

Finding your next role: Tips from YC's Talent team

Finding your next role: Tips from YC's Talent team

YC Startup Talks: Startup Equity with Compound (YC S19)

YC Startup Talks: Startup Equity with Compound (YC S19)

YC Tech Talks: Machine Learning

YC Tech Talks: Machine Learning

FTC Chair Lina Khan at Y Combinator

FTC Chair Lina Khan at Y Combinator

AI, Startups, & Competition: Shaping California’s Tech Future

AI, Startups, & Competition: Shaping California’s Tech Future

Y Combinator Little Tech Competition Summit - Washington, DC

Y Combinator Little Tech Competition Summit - Washington, DC

The Exit Interview with Jonathan Kanter

The Exit Interview with Jonathan Kanter

Founder Demo: Daniel Vega, Co-Founder & CTO of Inversion Semiconductor

Founder Demo: Daniel Vega, Co-Founder & CTO of Inversion Semiconductor

Wither Realignment?

Wither Realignment?

Founder Demo: Cyril Gorrla, Co-founder & CEO of CTGT

Founder Demo: Cyril Gorrla, Co-founder & CEO of CTGT

Founder Demo: Newsha Ghaeli, Co-founder & President of Biobot Analytics

Founder Demo: Newsha Ghaeli, Co-founder & President of Biobot Analytics

Fireside with FTC Chairman Andrew Ferguson

Fireside with FTC Chairman Andrew Ferguson

Fireside with Boom Founder & CEO Blake Scholl

Fireside with Boom Founder & CEO Blake Scholl

Founder Demo: AJ Forsythe & Jordan Barnes of Coop

Founder Demo: AJ Forsythe & Jordan Barnes of Coop

Are Techno Optimism and Populism Incompatible?

Are Techno Optimism and Populism Incompatible?

Founder Demo: Trevor Mckendrick, Co-founder & CEO of Seis

Founder Demo: Trevor Mckendrick, Co-founder & CEO of Seis

Founder Demo: Matt Bolous, Head of Policy & Safety of Imbue

Founder Demo: Matt Bolous, Head of Policy & Safety of Imbue

Fireside with Teresa Ribiera, EVP, European Commission for Clean, Just & Competitive Transition

Fireside with Teresa Ribiera, EVP, European Commission for Clean, Just & Competitive Transition

Fireside with Epic Games Founder & CEO Tim Sweeney

Fireside with Epic Games Founder & CEO Tim Sweeney

Fireside with Former FTC Chair Lina Khan

Fireside with Former FTC Chair Lina Khan

Related AI Lessons

Your LLM Doesn’t Pick Stocks — It Remembers Them

Discover how LLMs remember stock picks rather than making actual predictions, and why this matters for AI-driven investment strategies

Medium · Machine Learning

Word Representation

Learn how word representation works in NLP and its importance in understanding human language, enabling applications like text classification and language translation

When Cosine Similarity Approaching Singularity in Google Search AI Mode

Learn how cosine similarity approaching singularity affects Google Search AI and unified knowledge graphs, and why it matters for AI engineers and data scientists

When Cosine Similarity Approaching Singularity in Google Search AI Mode

Learn how cosine similarity approaching singularity affects Google Search AI and unified knowledge graphs, and why it matters for data science and AI development

Medium · Data Science

Chapters (15)

Introduction to Context Engineering

0:26 Understanding AI Systems as Programs

1:29 The Concept of Context Engineering

2:02 Building Reliable Software with AI

2:31 Challenges with Long Contexts

3:07 Chroma's Technical Report Insights

3:57 Needle in a Haystack Problem

5:08 The Importance of Context in AI Tasks

6:05 Gather and Glean Model

6:44 Data Gathering Techniques

7:31 Gleaning and Optimizing Data

8:26 Content Engineering for Agents

9:35 Challenges with Agent Performance

10:13 The Role of Compaction

10:57 Conclusion and Final Thoughts

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)