Context Engineering for Engineers
Key Takeaways
This video teaches context engineering techniques for large language models
Full Transcript
So tonight um I want to share some thoughts about context engineering for engineers this audience. Um by way of context my name is Jeff and I'm the founder of Chroma. Chroma for those of you who don't know u builds a search and retrieval database. Uh numerous illustrious speakers have shouted out Chroma. So thank you very much. But let's get into some meat here. Um, you know, one way that we think about what really is happening inside of an AI system is that it is ultimately just a program. You have your instruction set, the relevant information and tools. You have your user input, that's the part that changes, and then you put it into this magic box, and an output comes out the other end. And yeah, this is just very much a program. And though people may want to sell this to you as uh a techno machine god, um we believe it is ultimately just software. So I will assert uh Jake and I can get into fisticuffs later on this, but I will assert that context engineering is a much better term than prompt engineering or rag. I think there's a lot of buzzwords that fly around in the AI space and every week there's some new AI thought boy with their head explode emoji going crazy over some new technique. you know, they tell you the 18 different kinds of rag that you need to know about, just stop. Mute them. Your life will be better. Um, and I think as evidenced by a lot of the things you've heard tonight, that's very true. Um, what is context engineering? It is quite simply deciding what's in the context window. It's that simple. That includes the prompt. That may include retrieval depending on the use case. Um but context engineering is the right term I believe and I think it's great because context engineering uh I think uh implies the existence of context engineers people whose jobs it is to like make this really good. Maybe even context engineering implies the existence of a context engine. I'm not going to talk about that tonight but you can go home and and think about what that might mean. So really what is a lot of our shared goal? Our shared goal is to build reliable software. Um, this new software has some new abilities and primitives that prior software didn't that can be pretty useful. We believe AI can be useful if you give it the right context and you know these systems ideally are reliable, fast and cheap. Uh, I do believe that in general we should all take a make it work, make it fast, make it cheap type approach and probably today most people are still on stage one. Uh, how do we make it reliable? Okay, so why don't we just use long context? Anthropic just announced a couple days ago their million token context limit model famously uh there was a certain language model lab that released a model that 10 million tokens. This is amazing. And then you've seen these like you know startups raise half a billion dollars for 100 million billion infinite tokens. Nice. Well, unfortunately that doesn't work yet. Um and who knows? Maybe it'll never work. Maybe it'll work next year. We don't know yet. But as uh as the room is full of engineers and builders, we want to know what works today. Um so Chroma put out this technical report um about a month ago. Uh Kelly, who did a lot of the work on this, is in the audience. Shout out Kelly. Yep. Um and uh yeah, this I think this video now has done about 120,000 views on YouTube. So it is doing pretty well. And what we should demonstrated in this technical report is this across simple tasks that you think a human should do pretty well. This is a task to repeat back certain set of words model performance uh as an input of token length uh goes down precipitously. I it is clipping off the bottom here but I'll tell you I believe that the blue dot at the far bottom right hand corner is 10,000 tokens. So actually model performance uh I've heard a couple other numbers referenced tonight 40% 170k across some tasks seems like it's even much much sooner than that. Um now of course the way that usually the lab substantiate these context windows being useful is they'll tell you about needle and a haststack. Needle and haystack is solved across all the different token dimensions. Great. Um but I think what we want to point out to people is needle in a hstack is a very easy task. Um, on the screen I have an example of both a needle and a haystack. Um, and you'll notice that there's number one, the model only has to pay attention to a needle by definition. It doesn't have to pay attention to lots of the context window, only the needle. And then number two, the reasoning power is basically zero. Uh, I will read this out loud. The question is, what was the best writing advice I got from my college classmate? The needle is the best writing advice I got from my college classmate was to write every week. So like you know imagine the reasoning power required to make that match is basically zero. And what we ended up doing was plotting a number of different tasks across these dimensions of on the lefth hand axis uh amount. So the amount of the context window the model has to pay attention to and on the bottom axis the difficulty or the reasoning power required to do this task well. Um and you'll notice needle and a hack is in the bottom left requires you to pay attention to a needle. Zero reasoning power. But our assertion is that most interesting things people are doing with language models today require either more context or more reasoning or both. And actually many uh agent tasks and even summarization are much more difficult. And so then it sort of begs the question well how much of the model can you actually use effectively? So we also uh ran some tests on long meal and demonstrated that very simply if you were to give the model full context versus focus context focus context in this case is Oracle. So it's sort of human curated. Um this is the numbers for performance. So again massive gains in performance by curating context. You should curate your context. So broadly speaking the goal of context engineering is to number one find the relevant information. Number two remove the irrelevant information and then number three optimize the relevant information. And you could argue that for any given turn of the model there's this problem of out of all the information in the universe what information should be in the context window this time. And I I have this model here that I call gather and glean. Yes, it is an alliteration. Yes, I did think about that for probably 30 minutes to get there. And the way that I think makes sense to think about this problem is for those of you who have a machine learning background, this will connect. If not, I'll explain again. Stage one is you want to maximize recall. You want to get all possible relevant tokens or information even at the risk of getting information that's not relevant. And then stage two is maximizing precision. you want to then remove and call out and cut out all of the irrelevant information so you're just left with that pristine set of highly relevant non-distracting information. Um what we're seeing a lot of developers do now on kind of the retrieval side is this very uh this interesting pipeline where the query comes in from the user. You have an LLM create functionally a query plan of like okay based on this query I'm going to use these tools. I'm going to search in these ways. Maybe it creates 10 different search probes or 30 different search probes across structured data SQL queries APIs and tools unstructured data like data in Chroma and it gets a big pool of data. Um and then there's the question of like well how do you glean it down? And I'm going to get to that in a second. So again gathering it is not news to you all but you know it could be structured data unstructured data local file system tools uh other kinds of tools like MCP tools web search your chat conversation history you know all these pools of data may be relevant to the task the model has at hand and then glean um so top k on vector similarity I think you've seen that mentioned before that's usually people's first pass the next sort of approach here people use commonly is reciprocal rank fusion or RRF uh LTR learning to rank as sort of an OG information retrieval technique that's implemented into elastic search and then of course you have dedicated reranking models also common and then increasingly just LLMs believe it or not LLM um this is a great meme uh and what I what I think is quite interesting is actually that more and more developers that I talked to are they're they're calling it cheating at search or brute forcing search instead of trying to get super fancy about it they're just using more lang just use more intelligence like spend more money on tokens. Um you don't have to use state-of-the-art models all the time. You can use small fast cheap models and use a lot of them and use a lot of them in parallel to kind of help you with this curation and gleaning stage. All right. So now I wanted to spend a little bit more time talking about context engineering for agents. And of course what is agents? Well, there's there's a there's a loop happening here. And so you know in a deep research agent for example, you're not just doing this gathering glean task once. you're doing this gathering glean task many times conceptually. Um you're doing it inside of sub agents and the sub agents are getting judged by the orchestrator and they're going back and forth and you know carving up the web and and finding lots of relevant information. Um and so of course this makes the stuff more complicated and more interesting. And you know notably um you now have the addition of agent conversation and history as a major factor in the context window. When you're going back and forth you're generating lots of information. Uh, this was also alluded to a moment ago, but prompt histories can be really, really big. So, for example, this is a GIF. I know it's a little bit blurry, but this is an example from Swebench of like the code and logs generated from like one, you know, a couple turns of Swebench. Like, as has been stated before, you know, what human could possibly parse this and make sense of it is insanely large. And so, we really found this quite interesting learning actually when we were looking at the ability of agents to learn from long context. And we found one thing that was quite notable which was that if you give the agent access to past failure cases, it helps improve agent performance. The agent seems to be able to break out of these like local minimas where it commonly gets trapped and like move forward. But it wasn't really a slam dunk to give the agent access to prior success cases. In fact, in many cases, it seemed like the agent would slip into a local minima and kind of just like pattern match and get lazy like, "Oh, you already gave me the answer. Thank you. I'll just say that back." Um, and so again, there's a lot of I think these are not solved problems. I do not have the answer to this problem for you today. I wish I did. Um I think this is why like it's important to create uh a community around this idea of context engineering so we can all solve these problems together. Um and so you know as has as been stated before compaction is a really important point of leverage. Um understanding you know that gift that I showed you a moment ago that's going on forever and ever and ever. Um how do you distill down for the next turn of the model compassion is so important. And what we find is that like today's approaches don't really work. Um the difference again I apologize for that it's clipped the difference between no summary and the compaction coming out of like open code for example is negligible. So you can basically throw away that compaction entirely and it's only worse than uh using like the sort of built-in compaction tool from open code. Um but if you do a smarter a smarter compaction with a better prompt uh it can be much better. All right well thank you very much for listening. I'm Jeff uh and Mr. German. [Music]
Original Description
Jeff Huber, founder of Chroma, shares why building with large language models isn’t just about prompts or RAG—it’s about context. He explains how deciding what goes into the context window shapes reliability, why performance drops with long inputs, and how careful filtering and compaction can make AI systems faster and more useful.
Chapters:
00:00 - Introduction to Context Engineering
00:26 - Understanding AI Systems as Programs
01:29 - The Concept of Context Engineering
02:02 - Building Reliable Software with AI
02:31 - Challenges with Long Contexts
03:07 - Chroma's Technical Report Insights
03:57 - Needle in a Haystack Problem
05:08 - The Importance of Context in AI Tasks
06:05 - Gather and Glean Model
06:44 - Data Gathering Techniques
07:31 - Gleaning and Optimizing Data
08:26 - Content Engineering for Agents
09:35 - Challenges with Agent Performance
10:13 - The Role of Compaction
10:57 - Conclusion and Final Thoughts
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from YC Root Access · YC Root Access · 0 of 60
← Previous
Next →
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Lecture 1 - How to Start a Startup (Sam Altman, Dustin Moskovitz)
YC Root Access
Lecture 2 - Team and Execution (Sam Altman)
YC Root Access
Lecture 3 - Before the Startup (Paul Graham)
YC Root Access
Lecture 4 - Building Product, Talking to Users, and Growing (Adora Cheung)
YC Root Access
Lecture 5 - Competition is for Losers (Peter Thiel)
YC Root Access
Lecture 6 - Growth (Alex Schultz)
YC Root Access
Lecture 7 - How to Build Products Users Love (Kevin Hale)
YC Root Access
Lecture 8 - How to Get Started, Doing Things that Don't Scale, Press
YC Root Access
Lecture 9 - How to Raise Money (Marc Andreessen, Ron Conway, Parker Conrad)
YC Root Access
Lecture 10 - Culture (Brian Chesky, Alfred Lin)
YC Root Access
Lecture 11 - Hiring and Culture, Part 2 (Patrick and John Collison, Ben Silbermann)
YC Root Access
Lecture 12 - Building for the Enterprise (Aaron Levie)
YC Root Access
Lecture 13 - How to be a Great Founder (Reid Hoffman)
YC Root Access
Lecture 14 - How to Operate (Keith Rabois)
YC Root Access
Lecture 15 - How to Manage (Ben Horowitz)
YC Root Access
Lecture 16 - How to Run a User Interview (Emmett Shear)
YC Root Access
Lecture 17 - How to Design Hardware Products (Hosain Rahman)
YC Root Access
Lecture 18 - Legal and Accounting Basics for Startups (Kirsty Nathoo, Carolynn Levy)
YC Root Access
Lecture 19 - Sales and Marketing; How to Talk to Investors (Tyler Bosmeny; YC Partners)
YC Root Access
Lecture 20 - Later-stage Advice (Sam Altman)
YC Root Access
YC's Summer 2022 Startup Job Expo - Pitches from 30 YC founders & find your next startup
YC Root Access
AMA with YC: Job Searching During an Economic Downturn (Event Summary)
YC Root Access
YC Startup Job Hunt Bootcamp, September 14, 2022
YC Root Access
YC Startup Talks: Understanding Equity with Jordan Gonen, CEO & Co-founder of Compound
YC Root Access
YC Tech Talks: Climate Tech with Charge Robotics (S21), Wright Electric (W17) and Impossible Mining
YC Root Access
YC Women in Tech: Breaking Into Product
YC Root Access
YC Ultimate Job Guide: Startup Stages
YC Root Access
Becoming a founding engineer at a YC startup
YC Root Access
3 tips for finding a job on YC's Work at a Startup
YC Root Access
YC Tech Talks: Defi and Scalability with Nemil at Coinbase (S12)
YC Root Access
YC Tech Talks: Designing Game Characters with Deep Learning, from Cory Li at Spellbrush (W18)
YC Root Access
YC Tech Talks: Designing from Day One: Artists as Founders with Multiverse (S20)
YC Root Access
YC Tech Talks: MMOs in the Instagram Era: Highrise (S18)
YC Root Access
Becoming a founding engineer at a YC startup - Finley short
YC Root Access
Why become a product engineer? -- with Volley (YC W18) & Luminai (YC S20)
YC Root Access
Y Combinator Go-To-Market Jobs Expo, 2022
YC Root Access
Fireside Chat with Tanay Tandon of Athelas
YC Root Access
Fireside Chat with Ivana Djuretic of Asher Bio
YC Root Access
The Past and Future of YC Bio
YC Root Access
What VCs Look for When Investing in Bio and Healthcare
YC Root Access
Finding your next role: Tips from YC's Talent team
YC Root Access
YC Startup Talks: Startup Equity with Compound (YC S19)
YC Root Access
YC Tech Talks: Machine Learning
YC Root Access
FTC Chair Lina Khan at Y Combinator
YC Root Access
AI, Startups, & Competition: Shaping California’s Tech Future
YC Root Access
Y Combinator Little Tech Competition Summit - Washington, DC
YC Root Access
The Exit Interview with Jonathan Kanter
YC Root Access
Founder Demo: Daniel Vega, Co-Founder & CTO of Inversion Semiconductor
YC Root Access
Wither Realignment?
YC Root Access
Founder Demo: Cyril Gorrla, Co-founder & CEO of CTGT
YC Root Access
Founder Demo: Newsha Ghaeli, Co-founder & President of Biobot Analytics
YC Root Access
Fireside with FTC Chairman Andrew Ferguson
YC Root Access
Fireside with Boom Founder & CEO Blake Scholl
YC Root Access
Founder Demo: AJ Forsythe & Jordan Barnes of Coop
YC Root Access
Are Techno Optimism and Populism Incompatible?
YC Root Access
Founder Demo: Trevor Mckendrick, Co-founder & CEO of Seis
YC Root Access
Founder Demo: Matt Bolous, Head of Policy & Safety of Imbue
YC Root Access
Fireside with Teresa Ribiera, EVP, European Commission for Clean, Just & Competitive Transition
YC Root Access
Fireside with Epic Games Founder & CEO Tim Sweeney
YC Root Access
Fireside with Former FTC Chair Lina Khan
YC Root Access
Related AI Lessons
Chapters (15)
Introduction to Context Engineering
0:26
Understanding AI Systems as Programs
1:29
The Concept of Context Engineering
2:02
Building Reliable Software with AI
2:31
Challenges with Long Contexts
3:07
Chroma's Technical Report Insights
3:57
Needle in a Haystack Problem
5:08
The Importance of Context in AI Tasks
6:05
Gather and Glean Model
6:44
Data Gathering Techniques
7:31
Gleaning and Optimizing Data
8:26
Content Engineering for Agents
9:35
Challenges with Agent Performance
10:13
The Role of Compaction
10:57
Conclusion and Final Thoughts
🎓
Tutor Explanation
DeepCamp AI