Meta's Code World Model

Sam Witteveen · Beginner ·📄 Research Papers Explained ·9mo ago

Key Takeaways

Meta's Code World Model is a 32 billion parameter model for research on code generation with world models, trained using observation-action trajectories and fine-tuned with reinforcement learning to improve instruction following and multi-step problem solving. The model weights have been made available for researchers to try out, but the model is not for commercial use yet and is not fully optimized.

Full Transcript

Okay, so we have a new release from Meta, and this is not Llama 5 and it's also not anything from their superintelligence lab or anything like that. This is from the researchers at FAIR. They've released a 32 billion parameter model, which is not for commercial use, and they've also released a paper, and it's actually quite interesting. So I want to just go through and have a look at what this actually is. What they've released is called the Code World Model, and the whole idea here is about code generation with world models. You've probably used AI to write code at some point if you're listening to this channel. If you haven't, you really should try it out. It can do wonders as long as you make sure you check that code. And with some models you've probably found that you can get really amazing results, but at other times, when you look at it deep down, there are subtle bugs in there or things that just don't make sense. Often this is because the model is trained to basically replicate the syntax of code. It doesn't really understand what the code does. Now, given enough data and enough examples, this can still get you really good results, right? The models can really mimic what they've seen, but they don't really understand what they're doing. And this is where this paper comes in. They're basically trying to get the model to have a sort of understanding, and when I say understanding I'm using inverted commas around that, of the cause-and-effect relationship between each line of code and what it actually does when the code gets run. So this is CWM, an open-weights LLM for research on code generation with world models. The whole idea is that the researchers are trying to get a model that can reason about the code, not just write the code. So this brings us to the whole concept of what a world model is.
So I've been meaning to make a whole video about world models. I feel people misunderstand the concepts behind them and tend to look at them as just game models, for things like the Genie models. Really, the whole idea of a world model is that you want something that can understand and learn how the world works just through examples. So even things like Veo can exhibit really interesting world model characteristics, where it can learn the properties of water, learn the properties of gravity, and so on. Now, models like Genie 3, which came out recently from DeepMind, look amazing because it looks like you're in a game and you can move around. And not only can you move around, but when you come back to where you were, the state is as you left it. They have a really nice example in there where they're painting a wall, and then they turn away and come back, and the paint is still in the same places it was before. This is the whole allure of world models: we want something that can learn representations that are not just at a surface level, replicating the way something looks, but a sort of understanding of what's actually going on underneath. A nicer example of this would be cooking, right? You can get a model that can memorize lots of different recipes, and it could even interpolate between recipes to make up something new, but it probably has no concept of what it is actually doing in those recipes. It's just remembering that certain combinations go together, and by playing around with these combinations it can reproduce something. The goal of the world model is to have an internal representation of how the system actually works. Now, in the case of coding world models, the goal is to build a world model for code, where it learns the rules of the computational universe that it's operating in instead of just memorizing the syntax of things it has seen before. All right.
So how do they actually do this? Instead of just training on lots and lots of static code, they train on this idea of observation-action trajectories, meaning that if your model takes some sort of action, can it predict what that action would cause to happen? The way they've done this with CWM is that the model watches Python programs running line by line and observes how variables and memory change at each step. So it's learning how those variables are actually manipulated, not just remembering the syntax of how those things work. These are what we think of as the trajectories, or the traces. They also worked on this idea of agentic interactions. They created a virtual agent that tried to solve real-world software engineering problems like bug fixing, and then this code world model learns from that agent's successes and failures through a sort of reinforcement learning technique. Now, does it work? It seems yes. The interesting thing here is that this is not about making the biggest model with the most pre-training or any of those sorts of things. It's about using this new way of training to try to create the properties of a world model. So when we look at the benchmarks, this model is actually doing really well on things like SWE-bench, especially when compared to models of its own size. It's also doing really well on math and reasoning. Again, it may not be state-of-the-art on each of these benchmarks, but these are strong results for a model of this size, and my guess is that they haven't maxed out the pre-training here. It's really creating a model which has powerful reasoning abilities in more than just the sense of generating long chain-of-thought output. All right. So, if we look at what they actually did: for their pre-training, they've only used 8 trillion tokens of generated text and code.
So, that's a lot less than what we've seen from the Qwen models recently, where their big models are doing about 36 trillion. I think even Qwen Next was doing around half that. So, relatively, this is pretty small. The mid-training bit here is really interesting. This is where they're getting the model to learn these world model properties by training on another 5 trillion tokens of these special execution traces and agent data. And finally they fine-tune it with some reinforcement learning to get it even better at following instructions and solving complex multi-step problems. So really the key thing here is that they're adding this new training step to make the model much better at these kinds of reasoning traces. The idea is that because the code world model understands code execution, it can simulate what code would do. And this opens up an opportunity for what they're calling a neural debugger, where it can basically step through the code itself and suggest things that are going to go wrong, or be aware of what the variables are doing. Another big application is that this should make for smarter, more reliable agents. Often, current agents are really only trying things until something works. The idea with CWM is that it should be able to plan and reason about its actions in a better way than what we've seen before, and this is something that they experiment with in the paper. The idea is that agents can then reliably fix bugs, add features, and do things on the fly rather than just brute-forcing trial and error until they get the right call. Now, one of the cool things is that they've made the model weights available for researchers and for people to try out. It's not for commercial use yet, and it really is not fully optimized. You could imagine a version of this that is going to be much better with more pre-training, with more RL guiding some of these agent trajectories, and so on.
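To make the idea of an execution trace concrete, here is a minimal sketch of recording how local variables change as a Python function runs line by line. It uses the standard library's `sys.settrace`; the helper `trace_locals`, the example function `running_sum`, and the shape of the recorded steps are assumptions made for illustration, not the trace format used to train CWM.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args), recording the local variables seen before each line runs."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow the function we were asked to trace, not its callees.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Line offset relative to the `def` line, plus a snapshot of the locals.
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        elif event == "return":
            steps.append(("return", arg))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return result, steps


def running_sum(values):
    total = 0
    for v in values:
        total += v
    return total


result, steps = trace_locals(running_sum, [2, 5])
for step in steps:
    print(step)
```

Each recorded step pairs a line with the program state at that point, which is the kind of (state, action, next state) data a model could be asked to predict instead of only predicting source text.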
But it's good to see Meta going back to actually releasing some of these things, because don't forget that the first LLaMA model was exactly like this: a research-only model that wasn't for commercial use. It was just to see how well this could actually work. So let's recap the key things here. This is really trying to move away from just learning syntax and predicting the next token, toward really understanding the consequences of the actions that it's predicting. You can think of this as going from syntax to learning much more about the semantics of what is actually going on in a simulation. Okay, so this would be the part where I would normally go in and actually show you the model running. Unfortunately, Meta doesn't seem to be giving me access. I applied for access probably well over 36 hours ago, but I still don't have it. This is a gated model, so you have to fill in your details to get access to the weights. I was tempted to take it personally until I saw that it wasn't only me: quite a number of people, by the looks of it, have also applied and not gotten access. So for the time being we can take a look at some of the examples that are in the appendix of the paper. The model itself is a 32 billion parameter model. It will be interesting to see how it's affected if you do quantized versions of it. To run it without quantization you're probably going to need something like an H100. But we can see some of the outputs by looking at the examples that are in the paper.
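As a rough check on the hardware comment above, the weights-only memory footprint of a 32 billion parameter model can be estimated from the bytes used per parameter. This is a back-of-the-envelope sketch: it ignores the KV cache, activations, and framework overhead, and the precision names are generic labels rather than formats confirmed for CWM.

```python
PARAMS = 32e9  # 32 billion parameters, as stated in the video

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params, bytes_per_param):
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


weights = {name: weight_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in weights.items():
    print(f"{name}: {gb:.0f} GB of weights")
# bf16: 64 GB of weights
# int8: 32 GB of weights
# int4: 16 GB of weights
```

At 16-bit precision the weights alone come to about 64 GB, which is why an 80 GB H100 is a plausible minimum for unquantized inference, while 8-bit or 4-bit quantization would bring the weights within reach of smaller GPUs.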
So this is one of the first ones that they show. It's interesting to see that this is using a bash environment, and we can see that it actually gets something wrong, then realizes that it's gotten that wrong, and then backtracks to update what it's got. It's also interesting to see that we've got the thinking, we've got tool calls, we've got output, and we've got a series of these coming out. So this is not just your regular chain-of-thought reasoning coming out here. The whole idea of backtracking in LLMs is not new. It's been around for probably about 18 months now that people have been training models to realize that they've output something wrong and to go back and fix it up. And you can imagine this is where the agentic traces become really useful for this kind of task. This next one is one of the SWE-bench examples in Python, again showing the different kinds of thinking of the agent, and the tool calls deciding what it's actually going to do at each step. Now, what we don't know without testing the actual model properly is how much this model has been overfitted to this kind of SWE-bench task versus more general programming and code generation tasks. That's something that we're going to need to check out in the future. But overall, just to finish up, this is definitely a really interesting paper. If this idea does seem to work, I think we're going to see it incorporated not only into coding and mathematics models, but into all kinds of specialist agent models, which I was talking about in one of the Qwen videos that I did recently where they had like a travel agent with an agentic and model suited to each other. You could imagine this exact same kind of thing being done with these kinds of world model trajectories so that a model can learn to get much better at that kind of task. And I think Qwen is even experimenting with some of these ideas already as well.
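The think, tool call, output, backtrack pattern described above can be sketched as a small loop. This is only an illustration of the control flow: `scripted_policy` is a hard-coded stand-in for the model, and the `edit` and `test` tools, the in-memory workspace, and the `Step` record are all hypothetical, not the environment or message format from the paper.

```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str   # the agent's reasoning for this step
    tool: str      # which (hypothetical) tool to call
    argument: str  # input passed to the tool


def run_tool(tool, argument, workspace):
    """Hypothetical tools that act on an in-memory workspace dict."""
    if tool == "edit":
        workspace["source"] = argument
        return "edited"
    if tool == "test":
        namespace = {}
        try:
            exec(workspace["source"], namespace)
            assert namespace["add"](2, 3) == 5
            return "PASS"
        except Exception as exc:
            return f"FAIL: {type(exc).__name__}"
    return f"unknown tool {tool!r}"


def scripted_policy(history):
    """Stand-in for the model: choose the next step from past observations."""
    if not history:
        return Step("Try a quick fix.", "edit",
                    "def add(a, b):\n    return a - b\n")
    last_step, last_observation = history[-1]
    if last_step.tool == "edit":
        return Step("Check the edit by running the test.", "test", "")
    if last_observation.startswith("FAIL"):
        # Backtrack: the earlier edit was wrong, so replace it.
        return Step("That was wrong; go back and fix the operator.", "edit",
                    "def add(a, b):\n    return a + b\n")
    return None  # the test passed, so stop


def agent_loop(policy, max_steps=10):
    workspace = {"source": ""}
    history = []
    for _ in range(max_steps):
        step = policy(history)
        if step is None:
            break
        observation = run_tool(step.tool, step.argument, workspace)
        history.append((step, observation))
    return history, workspace


history, workspace = agent_loop(scripted_policy)
for step, observation in history:
    print(f"[{step.tool}] {step.thought} -> {observation}")
```

In this toy run the first edit fails its test, the policy notices the failure in the observation, and the second edit passes. A model trained on many such trajectories sees both the wrong turn and the correction.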
All right, I'll leave it there for this video. Let me know in the comments what you think of this idea. It will be interesting to see where this goes. Is this something that's going to take off more or are we going to just see these kind of ideas be folded into the different kinds of RL that people are actually doing already for post training? Anyway, as always, if you like the video, please click like and subscribe and I will talk to you in the next video. Bye for now.

Original Description

In this video I look at some new research out of Meta, which is a code world model: basically an LLM trained in a different way to try and get it to understand the tokens that it's generating more than a conventional LLM does.
Blog: https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation-with-world-models/
HF: https://huggingface.co/facebook/cwm
For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻Github: https://github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
02:04 World Models
04:55 Agentic Interactions
06:03 CWM Diagram
07:01 Agentic Loop
09:04 CWM Hugging Face
09:38 Code World Model Paper


Meta's Code World Model is a novel approach to code generation using world models and reinforcement learning. The model has shown promising results in planning and reasoning about its actions, and has been made available for researchers to try out. However, it's not yet fully optimized and is not for commercial use.

Key Takeaways
  1. Train on observation-action trajectories
  2. Pre-train on 8 trillion tokens of generated text and code
  3. Train on 5 trillion tokens of execution traces and agent data
  4. Fine-tune with reinforcement learning
  5. Use backtracking to correct outputs
  6. Apply to specialist agent models
💡 The use of world models and reinforcement learning enables the Code World Model to plan and reason about its actions in a more effective way than previous models.


Chapters (7)

0:00 Intro
2:04 World Models
4:55 Agentic Interactions
6:03 CWM Diagram
7:01 Agentic Loop
9:04 CWM Hugging Face
9:38 Code World Model Paper