Meta's Code World Model
Key Takeaways
Meta's Code World Model (CWM) is a 32 billion parameter open-weights model for research on code generation with world models. It is trained on observation-action trajectories and fine-tuned with reinforcement learning to improve instruction following and multi-step problem solving. The weights are available for researchers to try out, but the model is not yet licensed for commercial use and is not fully optimized.
Full Transcript
Okay, so we have a new release from Meta, and this is not Llama 5. It's also not anything from their superintelligence lab or anything like that. This is from the researchers at FAIR, and they've released a 32 billion parameter model, which is not for commercial use, along with a paper that's actually quite interesting. So I want to go through it and have a look at what this actually is.

What they've released is called the Code World Model, and the whole idea here is code generation with world models. If you're listening to this channel, you've probably used AI to write code at some point. If you haven't, you really should try it out. It can do wonders, as long as you make sure you check that code. With some models you've probably found that you can get really amazing results. But at other times, when you look deeper, there are subtle bugs in there, or things that just don't make sense. Often this is because the model is trained to replicate the syntax of code; it doesn't really understand what the code does. Given enough data and enough examples, this can still get you really good results. The models can mimic what they've seen very well, but they don't really understand what they're doing.

This is where this paper comes in. They're trying to give the model a sort of understanding (and I'm using inverted commas around "understanding" here) of the cause-and-effect relationship between each line of code and what that line actually does when the code gets run. So this is CWM, an open-weights LLM for research on code generation with world models. The researchers are trying to get a model that can reason about code, not just write code. That brings us to the concept of what a world model is.
So I've been meaning to make a whole video about world models. I feel people misunderstand the concepts behind them and tend to think of them as just game models, like the Genie models. Really, the whole idea of a world model is that you want something that can understand and learn how the world works just from examples. Even models like Veo can exhibit really interesting world model characteristics, where they learn the properties of water, of gravity, and so on. Models like Genie 3, which came out recently from DeepMind, look amazing because it looks like you're in a game and you can move around. And not only can you move around, but when you come back to where you were, the state is as you left it. They have a really nice example where someone paints a wall, turns away, and comes back, and the paint is still in the same places it was before.

This is the whole allure of world models: we want something that learns representations that go beyond replicating how something looks at a surface level, toward a sort of understanding of what's actually going on underneath. A nicer example of this would be cooking. You can get a model that memorizes lots of different recipes, and it could even interpolate between recipes to make up something new, but it probably has no concept of what it's actually doing. It's just remembering that certain combinations go together, and by playing around with those combinations, it can reproduce something. The goal of a world model is to have an internal representation of how the system actually works. In the case of the Code World Model, the goal is to build a world model for code, where the model learns the rules of the computational universe it's operating in instead of just memorizing the syntax of things it has seen before. All right.
So how do they actually do this? Instead of training purely on lots and lots of static code, they train on observation-action trajectories. The question is: if your model takes some sort of action, can it predict what that action will cause to happen? The way they've done this with CWM is to have the model watch Python programs running line by line and observe how variables and memory change at each step. So it's learning how the code actually manipulates those variables, not just the syntax. These are what we think of as the trajectories, or traces, of what's going on.

They also worked on agentic interactions. They created a virtual agent that tried to solve real-world software engineering problems, like bug fixing, and the Code World Model then learns from that agent's successes and failures through a reinforcement learning technique.

Now, does it work? It seems so. The interesting thing is that this is not about making the biggest model through more pre-training. It's about a new way of training that tries to give the model the properties of a world model. When we look at the benchmarks, this model does really well on things like SWE-bench, especially compared to models of its own size. It also does really well on math and reasoning. It may not be state-of-the-art on each of these benchmarks, but it's impressive for a model of this size, and my guess is that they haven't maxed out the pre-training here. What it's creating is a model with powerful reasoning abilities that go beyond just generating long chains of thought.

All right. So if we look at what they actually did: for pre-training, they've used only 8 trillion tokens of text and code.
So, that's a lot less than what we've seen from the Qwen models recently, where the big models are trained on about 36 trillion tokens. I think even Qwen3-Next was trained on around half that. So relatively, this is pretty small. The mid-training stage is really interesting. This is where they get the model to learn these world model properties, by training on another 5 trillion tokens of special execution traces and agent data. Finally, they fine-tune it with reinforcement learning to make it even better at following instructions and solving complex multi-step problems. So the key thing here is that they're adding a new training stage that makes the model much better at these kinds of reasoning traces.

Because the Code World Model understands code execution, it can simulate what code would do. This opens up an opportunity for what they're calling a neural debugger, where the model can step through the code itself and point out things that are going to go wrong, or keep track of what the variables are doing. Another big application is making smarter, more reliable agents. Current agents often just try things until something works. The idea with CWM is that it should be able to plan and reason about its actions better than what we've seen before, and this is something they experiment with in the paper. The hope is that agents can then reliably fix bugs and add features on the fly, rather than brute-forcing through trial and error until they get it right.

Now, one of the cool things is that they've made the model weights available for researchers and for people to try out. It's not for commercial use yet, and it really is not fully optimized. You could imagine a much better version of this with more pre-training, more RL on some of these agent trajectories, and so on.
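The execution-trace and neural-debugger ideas can be made concrete with a small Python sketch. It uses the standard library's `sys.settrace` hook to snapshot a function's local variables just before each line runs (roughly the line-by-line observation described above), then compares that ground-truth trace with a predicted one and reports the first step where they disagree. This is an illustration under stated assumptions, not Meta's data pipeline or CWM's trace format: the helper names (`trace_locals`, `first_divergence`, `scale_and_shift`) are hypothetical, and the "predicted" trace is made up rather than produced by the model.

```python
import copy
import sys
from typing import Any, Callable, Optional

State = dict[str, Any]


def trace_locals(func: Callable[..., Any], *args: Any) -> tuple[Any, list[State]]:
    """Run func(*args) and snapshot its local variables just before each body line executes."""
    states: list[State] = []
    target = func.__code__

    def tracer(frame, event, arg):
        # Only trace frames running func itself, not anything it calls.
        if frame.f_code is not target:
            return None
        # Skip the `def` line itself in case the interpreter reports it.
        if event == "line" and frame.f_lineno != target.co_firstlineno:
            # Copy the locals so later assignments don't alter earlier snapshots.
            states.append(copy.deepcopy(dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()  # restore any existing tracer (e.g. a debugger) afterwards
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, states


def first_divergence(predicted: list[State], actual: list[State]) -> Optional[tuple[int, dict]]:
    """Return (step, {name: (predicted, actual)}) at the first mismatching step, or None if all match."""
    for step, (pred, real) in enumerate(zip(predicted, actual)):
        diffs = {
            name: (pred.get(name), real.get(name))
            for name in sorted(pred.keys() | real.keys())
            if name not in pred or name not in real or pred[name] != real[name]
        }
        if diffs:
            return step, diffs
    if len(predicted) != len(actual):
        # One trace is longer: the first unmatched step is where they diverge.
        return min(len(predicted), len(actual)), {}
    return None


def scale_and_shift(x):
    y = x * 3
    x = y - 1
    return x + y


result, actual = trace_locals(scale_and_shift, 2)
# Hypothetical "model prediction" (made up, not CWM output) that gets the second assignment wrong.
predicted = [{"x": 2}, {"x": 2, "y": 6}, {"x": 7, "y": 6}]

print(result)                               # 11
print(actual)
print(first_divergence(predicted, actual))  # (2, {'x': (7, 5)})
```

A real trace-prediction setup would also have to deal with loops, nested calls, and objects that can't be deep-copied; this sketch deliberately sticks to a straight-line function with integer locals.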
It's good to see Meta going back to releasing research models like this. Don't forget, the original Llama model was exactly like this: a research model that wasn't for commercial use, released just to see how well this could actually work.

So let's recap the key things. This is really trying to move away from just learning syntax and predicting the next token, toward understanding the consequences of the actions the model is predicting. You can think of it as a shift from syntax to learning much more about the semantics of what is actually going on as the code runs, almost like a simulation.

Okay, this is the part where I would normally show you the model running. Unfortunately, Meta doesn't seem to be giving me access. I applied probably over 36 hours ago, and I still don't have it. This is a gated model, so you have to fill in your details to get access to the weights. I was tempted to take it personally until I saw that it wasn't only me; by the looks of it, quite a number of people have also applied and not gotten access. So for the time being, we can take a look at some of the examples in the appendix of the paper.

The model itself is a 32 billion parameter model. It will be interesting to see how it's affected by quantization. To run it without quantization, you're probably going to need something like an H100. But we can see some of the outputs by looking at the examples in the paper.
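As a rough back-of-the-envelope check on the hardware point, the memory needed for the weights alone is the parameter count times the bytes per parameter. A minimal sketch, assuming exactly 32 billion parameters and ignoring the KV cache, activations, and framework overhead (all of which need memory on top of this):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 32e9  # assumption: treat the model as exactly 32 billion parameters

for label, bits in [("16-bit (unquantized)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

At 16 bits the weights come to roughly 64 GB, which fits on a single 80 GB H100 with some room left for the KV cache, consistent with the H100 suggestion. A 4-bit quantization brings the weights down to roughly 16 GB, although real quantized checkpoints carry some extra overhead.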
Looking at the examples in the paper, one of the first ones they show uses a bash environment, and we can see that the model gets something wrong, realizes it has gotten it wrong, and then backtracks to update what it's done. So we've got the thinking, the tool calls, and the output, with a series of these coming out. This is not just your regular reasoning chain of thought. The idea of backtracking in LLMs is not new; people have been training models for probably about 18 months now to realize they've output something wrong and go back and fix it. And you can imagine this is where the agentic traces become really useful.

This next one is one of the SWE examples in Python, again showing the agent's thinking and tool calls as it decides what it's going to do at each step. What we don't know without properly testing the model is how much it has been overfitted to this kind of SWE-bench task versus more general programming and code generation tasks. That's something we'll need to check out in the future.

But overall, just to finish up, this is definitely a really interesting paper. If this idea works, I think we're going to see it incorporated not only into coding and mathematics models but into all kinds of specialist agent models. I talked about this in one of my recent Qwen videos, where they had a travel agent with an agentic model suited to that task. You could imagine the same kind of thing being done with these world model trajectories, so that a model learns to get much better at that kind of task. And I think Qwen may already be experimenting with some of these ideas as well.
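The thinking, tool call, and output pattern in that bash example can be sketched as a simple agent loop that records an observation-action trajectory. In this sketch the "model" is a hypothetical scripted policy (`scripted_policy`), not CWM, and commands run through the system shell via Python's `subprocess.run`, so it assumes a POSIX shell with `cat` and `ls` available. The first command fails and the policy backtracks to a different one, loosely mirroring the backtracking in the paper's example. `outcome_reward` is a toy stand-in for an RL reward, not the reward CWM was actually trained with.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    thought: str      # the model's reasoning before acting
    action: str       # the shell command it chose (the "tool call")
    exit_code: int    # how the environment reported the command finished
    observation: str  # stdout + stderr returned to the model (the "output")


@dataclass
class Trajectory:
    steps: list[Step] = field(default_factory=list)


Policy = Callable[[Trajectory], Optional[tuple[str, str]]]


def run_shell(command: str, timeout: float = 10.0) -> tuple[int, str]:
    """Run a command in the system shell and return (exit code, combined output)."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout + proc.stderr


def scripted_policy(traj: Trajectory) -> Optional[tuple[str, str]]:
    """Hypothetical stand-in for the model: returns (thought, command), or None to stop."""
    if not traj.steps:
        return "Read the config file first.", "cat no_such_file_cwm_demo.txt"
    if len(traj.steps) == 1 and traj.steps[-1].exit_code != 0:
        # Backtrack: the last command failed, so try a different approach.
        return "That file doesn't exist; list the directory instead.", "ls"
    return None


def run_episode(policy: Policy, max_steps: int = 5) -> Trajectory:
    """Alternate between the policy choosing an action and the shell returning an observation."""
    traj = Trajectory()
    for _ in range(max_steps):
        decision = policy(traj)
        if decision is None:
            break
        thought, command = decision
        code, output = run_shell(command)
        traj.steps.append(Step(thought, command, code, output))
    return traj


def outcome_reward(traj: Trajectory) -> float:
    """Toy reward: 1.0 if the final command succeeded, else 0.0."""
    return 1.0 if traj.steps and traj.steps[-1].exit_code == 0 else 0.0


traj = run_episode(scripted_policy)
for step in traj.steps:
    print(f"[thought] {step.thought}\n[tool] {step.action}\n[exit {step.exit_code}]\n{step.observation}")
print("reward:", outcome_reward(traj))
```

Keeping every step (including the failed one) in the trajectory is the point: a training setup can learn from the failure and the recovery, not just from the final successful command.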
All right, I'll leave it there for this video. Let me know in the comments what you think of this idea. It will be interesting to see where it goes. Is this something that's going to take off, or are we just going to see these kinds of ideas folded into the RL that people are already doing for post-training? Anyway, as always, if you liked the video, please click like and subscribe, and I will talk to you in the next video. Bye for now.
Original Description
In this video I look at some new research out of Meta, a Code World Model, which is basically an LLM trained in a different way to try to get it to understand the tokens it's generating better than a conventional LLM does.
Blog: https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation-with-world-models/
HF: https://huggingface.co/facebook/cwm
For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻 GitHub:
https://github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
02:04 World Models
04:55 Agentic Interactions
06:03 CWM Diagram
07:01 Agentic Loop
09:04 CWM Hugging Face
09:38 Code World Model Paper