Meta's Code World Model

Sam Witteveen · Beginner · 📄 Research Papers Explained · 9mo ago

Skills: LLM Foundations 90% · LLM Engineering 80% · Fine-tuning LLMs 70% · Prompt Craft 60%

Key Takeaways

Meta's Code World Model (CWM) is a 32 billion parameter model for research on code generation with world models. It is trained on observation-action trajectories and fine-tuned with reinforcement learning to improve instruction following and multi-step problem solving. The weights have been made available for researchers to try out, but the model is not for commercial use yet and is not fully optimized.

Full Transcript

Okay, so we have a new release from Meta, and this is not Llama 5, and it's also not anything from their superintelligence lab or anything like that. This is from the researchers at FAIR. They've released a 32 billion parameter model, which is not for commercial use, and they've also released a paper, and it's actually quite interesting. So I want to just go through and have a look at what this actually is.
So what they've released is called the Code World Model, and the whole idea here is about code generation with world models. You've probably used AI to write code at some point if you're listening to this channel. If you haven't, you really should try it out. It can do wonders as long as you make sure you check that code, etc. With some models you've probably found that you can get really amazing results, but at other times, when you look deep down, there are actually subtle bugs in there or things that just don't make sense. Often this is because the model is trained to basically replicate the syntax of code. It doesn't really understand what the code does. Now, given enough data and enough examples, this can still get you really good results, right? The models can really mimic what they're doing, but they don't really understand what they're doing.
And this is where this paper comes in. They're trying to get the model to have a sort of understanding (and when I say understanding, I'm using inverted commas around that) of the cause-and-effect relationship between each line of code and what it actually does when the code gets run. So this is CWM, an open-weights LLM for research on code generation with world models. The whole idea is that the researchers are trying to get a model that can reason about the code, not just write the code. So this brings us to the whole concept of what a world model is.
So I've been meaning to make a whole video about world models. I feel people misunderstand the concepts behind them, and people tend to look at them as just game models, for things like the Genie models, etc. Really, the whole idea of a world model is that you want something that can understand and learn how the world works just through examples. So even things like Veo can exhibit really interesting world model characteristics, where it can learn the properties of water, learn the properties of gravity, etc. Now, models like Genie 3 that came out recently from DeepMind look amazing because it looks like you're in a game and you can move around. And not only can you move around, but when you come back to where you were, the state is as you left it. So they have a really nice example in there where they're painting the wall, and then they turn away and come back, and the paint is still in the same places that it was before.
This is the whole allure of world models: we want something that can actually learn representations that are not just at a surface level of being able to replicate the way something looks, but a sort of understanding of what's actually going on underneath. A nicer example of this would be cooking, right? You can get a model that can memorize lots of different recipes, and you could even interpolate between recipes to make up something new, but it probably has no concept of what it is actually doing in those things. It's just remembering that certain combinations go together, and by playing around with these combinations, it can reproduce something. The goal of the world model is to have this sort of internal representation of how the system actually works. Now, in the case of the Code World Model, the goal is to build a world model for code, where it learns the rules of the computational universe that it's operating in instead of just memorizing the syntax of things that it's seen before. All right.
So how do they actually do this? Instead of just training on lots and lots of static code, they train on this idea of observation-action trajectories, meaning that if your model takes some sort of action, can it predict what that action would cause to happen? The way they've done this with CWM is that they make a model that can watch Python programs running line by line and observe how variables and memory change at each step. So it's learning to actually manipulate those variables, not just remember the syntax of how those things work. These are what we think of as the trajectories, or the traces, that are going on there.
They also worked on this idea of agentic interactions. So they created a virtual agent that tried to solve real-world software engineering problems like bug fixing, etc. And then this Code World Model learns from that agent's successes and failures through a sort of reinforcement learning technique.
Now, does it work? It seems yes. The interesting thing here is that this is not about making the biggest model with pre-training or all of those sorts of things. It's about getting this new way of training to try and create the properties of this world model. So when we look at the benchmarks, this model is actually doing really well on things like SWE-bench, especially when compared to models of its own size. It's also doing really well on math and reasoning. Again, it may not be state-of-the-art on each of these benchmarks, but this is a strong result for a model of this size, and my guess is that they haven't maxed out the pre-training here. It's really creating a model which has powerful reasoning abilities in more than just a way of generating a long chain of thought.
All right. So, if we look at what they actually did: for their pre-training, they've only used 8 trillion tokens of generated text and code in here.
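To make the idea of watching a Python program run line by line more concrete, here is a minimal sketch using the standard library's `sys.settrace`. It is only an illustration of what a line-level trace of local variables can look like; the paper's actual trace format is not described in the video, and the names `trace_locals` and `running_total` are hypothetical.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args) and record (line_offset, locals) just before each line runs."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow the frame of the function we were asked to trace.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Offset is relative to the 'def' line; locals are copied as a snapshot.
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always switch tracing off again
    return result, steps


def running_total(values):
    total = 0
    for v in values:
        total += v
    return total


result, steps = trace_locals(running_total, [2, 5])
for offset, local_vars in steps:
    print(f"line +{offset}: {local_vars}")
```

Each recorded step pairs a line of the function with the variable values seen just before that line executes, which is the kind of cause-and-effect signal a static code corpus does not contain.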
So those 8 trillion pre-training tokens are a lot less than what we've seen from the Qwen models recently, where their big models are doing about 36 trillion. I think even the Qwen Next was doing around half that. So, relatively, this is pretty small. The mid-training bit here is really interesting. This is where they're getting the model to learn these world model properties by training on another 5 trillion tokens of these special execution traces and agent data. And finally, they fine-tune it with some reinforcement learning to get it even better at following instructions and solving complex multi-step problems. So really the key thing here is that they're adding this new step of training to get the model to do much better at these kinds of reasoning traces.
So the idea here is that because the Code World Model understands code execution, it can simulate what code would do. And this opens up an opportunity for what they're calling a neural debugger, where it can basically go through the code itself and suggest things that are going to go wrong, or be aware of what those variables are doing. Another big application of this is that it should be able to make smarter, more reliable agents. Often, current agents are really only trying things until something works. The idea with CWM is that it should be able to plan and reason about its actions in a better way than what we've seen before, and this is something that they experiment with in here. The idea is that the agents can then reliably fix bugs, add features, and do things on the fly rather than just brute-forcing trial and error until they get the right call, etc.
Now, one of the cool things is that they've made the model weights actually available for researchers and for people to try this out. It's not for commercial use yet, and it really is not fully optimized. You could imagine a version of this that is going to be much better with more pre-training, with more RL guiding some of these agent trajectories, etc.
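To put the token counts quoted in the video in perspective, here is a small arithmetic sketch. The figures are taken as spoken (8 trillion for pre-training, 5 trillion for mid-training, and roughly 36 trillion for the large Qwen models), so treat them as approximate; the variable names are just for illustration.

```python
# Token counts as quoted in the video, in trillions; treat them as approximate.
PRETRAIN_T = 8      # pre-training on text and code
MIDTRAIN_T = 5      # mid-training on execution traces and agent data
QWEN_LARGE_T = 36   # figure the speaker cites for the large Qwen models

total_before_rl = PRETRAIN_T + MIDTRAIN_T
midtrain_share = MIDTRAIN_T / total_before_rl
pretrain_vs_qwen = PRETRAIN_T / QWEN_LARGE_T

print(f"Tokens before RL: {total_before_rl}T")                     # 13T
print(f"Mid-training share: {midtrain_share:.0%}")                 # 38%
print(f"CWM pre-training vs. Qwen figure: {pretrain_vs_qwen:.0%}")  # 22%
```

On these numbers, more than a third of the tokens seen before reinforcement learning come from the trace and agent data, while the general pre-training budget is under a quarter of the Qwen figure.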
But it's good to see Meta going back to actually releasing some of these things, because don't forget, originally the first Llama model was exactly like this: just a research-only model that wasn't for commercial use. It was just to see how well this could actually work.
So let's just recap the key things here. This is really trying to move away from just learning syntax and being able to predict the next token, to really understanding the consequences of the actions that it's predicting. You can really think of this as moving from syntax to learning much more about the semantics of what is actually going on in a simulation as we go through this.
Okay, so this would be the part where I would normally go in and actually show you the model running. Unfortunately, Meta doesn't seem to be giving access to me. I applied for access well over probably 36 hours ago, but I still don't have it. This is a gated model, so you basically have to fill in your details to actually get access to the weights. I was tempted to take it personally until I saw that it wasn't only me that hasn't gotten access: quite a number of people, by the looks of this, have also applied for access and not gotten it. So for the time being, we can take a look at some of the examples that are in the appendix of the actual paper.
So the model itself is a 32 billion parameter model. It will be interesting to see how this is affected if you're going to do quantized versions of it, etc. To run it without quantization, you're probably going to need like an H100 or something like that. But we can see some of the outputs by coming and looking at the examples that are in the paper here.
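The hardware remark above can be checked with a rough back-of-the-envelope calculation. This sketch counts weight memory only, assumes 2 bytes per parameter for unquantized bf16 weights, and assumes an 80 GB H100; the KV cache and activations need extra memory on top, so the real requirement is higher. The helper name `weight_memory_gb` is hypothetical.

```python
PARAMS = 32e9   # 32 billion parameters, as stated in the video
H100_GB = 80    # assumption: a single 80 GB H100 card

# Assumed bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(params, bytes_per_param):
    """Approximate memory needed to hold the weights, in gigabytes."""
    return params * bytes_per_param / 1e9


for fmt, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, nbytes)
    verdict = "fits" if gb < H100_GB else "does not fit"
    print(f"{fmt:>5}: ~{gb:.0f} GB of weights, {verdict} on one {H100_GB} GB GPU")
```

Under these assumptions the unquantized weights come to about 64 GB, which is consistent with needing an H100-class card, while a 4-bit version would need roughly 16 GB for the weights.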
So this is one of the first ones that they show, and it's interesting to see here that this is using a bash environment. We can see that it actually gets something wrong, then realizes that it's gotten that wrong, and then backtracks to update what it's got here. So it is interesting to see that we've got the thinking, we've got tool calls, we've got output, and we've got a series of these coming out. So this is not just your regular reasoning chain of thought coming out here. The whole idea of backtracking in LLMs is not new. It's been around for probably about 18 months now that people have been training models to realize that they output something wrong and to go back and fix that up. And you can imagine this is where the agentic traces become really useful for this kind of task.
So this is one of the SWE-bench examples in Python, again showing the agent doing different kinds of thinking, the tool calls, and deciding what it's actually going to do at each step. Now, what we don't know without testing the actual model out properly is how much this model has been overfitted for this kind of SWE-bench task versus more general programming and code generation tasks. That's something that we're going to need to check out in the future.
But overall, just to finish up, this is definitely a really interesting paper. If this idea does seem to work, I think we're going to see it incorporated not only for coding models and mathematics, but for all kinds of specialist agent models, which I was talking about in one of the Qwen videos that I did recently, where they had like a travel agent, with an agent and a model suited to each other. You could imagine this exact same kind of thing being done with these kinds of world model trajectories so that it can learn to get much better at that kind of task. And I think Qwen is even experimenting with some of these ideas already as well.
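The think, tool call, output pattern with backtracking described above can be sketched as a small loop. This is a toy illustration only: `scripted_policy` is a hard-coded stand-in for the model, `read_file` is a made-up tool over an in-memory dictionary, and none of it reflects the actual format or tools used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str
    tool: str
    argument: str
    observation: str = ""


def read_file(files, path):
    """Toy tool: return the file contents, or an error string if the path is missing."""
    return files.get(path, f"error: no such file: {path}")


def scripted_policy(history):
    """Hypothetical stand-in for the model: choose the next step from the history."""
    if not history:
        return Step("The config is probably in settings.py.", "read_file", "settings.py")
    if history[-1].observation.startswith("error"):
        # Backtrack: the previous guess failed, so revise it and try another path.
        return Step("That path was wrong; try config.py instead.", "read_file", "config.py")
    return None  # the last call succeeded, so stop


def run_agent(files, max_steps=5):
    """Alternate between thinking, calling the tool, and recording the observation."""
    history = []
    for _ in range(max_steps):
        step = scripted_policy(history)
        if step is None:
            break
        step.observation = read_file(files, step.argument)
        history.append(step)
    return history


history = run_agent({"config.py": "DEBUG = True"})
for step in history:
    print(f"think: {step.thought}")
    print(f"call:  {step.tool}({step.argument!r})")
    print(f"obs:   {step.observation}")
```

The first step fails, the error is fed back as an observation, and the second step corrects course, which mirrors the wrong-then-backtrack behaviour visible in the paper's example.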
All right, I'll leave it there for this video. Let me know in the comments what you think of this idea. It will be interesting to see where this goes. Is this something that's going to take off more or are we going to just see these kind of ideas be folded into the different kinds of RL that people are actually doing already for post training? Anyway, as always, if you like the video, please click like and subscribe and I will talk to you in the next video. Bye for now.

Original Description

In this video I look at some new research out of Meta, the Code World Model, which is basically an LLM trained in a different way to try to get it to understand the tokens that it's generating more than a conventional LLM does.

Blog: https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation-with-world-models/
HF: https://huggingface.co/facebook/cwm

For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen

🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes

👨‍💻Github: https://github.com/samwit/llm-tutorials

⏱️Time Stamps:
00:00 Intro
02:04 World Models
04:55 Agentic Interactions
06:03 CWM Diagram
07:01 Agentic Loop
09:04 CWM Hugging Face
09:38 Code World Model Paper

Meta's Code World Model is a novel approach to code generation using world models and reinforcement learning. The model has shown promising results in planning and reasoning about its actions, and has been made available for researchers to try out. However, it's not yet fully optimized and is not for commercial use.

Key Takeaways

Train on observation-action trajectories
Pre-train on 8 trillion tokens of generated text and code
Train on 5 trillion tokens of execution traces and agent data
Fine-tune with reinforcement learning
Use backtracking to correct outputs
Apply to specialist agent models

💡 Combining world-model training with reinforcement learning is intended to let the Code World Model plan and reason about its actions, rather than rely on trial and error.
