GPT5 unlocks LLM System 2 Thinking?
Key Takeaways
The video discusses how GPT-5 might unlock System 2 thinking in LLMs, so that large language models can tackle bigger problems by breaking complex tasks into smaller steps and exploring different options. It covers techniques available today for approximating System 2 level thinking: chain-of-thought prompting, self-consistency, tree of thoughts, and communicative agent collaboration, with fine-tuning described as a way to improve a model's System 1 intuition.
Full Transcript
For most of us, thinking deeply is not that straightforward or simple. For example, in a video from Veritasium, he asked a group of college students some seemingly easy and straightforward questions, which turned out to be not that easy.

"I asked these guys: how long does it take for the Earth to go around the Sun?" "What do you reckon? Is it 24 hours?" "Obviously, yes." "Or take this problem, which has been given to thousands of college students: you go into a toy store and there's a toy bat and a toy ball. Together they cost $1.10, and the bat costs a dollar more than the ball. How much does the ball cost?" "We're all wrong, aren't we?"

This looks like an easy question, but when you slow down and think a bit more, you'll realize there's no way the ball costs 10 cents, because then the bat would cost $1.10 and the total would be $1.20 instead of $1.10. You actually need to do some calculation to find the real answer, which is 5 cents. But they all got the answer wrong at the beginning, because they all saw this as a simple question and just gave the automatic, intuitive answer.

This concept that humans have two modes of thinking was popularized by the book Thinking, Fast and Slow by Daniel Kahneman. The idea is that your brain can function in two different modes. System 1 thinking is your fast, intuitive brain. For example, if I ask you what 1 plus 1 is, you'll just tell me it's 2. You don't really think about it, because it's already cached and memorized; it's part of your intuition. But if I give you a more complicated question, like what is 129 multiplied by 3.56, you don't have the answer ready. You actually need to take time, do some calculation, think it through, and then give me the answer. This is System 2, the other mode of the brain: it is slower, but much more rational, and it gives more accurate answers. Both System 1 and System 2 play a critical role in our lives and decision making, but problems often occur when we try to solve a complex System 2 problem with System 1 thinking, which is exactly what we observed in the video clips before.

That's also where we are with large language models. Even though they are already impressive, they don't really have any System 2 slow thinking. All a model does is try to predict the best next word based on the sequence of words it already has. It has no default ability to break down a complex task into small steps and explore different options. In a video from Andrej Karpathy, he had a really good analogy: the way large language models currently work is almost like running a train while also laying the track in front of you at the same time. So to a large language model, there is literally no difference between answering what 1 plus 1 is and solving a complex mathematical formula. This is also why fine-tuning works: if all a large language model has is System 1 intuitive thinking, then fine-tuning is basically training the AI to get a much better intuition on a given subject. It's the same for humans: if you practice something for more than 10,000 hours, you can solve a lot of System 2 level problems with just System 1 level intuition.

But how do humans actually do System 2 level thinking? First, when faced with a complex question, we break it down into a set of sub-problems, think through each of them, and explore different methods. Obviously this takes time, but in exchange we get higher quality and accuracy. Secondly, we also know when to make this trade-off, so each person's brain is almost like an adaptive system that can switch between System 1 and System 2 effectively. What this means for a large language model is that it should be able to take time, break a problem down into sub-questions, and explore different options, and that behavior also needs to happen adaptively: we wouldn't want a large language model to behave that way for every single simple request. These two seem to be the focus areas for GPT-5 development as well, as mentioned by Sam Altman during an interview with Bill Gates.
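For reference, the bat-and-ball problem from the opening can be worked out in a few lines. This is only a sketch of the arithmetic; the symbol b (the price of the ball in dollars) is an assumption introduced here for illustration and does not appear in the video.

```latex
\begin{aligned}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05
\end{aligned}
```

The intuitive answer b = 0.10 fails the same check, since 0.10 + (0.10 + 1.00) = 1.20.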
Here is a quick video clip.

"When you look at the next two years, what do you think some of the key milestones will be?"

"Maybe the most important areas of progress will be around reasoning ability. Right now GPT-4 can reason in only extremely limited ways. And also reliability: if you ask GPT-4 most questions 10,000 times, one of those 10,000 is probably pretty good, but it doesn't always know which one, and you'd like to get the best response of 10,000 each time. So that increase in reliability will be important."

"I'll be interested if you ever get to the point where, like solving a complex math equation, where you might have to apply transformations an arbitrary number of times, the control logic for the reasoning may have to be quite a bit more complex than just what we do today."

"At a minimum, it seems like we need some sort of adaptive compute. Right now we spend the same amount of compute on each token, a dumb one or figuring out some complicated math." "Yeah, when we say 'do the Riemann hypothesis', that deserves a lot of compute." "The same compute as saying 'the'." "Right, so at a minimum we've got to get that to work. We may need much more sophisticated things beyond it."

So obviously we should see some very big and exciting updates when GPT-5 comes out, especially for reasoning and System 2 level thinking. Apart from a better model itself, what can we do today to get a large language model to do System 2 level thinking, so that you can use it to complete complex tasks or solve big problems? There are two common ways: prompt engineering or communicative agents.

First, some prompt engineering strategies. One of the simplest and most common is chain of thought. Many of you are probably pretty familiar with this method. It basically means that before the large language model generates anything, you insert the sentence "Let's think step by step". This has been an effective way to force the large language model to break the problem down into small steps and think through those steps, and what's amazing is that the method is so simple and generic that it can be used in many different areas. You can also try few-shot prompting: instead of saying "Let's think step by step", you give it examples of what the steps should be and how it should think about the problem. This is effective because it forces the large language model to think through a few different steps before it gets to the answer, very similar to how the human brain functions. But the downside is also pretty clear: chain-of-thought prompting only gets the large language model to consider one possibility. As humans, when we do creative problem solving, it is very common to explore more than one path or solution. The journey of problem solving often involves exploring multiple options, playing them out, and keeping track of all the learnings and new knowledge acquired during the exploration, and this is something chain of thought is not capable of.

That's why people look for more advanced prompting tactics, like self-consistency with chain of thought, CoT-SC for short. The way it works is that it gets the large language model to run chain of thought multiple times and, in the end, review and vote on the answers that are most reasonable. It does require you to write some code to run the chain-of-thought process multiple times. The benefit is that it explores a few different options and paths before it lands on the final answer. But the downside is also pretty clear: it costs more tokens than necessary, and quite often the large language model explores very similar ways of solving the problem instead of truly diverse options.

That's why people proposed another option called tree of thoughts, which is probably one of the most advanced prompting tactics for achieving System 2 level thinking. The main innovation here is that it gets the large language model to come up with a few different ways the problem can be solved, explore all the branches and options that seem promising, and keep state about all the paths it has explored so far, so that if the path it is on doesn't lead to the desired outcome, it can trace back and find the second-best solution. The main problem with tree of thoughts is that the implementation is quite complicated: you need to make multiple calls to the large language model and also save the results somewhere to keep the state of the tree so that it can trace back. Still, this is really cool, because it significantly increases the number of options a large language model explores. It tries to simulate a concept very similar to the tree search AlphaGo does to explore different options, though at a much smaller scale. So what would be really interesting is if someone implemented an effective search ability for large language models, so that they can explore tons of different options without burning through a lot of unnecessary tokens. That's the limitation of the tree-of-thoughts mechanism at the moment: the exploration and search process is not efficient, and it requires a huge amount of implementation up front, so it's not a trivial task to implement tree of thoughts for a specific scenario.

This is where I think the second option, communicative agents, provides an elegant solution for enforcing that type of System 2 level thinking. Communicative agents are basically a multi-agent setup where users can easily define two different agents and simulate a conversation between them, so that they can reflect on and spot the flaws in each other's perspective and thinking process. It was initially introduced by a project called CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society, which showed some complex examples where a large language model completed a task like developing a trading bot for the stock market by simulating a conversation between two agents, a Python programmer and a stock trader. This really works for a couple of reasons. One is that a large language model is much better at judging whether an answer is right or wrong than at generating the answer, so a dedicated agent specifically for reviewing and critiquing can do a pretty good job of identifying flaws in thinking. Another reason I think communicative agents are a great short-term solution is how easy they are to set up, as there are many different ways you can get agents to work together. For example, if you're doing content generation, you might have a very sequential, linear flow: a manager does the planning, and a researcher does the research and hands over the results to a content creator. Or it can be a joint chat, where you put a problem solver and a critic in the same room so that they continue the conversation with each other until the critic thinks the problem has been solved properly. You can even mix and match them into some kind of hierarchical collaboration mode by dedicating different tasks to different teams. So it has a lot of potential and is pretty easy to set up.

Over the past 6 months, a huge number of multi-agent frameworks have shown up. Initially there were frameworks like ChatDev and MetaGPT; they're pretty good for sequential collaboration but a bit hard to set up. The best framework so far, in my opinion, is still AutoGen: it is super flexible and probably the only framework that allows you to set up those joint chats or hierarchical chats with just a few lines of code. There's also a new one called CrewAI; it makes sequential flows really easy to set up, but the framework is not that flexible yet for other types of collaboration. AutoGen also recently released a no-code interface called AutoGen Studio, which really lowers the effort to set up those communicative agent collaboration flows to solve complex problems. So I'm going to give you a quick example of how you can set up a group of agents to solve a complex problem that even GPT-4 fails at today. So let's get into it.

I'm going to implement a quick communicative agent setup with two agents: one is a problem solver, who is actually going to solve the problem, and the other is a reviewer, who will review the results and identify any flaws in the answer provided by the problem solver. We will set up these communicative agents in AutoGen Studio. To install AutoGen Studio, open a terminal and run pip install autogenstudio. This will install the whole AutoGen package as well as the front end. Once you've done that, the next thing is to set up your OpenAI API key. You can get an OpenAI API key by visiting platform.openai.com/api-keys. Click on create a new secret key, give it a name, AutoGen Studio, then come back and paste in your API key here. After you set this up, you can run autogenstudio ui --port 8081. Once it succeeds, you should see a message like this. All you need to do is copy this link into your browser, and you will see an interface like this.

It has a couple of sections. Skills is basically the list of functions that agents can have access to, like Google search. We can create custom functions, for example for the agent to call the OpenAI API to generate images. They are basically like functions for function calling, and you can create any type of new skill by clicking on this new skill button and putting the function code here. You can also create the actual agents by clicking on this add a new agent button. You can give it a name, a description, and a max round of auto reply. This basically means how many times the agent can autonomously reply before it stops. Because we're going to simulate a conversation between two agents, you want to set a maximum here to prevent the conversation from going on infinitely. You can also give it a system message, which is basically an instruction telling the agent who it is and what it should be doing, and add any skills that you have defined in the skill list. In the end, you can create a workflow. A workflow is basically a predefined way for agents to work together. One workflow can simply be a group chat between two or three agents that review each other's work and resolve things, but it can also be sequential.

So I asked GPT to generate a task that can be used to test System 2 level thinking. The task is: there are four animals, a lion, a zebra, a giraffe, and an elephant. They are located in four different houses with different colors: red, blue, green, and yellow. The task is to determine which animal is in which color house based on the following clues: the lion is in either the first or the last house; the green house is immediately to the right of the red house; the zebra is in the third house; the green house is next to the blue house; and the elephant is in the red house. This is actually a pretty complicated problem. For me, it is not even clear how I should get started.

Next, I tried to test whether GPT-4 can get it with the chain-of-thought mechanism. I can see that GPT-4 tried to think step by step and review the clues one by one to come up with an answer like this. To make it easier to review, I put all the clues on the left. For the answer GPT-4 generated: the first clue, the lion is in either the first or last house, is satisfied. The green house should be immediately to the right of the red house. OK, so the second clue doesn't seem to be satisfied: the green house is located at the end, which is not immediately to the right of the red house. I also wanted to see whether I can ask GPT-4 to reflect and resolve the issue itself, so I told it: "Reflect on the answer and see if you can do it better." What happened is that GPT-4 went through another thinking process, but the weird thing is that it said this revised approach reaffirmed the initial solution but presents the reasoning in a slightly more structured manner. It is actually surprising that it didn't even spot the issue that the green house is not immediately to the right of the red house. I think this showcases that it is pretty hard for a large language model to self-reflect and improve its answer in a complicated case like this, and this is where a multi-agent system is useful.

So let's go back to AutoGen Studio. The first thing we want to do is create two agents, one for the reviewer and one for the problem solver. For the reviewer, I give it a name as well as a system message: "You are the reviewer and critic. Your goal is to review the answer and work delivered by the problem solver, decide if the answer is correct and, if not, what the flaws are, and give feedback back to the problem solver. Remember, you only spot issues; do not give solutions." The rest I'm going to keep the same. The second is the problem solver, and I give it a system message: "You are a helpful assistant. You'll be given a task. You will try to solve the task and hand over to the reviewer to review. If the reviewer gives you any feedback, iterate on the answer. Never say terminate." I added this last line so that the problem solver cannot terminate the task; it has to be approved by the reviewer.

After we create these two agents, we can go to workflows and create a group chat. One problem with AutoGen Studio at the moment is that it doesn't seem like I can create a group chat from scratch. However, there should be one group chat workflow already, so click on the group chat manager, and here you can swap different agents in and out. In my case I just add the problem solver and the reviewer. I also give it a system message: "Let the problem solver solve the task and let the reviewer review the results and send feedback back to the problem solver. Repeat the process above until the reviewer confirms and says terminate." Then I click OK. We have set up this group chat successfully. We can go back to the playground, click the add new button, select the workflow we just created, called group problem solver, and click create. I paste in the exact same task I gave before.

OK, we got a response back, and we can see the detailed chat history. At the beginning, the user proxy agent gives the task to the problem solver, and the problem solver tries to think step by step and generates an initial answer. Let's compare it with the actual criteria. The lion is in the first or last house, which is correct. The green house should be immediately to the right of the red house, which is not the case here, so the initial answer is incorrect. But the good thing is that the reviewer actually reviewed the results and pointed out that there are a few flaws in the deduction process, and also that the final arrangement didn't take into account the clue that the green house is next to the blue house. Then the problem solver takes this feedback and tries again, and this time it comes up with a new answer. Let's compare again. The lion is in either the first or last house, which is correct. The green house should be immediately to the right of the red house, which is also correct. The zebra is in the third house, the green house is next to the blue house, and the elephant is in the red house. This is a correct answer.

So this is a quick example of how you can set up two communicative agents to collaborate and solve a complex problem. You can use a similar structure with this type of feedback loop for solving other problems as well. Those are examples of some tactics you can use today to drive System 2 thinking with large language models. Honestly, I'm really keen to see a world where large language models have an adaptive system to solve really complex problems in a native way. Comment below if you know of any other research or methods that can effectively drive System 2 thinking. I'll continue posting interesting projects and advancements in AI, so please consider subscribing if you enjoy this content. Thank you, and I'll see you next time.
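The solver and reviewer feedback loop described in the transcript can be sketched in plain Python, without AutoGen. This is a minimal sketch under stated assumptions, not the setup used in the video: houses are taken to be numbered left to right, the names review, solve_with_review, and brute_force_solver are hypothetical, and the solver is a brute-force stand-in for an LLM agent that ignores feedback. The reviewer only reports flaws and never proposes solutions, and max_rounds plays the role of the cap on automatic replies.

```python
from itertools import permutations
from typing import Callable, Optional

ANIMALS = ("lion", "zebra", "giraffe", "elephant")
COLORS = ("red", "blue", "green", "yellow")

# A candidate lists the houses from left to right as (color, animal) pairs.
Candidate = tuple[tuple[str, str], ...]


def review(candidate: Candidate) -> list[str]:
    """Reviewer: return the clues the candidate violates (empty list = approved)."""
    colors = [color for color, _ in candidate]
    animals = [animal for _, animal in candidate]
    red, green, blue = (colors.index(c) for c in ("red", "green", "blue"))
    flaws = []
    if animals.index("lion") not in (0, 3):
        flaws.append("the lion is not in the first or last house")
    if green != red + 1:
        flaws.append("the green house is not immediately right of the red house")
    if animals.index("zebra") != 2:
        flaws.append("the zebra is not in the third house")
    if abs(green - blue) != 1:
        flaws.append("the green house is not next to the blue house")
    if animals.index("elephant") != red:
        flaws.append("the elephant is not in the red house")
    return flaws


def brute_force_solver() -> Callable[[list[str]], Candidate]:
    """Hypothetical stand-in for an LLM solver: tries arrangements in a fixed order."""
    candidates = (
        tuple(zip(colors, animals))
        for colors in permutations(COLORS)
        for animals in permutations(ANIMALS)
    )

    def propose(feedback: list[str]) -> Candidate:
        # A real solver agent would use the feedback; this stand-in ignores it.
        return next(candidates)

    return propose


def solve_with_review(
    propose: Callable[[list[str]], Candidate], max_rounds: int
) -> tuple[Optional[Candidate], int]:
    """Alternate solver and reviewer turns until approval or the round cap."""
    feedback: list[str] = []
    for round_number in range(1, max_rounds + 1):
        candidate = propose(feedback)
        feedback = review(candidate)
        if not feedback:  # the reviewer approves, which ends the conversation
            return candidate, round_number
    return None, max_rounds


# 24 color orders x 24 animal orders = 576 possible arrangements.
solution, rounds = solve_with_review(brute_force_solver(), max_rounds=576)
print(solution, rounds)
```

Enumerating every arrangement with this reviewer shows that the clues as transcribed are satisfied by more than one arrangement, so "correct" here means that no clue is violated. Keeping the approval decision with the reviewer mirrors the "never say terminate" instruction given to the problem solver.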
Original Description
Human think fast & slow, but how about LLM? How would GPT5 resolve this?
101 guide on how to unlock your LLM system 2 thinking to tackle bigger problems
🔗 Links
- Join my community: https://www.skool.com/ai-builder-club/about
- Follow me on twitter: https://twitter.com/jasonzhou1993
- Join my AI email list: https://crafters.ai/
- My discord: https://discord.gg/eZXprSaCDE
⏱️ Timestamps
0:00 Intro
1:00 System 1 VS System 2
2:48 How does human do system 2 thinking
3:33 GPT5 system 2 thinking
4:47 Tactics to enforce System 2 thinking
5:08 Prompt strategy
8:27 Communicative agents
11:03 Example to setup communicative agents
👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
#gpt5 #autogen #gpt4 #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #chatgpt #agentgpt #agent #babyagi