Tree of Thoughts - GPT-4 Reasoning is Improved 900%

Wes Roth · Advanced ·🧠 Large Language Models ·3y ago

Key Takeaways

The video discusses the 'Tree of Thoughts' (ToT) approach, a prompting framework that substantially improves GPT-4's problem solving. On the Game of 24, its success rate rises from 4% with chain-of-thought prompting to 74%. That is roughly 10x the 7.3% from standard input-output prompting, which matches the roughly 900% gain in the title. The video also covers applications to mathematical reasoning, creative writing, and mini crosswords. It compares the approach with chain-of-thought and input-output prompting and discusses its advantages and limitations.

Full Transcript

So a new scientific paper has been released called "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." It's by researchers at Princeton University and Google DeepMind. Now, you may have heard the saying that we only use 10% of our brain. Whether or not that's true for us, this paper seems to show that it is true for our current AI models. This new approach takes GPT-4's success rate on one complex problem-solving task from 4% with chain-of-thought prompting to 74%. That's just from a new way of prompting it called Tree of Thoughts, or ToT, as it's referred to in the paper.
The paper even comes with a warning: ToT is a framework that empowers LMs to more autonomously and intelligently make decisions and solve problems. While current tasks are limited to reasoning and search problems, future applications involving interaction with external environments or humans could bring potential danger, e.g. facilitating harmful uses of LMs. In a nutshell, this approach seems to be a hack to overclock these LLMs to produce more intelligence.
But let's cut to the chase: what is this Tree of Thoughts? These are the different prompting approaches. When you ask ChatGPT a question and it answers, that's called input-output prompting, or IO. A lot of researchers and individual users have pointed out that you can get better results from LLMs by using chain-of-thought (CoT) prompting.
An example of this might be thinking of three topics that you want to speak about, then thinking through where each of those topics might lead, then selecting the best one, and then producing the actual words that you're going to say. So it's the same input, but you're thinking about the possible topics and the kind of conversations they might produce. Then you select the best one, and only then produce the actual output. This is the thinking-before-you-say-it part.
Yet another way to produce even better results is something called self-consistency with CoT, with chain of thought. Self-consistency means generating multiple results for each query and then seeing which results appear the most, AKA which ones are the most consistent. Combining self-consistency with chain of thought can lead to even better results than all the previous prompting methods. But what is even better than that? Well, Tree of Thoughts has the potential to be many times more powerful than the previous prompting methods, and it works really well for LLMs on more complex reasoning tasks.
Here's the abstract from the paper. Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration or strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, they introduce Tree of Thoughts, which generalizes over the popular chain-of-thought approach. It lets LMs self-evaluate choices to decide the next course of action, as well as look ahead or backtrack when necessary to make global choices. And this significantly improves language models' problem-solving abilities on the three tasks these researchers tested them on.
As you can see here, in one of the tests, GPT-4 with chain-of-thought prompting only solved 4% of the tasks. Chain of thought is the next improved method, one step better than base level. Their method, by comparison, achieved a success rate of 74%. That's more than 18x chain of thought's 4%, and roughly 10x plain input-output prompting.
Think of Tree of Thoughts as two things. The first is the breadth: how many different candidate thoughts it keeps alive at each step. The second is the depth: how far down each path it thinks. The key is that it can look forward, backtrack, and save information in various thoughts that can be used elsewhere. So how did they come up with Tree of Thoughts? They're referencing the work of Newell and Simon from 1972.
They were pioneers in the field of AI. They studied how humans solve problems, proposing that our brains work like computers: we take in information, process it, and output solutions. Research on how humans solve problems indicates that people search through a large space of possible solutions. You can picture these solutions as a tree. Each point, or node, is a partial solution, and the connections, or branches, are actions that modify those partial solutions. People decide which action to take based on guidelines or heuristics that help them navigate all these possible solutions and guide them toward an answer.
Current prompting methods have two shortcomings. First, locally, they do not explore different continuations within a thought process, AKA the branches of the tree. Second, globally, they do not incorporate any kind of planning, lookahead, or backtracking to help evaluate these different options. To address these shortcomings, the authors introduce Tree of Thoughts, a paradigm that allows LLMs to explore multiple reasoning paths over thoughts. This paper goes pretty deep into the math and specifics of how they did this. I won't go into all that here, but I'll link the study in the show notes so you can peruse it at your leisure. Let's skip to the results. The first thing they did was have it play a game called the Game of 24.
It's somewhat similar to Sudoku. The Game of 24 is a mathematical reasoning challenge: you use four numbers and basic arithmetic operations to obtain 24. To set it up, they took over 1,300 games and sorted them from easy to hard by human solving time. They start with standard input-output (IO) prompting, using five in-context examples. They also ran chain-of-thought prompting and an iterative-refine baseline. At each refine iteration, the language model is conditioned on all the previous history and asked to reflect on its mistakes and generate a refined answer if the output isn't correct. Note that this baseline uses ground-truth feedback about equation correctness. Ground truth is basically the real answers, the ones we know to be correct, for the AI to compare its results against.
In the Tree of Thoughts approach, the model is asked to propose possible next steps. Then they prompt the language model to evaluate each thought candidate as "sure," "maybe," or "impossible" with regard to reaching 24, the answer we need. Then they perform a breadth-first search (BFS) over the tree of thoughts, keeping the best five candidates at each step. So in the breadth-first approach, they widen the set of candidates before going deeper. In the results, the lowercase b refers to breadth: how many candidates they keep at each step before digging deeper.
And here are the results. Basic input-output prompting gets 7.3%, and chain of thought gets 4%. Chain of thought with self-consistency over k = 100 samples, which takes the most common answer across 100 runs, comes out at a 9% success rate. Then we have Tree of Thoughts with a breadth of 1, and the result is 45%. That's a massive leap. Then you run it again with b = 5, meaning it keeps five candidates at each step, and the result goes up to 74%. That's far better than any of the other methods, even the baselines that retry many times: iterative refinement, and taking the best of 100 IO or chain-of-thought samples. So Tree of Thoughts absolutely crushes everything else. Notice it's roughly 10x the result of just asking GPT-4 to solve the problem in one prompt, so this absolutely crushes the Game of 24.
Next, we're going to look at creative writing: is this approach better for creative writing? The paper invents a creative writing task where the input is four random sentences. The output should be a coherent passage of four paragraphs, each ending in one of the four input sentences respectively. Here's basically how that looks. The task reads: write a coherent passage of four short paragraphs; the end sentence of each paragraph must be... followed by the four randomly generated sentences:
One: "It isn't difficult to do a handstand if you just stand on your hands." That's true.
Two: "It caught him off guard that space smelled of seared steak." Okay.
Three: "When she didn't like a guy who was trying to pick her up, she started using sign language."
Four: "Each person who knows you has a different perception of who you are."
So this would be kind of difficult to put together into a coherent storyline. It generates several plans, and plan two manages to wrap it all up in a self-help framing. The handstand becomes a metaphor for embracing challenges. Next come astronauts embracing challenges, including the smell of space, and then a woman's clever tactic, again about challenges. Finally, it contemplates how different perceptions of oneself can shape one's identity. That's pretty smart, I gotta say. It connects the paragraphs with a theme of self-improvement and embracing challenges, making for a coherent passage. So it takes the inputs and sketches out several different plans. Then it votes on which plan is best, and only after that does it produce the final result.
As you can see here, Tree of Thoughts is definitely the best one. It's a big improvement over standard input-output, with chain of thought sort of at the halfway point. There's also a refine function, where you ask it to refine and make it better, over and over, until it judges the passage perfectly coherent. With that, the input-output method works almost as well as Tree of Thoughts, I would say. And human judges also rated the Tree of Thoughts passages as the more coherent ones.
Next, we're looking at mini crosswords. In the Game of 24 and creative writing, ToT is relatively shallow: at most three thought steps are needed to reach the final output. Here they explore 5x5 mini crosswords as a harder search problem involving natural language. The ToT setup uses a depth-first search (DFS). It keeps exploring the most promising next word clue until the state is no longer promising, then backtracks to the parent state to explore alternative thoughts. The results are shown in Table 3. IO and CoT prompting perform poorly, with a word-level success rate below 16%. Tree of Thoughts significantly improves all metrics, reaching a word-level success rate of 60% and solving 4 out of 20 games. And here's that Table 3: as you can see, ToT again absolutely crushes everything else.
Next, they quickly dive into limitations. Search methods like ToT need more resources than sampling methods to improve task performance. If you're on the GPT-4 API, something that searches both wide and deep is going to cost more to run. But the modular flexibility of ToT lets users customize these performance/cost trade-offs, and ongoing open-source efforts should readily reduce such costs in the near future. By the way, maybe you haven't heard how open source is absolutely crushing AI development, more so than the best-funded companies on the planet. If so, check out the "We Have No Moat" video; I'll link it in the top right of the screen or in the show notes down below.
This broader impact section is very interesting to me. ToT is a framework that empowers LMs to more autonomously and intelligently make decisions and solve problems. Current tasks are limited to reasoning and search problems. Future applications involving interaction with external environments or humans, though, could bring potential danger, e.g. facilitating harmful uses of LMs. That is the warning: people might be able to extract much more power, much more intelligence, out of this than we realize at this point. On the other hand, ToT also improves the interpretability of model decisions and the opportunity for human alignment. That's because the resulting representations are readable, high-level language reasoning instead of implicit, low-level token values. Having the model write out its thoughts at each step of these chains helps us better understand what it's thinking. We don't really know what's going on at a deep level when it spits something out, and we don't fully understand how it reaches its decisions. Asking it to show its work, to spell out its reasoning in natural language at each step, lets us see where it's going with this.
And the conclusion: the associative "System 1" of LMs can be beneficially augmented by a "System 2" based on searching a tree of possible paths to the solution of a problem. What they're saying is that GPT-4 is strong out of the box. But you can push it even further just by asking it to look down multiple paths and see which solution lies at the end of each one. The Tree of Thoughts framework provides a way to translate classical insights about problem solving into actionable methods for contemporary LMs. I want to know what you think, so leave me a comment. If you want daily AI news, I have a free newsletter at natural20.com, so check it out. I hope to speak with you again.
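The transcript describes self-consistency as generating several chain-of-thought answers and keeping the one that appears most often. That amounts to a majority vote. Below is a minimal sketch, assuming a hypothetical `sample_answer` callable that stands in for one sampled LLM run and returns only its final answer. No real model API is called.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[], str], k: int = 5) -> str:
    """Draw k independent answers and return the most frequent one (majority vote).

    `sample_answer` is a hypothetical stand-in for one sampled chain-of-thought run.
    """
    votes = Counter(sample_answer() for _ in range(k))
    answer, _count = votes.most_common(1)[0]
    return answer


if __name__ == "__main__":
    # Fake "model" that replays canned final answers instead of calling an LLM.
    canned = iter(["24", "23", "24", "25", "24"])
    print(self_consistency(lambda: next(canned), k=5))  # prints 24
```

Only the final answers are compared, so the sketch ignores how each reasoning chain got there. That is what separates self-consistency from Tree of Thoughts, which scores the intermediate steps themselves.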
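The Game of 24 setup described in the transcript works in three stages: propose next steps, rate each candidate as sure, maybe, or impossible, and keep the best b per step. It can be sketched as a breadth-first search. This is a hedged, offline sketch, not the paper's implementation. Two pieces replace LLM prompts: `propose` enumerates every arithmetic step, and `evaluate` fakes the model's sure/maybe/impossible rating with an exact lookahead. The numeric weights in `VALUE` are arbitrary choices for this sketch.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

# A state is (numbers still available, arithmetic steps taken so far).
State = Tuple[Tuple[Fraction, ...], Tuple[str, ...]]

# Arbitrary weights for this sketch: "sure" outranks "maybe", which outranks "impossible".
VALUE = {"sure": 20.0, "maybe": 1.0, "impossible": 0.001}


def propose(state: State) -> List[State]:
    """Every way to combine two remaining numbers with + - * / (stand-in for the LLM propose prompt)."""
    nums, steps = state
    children: List[State] = []
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = tuple(n for k, n in enumerate(nums) if k not in (i, j))
        options = [(a + b, f"{a} + {b}"), (a * b, f"{a} * {b}"),
                   (a - b, f"{a} - {b}"), (b - a, f"{b} - {a}")]
        if b != 0:
            options.append((a / b, f"{a} / {b}"))
        if a != 0:
            options.append((b / a, f"{b} / {a}"))
        for value, expr in options:
            children.append((rest + (value,), steps + (f"{expr} = {value}",)))
    return children


def can_reach_24(nums: Tuple[Fraction, ...]) -> bool:
    """Exhaustive lookahead, used only to fake the LLM's value judgment offline."""
    if len(nums) == 1:
        return nums[0] == 24
    return any(can_reach_24(child) for child, _ in propose((nums, ())))


def evaluate(state: State) -> str:
    """Hypothetical stand-in for prompting the LLM to rate a state as sure / maybe / impossible."""
    nums = state[0]
    if len(nums) > 3:
        return "maybe"  # too early to judge cheaply; an LLM would be guessing here
    return "sure" if can_reach_24(nums) else "impossible"


def tot_bfs(numbers: List[int], breadth: int = 5) -> Optional[List[str]]:
    """Breadth-first Tree of Thoughts: expand every kept state, rate children, keep the best `breadth`."""
    frontier: List[State] = [(tuple(Fraction(n) for n in numbers), ())]
    for _ in range(len(numbers) - 1):  # each thought merges two numbers into one
        candidates = [child for state in frontier for child in propose(state)]
        candidates.sort(key=lambda s: VALUE[evaluate(s)], reverse=True)
        frontier = candidates[:breadth]
    for nums, steps in frontier:
        if nums[0] == 24:
            return list(steps)
    return None


if __name__ == "__main__":
    print(tot_bfs([4, 9, 10, 13]))  # solvable, e.g. (10 - 4) * (13 - 9) = 24
    print(tot_bfs([1, 1, 1, 1]))  # prints None: 24 is unreachable
```

Because the fake evaluator is exact, this sketch always finds a solution when one exists. In the paper, the rating comes from GPT-4 and is noisy, which is why the breadth b matters: b = 5 gives the search more chances to recover from a bad rating than b = 1.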
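For creative writing, the transcript describes a two-level tree. The model proposes several plans, votes on the best plan, and only then writes from it. Here is a minimal sketch of that generate-then-vote loop. The callables `propose_plan`, `write_passage`, and `vote` are hypothetical stand-ins for LLM prompts. Using five candidates and five votes per level is an assumption made for this sketch.

```python
from collections import Counter
from itertools import count
from typing import Callable, List


def vote_best(options: List[str], vote: Callable[[List[str]], int], n_votes: int = 5) -> str:
    """Ask the (hypothetical) voter n_votes times which option is best; return the most-voted option."""
    tally = Counter(vote(options) for _ in range(n_votes))
    best_index, _count = tally.most_common(1)[0]
    return options[best_index]


def plan_then_write(
    task: str,
    propose_plan: Callable[[str], str],
    write_passage: Callable[[str, str], str],
    vote: Callable[[List[str]], int],
    k: int = 5,
) -> str:
    """Level 1: k plans, keep the voted best. Level 2: k passages from that plan, keep the voted best."""
    plans = [propose_plan(task) for _ in range(k)]
    best_plan = vote_best(plans, vote)
    passages = [write_passage(task, best_plan) for _ in range(k)]
    return vote_best(passages, vote)


if __name__ == "__main__":
    ids = count()

    def fake_plan(task: str) -> str:  # toy planner: numbered plans
        return f"plan {next(ids)}"

    def fake_write(task: str, plan: str) -> str:  # toy writer
        return f"passage following {plan}"

    def prefer_last(options: List[str]) -> int:  # toy voter: always picks the last option
        return len(options) - 1

    print(plan_then_write("four end sentences", fake_plan, fake_write, prefer_last))
    # prints: passage following plan 4
```

Voting is used here instead of a numeric score because judging which plan is most coherent is a comparison across options. Rating a single passage in isolation is harder to do consistently.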

Original Description

🔥 Get my A.I. + Business Newsletter (free): https://natural20.com/ #ai #gpt #chatgpt
So a new A.I. paper has been released: “Tree of Thoughts: Deliberate Problem Solving with Large Language Models”. It's by researchers at Princeton University and Google DeepMind. It shows how to increase GPT-4's ability to autonomously solve complex problems... but it comes with a warning.
Paper: https://arxiv.org/abs/2305.10601
PDF: https://arxiv.org/pdf/2305.10601.pdf
TIMELINE
00:00 Tree of Thoughts (plus a warning)
01:00 What is it?
03:48 Human Problem Solving
05:11 Game of 24
07:58 Creative Writing
10:08 Crosswords
11:03 Impact of Findings

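The 'Tree of Thoughts' (ToT) approach is a prompting framework that substantially improves GPT-4's problem solving. On the Game of 24 it reaches a 74% success rate, roughly 10x standard input-output prompting (7.3%), which is where the "900%" figure comes from. It lets LLMs explore multiple reasoning paths and self-evaluate their choices, looking ahead or backtracking when needed. The approach was tested on the Game of 24, creative writing, and 5x5 mini crosswords, and it outperformed the input-output and chain-of-thought baselines on all three.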

Key Takeaways

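Develop the Tree of Thoughts paradigm to address two shortcomings of current prompting: no local exploration of alternative continuations, and no global planning, lookahead, or backtracking
Use breadth-first search in which the LLM rates each thought candidate as sure, maybe, or impossible, and only the best b candidates are kept at each step
Test the approach on mathematical reasoning challenges such as the Game of 24
Apply the approach to creative writing (plans chosen by voting) and 5x5 mini crosswords (depth-first search with backtracking)
Use ToT's modular design to trade off cost against performance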
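For the mini crosswords, the paper switches from breadth-first to depth-first search with backtracking. The sketch below shrinks that to a toy 3x3 word square, where every row and every column must be a word, so it runs offline. The vocabulary, the candidate ordering, and the `promising` prefix check are all hypothetical. They stand in for the LLM proposing words from clues and judging whether a partial grid can still be completed.

```python
from typing import List, Optional, Set


def promising(grid: List[str], vocab: Set[str]) -> bool:
    """Prune check: every partially filled column must still be the prefix of some vocabulary word."""
    for col in range(len(grid[0])):
        prefix = "".join(row[col] for row in grid)
        if not any(word.startswith(prefix) for word in vocab):
            return False
    return True


def dfs_fill(grid: List[str], size: int, candidates: List[str], vocab: Set[str]) -> Optional[List[str]]:
    """Depth-first search: place one row word at a time, recurse, and backtrack when a branch dies.

    Assumes every vocabulary word has length `size`.
    """
    if len(grid) == size:
        return grid  # all rows placed; each column is now a complete vocabulary word
    for word in candidates:  # assumed to be ordered from most to least promising
        if word in grid:
            continue
        next_grid = grid + [word]
        if promising(next_grid, vocab):
            solved = dfs_fill(next_grid, size, candidates, vocab)
            if solved is not None:
                return solved
        # dead end: fall through and try the next candidate (backtracking)
    return None


if __name__ == "__main__":
    vocab = {"act", "are", "cat", "net", "rat", "ten"}
    print(dfs_fill([], 3, sorted(vocab), vocab))  # prints ['cat', 'are', 'ten']
```

Running it, the search first commits to "act" as the top row. Every second row then fails the column check, so it backtracks to the root and tries "are" and then "cat". Dropping a dead branch this way, instead of finishing it, is what the transcript means by returning to the parent state to explore alternative thoughts.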

💡 The 'Tree of Thoughts' approach provides a way to translate classical insights about problem solving into actionable methods for contemporary LLMs, making them more autonomous and intelligent.
