Tree of Thoughts - GPT-4 Reasoning is Improved 900%
Key Takeaways
The video discusses the 'Tree of Thoughts' approach, a prompting framework that the title credits with a 900% improvement in GPT-4's problem solving (a Game of 24 success rate of 74%, versus 7.3% for input-output prompting and 4% for chain-of-thought prompting), and its application to mathematical reasoning, creative writing, and crossword tasks. The approach is compared to other methods such as chain-of-thought and input-output prompting, and its advantages and limitations are discussed.
Full Transcript
So a new scientific paper has been released called "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." It's by researchers at Princeton University and Google DeepMind. Now, you may have heard the saying that we only use 10% of our brain. Whether or not that's true, this paper seems to show that it is true for our current AI models. This new approach improves ChatGPT's ability to solve complex problems from 4% to 74%. That's just from a new way of prompting it, called Tree of Thoughts, or ToT as it's referenced in the paper.
The paper even comes with a warning: "ToT is a framework that empowers LMs to more autonomously and intelligently make decisions and solve problems. While current tasks are limited to reasoning and search problems, future applications involving interaction with external environments or humans could bring potential danger, e.g. facilitating harmful uses of LMs." In a nutshell, this approach seems to be a hack to overclock these LMs to produce more intelligence.
But let's cut to the chase: what is this Tree of Thoughts? These are different prompting approaches. When you ask ChatGPT a question and it answers, that's called input-output prompting (IO). Now, a lot of researchers and individual users have pointed out that you can get better results from LMs by using chain-of-thought prompting. An example of this might be thinking of three topics that you want to speak about, then thinking through where each of those topics might lead, then selecting the best one, and then producing the actual words that you're going to say. So it's the same input, but then you're thinking about the possible topics and what kind of conversations those might produce, then selecting the best one, and then the actual output. This is the "thinking before you say it" part.
Yet another way to produce even better results is something called self-consistency with CoT, with chain of thought. Self-consistency is generating multiple results for each query and then seeing which results seem to appear the most, a.k.a. which ones are the most consistent. Combining self-consistency with chain of thought can lead to even better results than all the previous prompting methods. But what is even better than that? Well, Tree of Thoughts has the potential to be many times more powerful than the previous prompting methods. This approach works really well for LMs on more complex reasoning tasks.
Here's the abstract from the paper. Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce Tree of Thoughts, which generalizes over the popular chain-of-thought approach. This allows LMs to self-evaluate choices to decide the next course of action, as well as to look ahead or backtrack when necessary to make global choices. And this significantly improves language models' problem-solving abilities on the three tasks that these researchers test them on.
As you can see here, in one of the tests, GPT-4 with chain-of-thought prompting only solved 4% of the tasks. So chain of thought, that's the next improved one, that's one step better than base level, while their method achieved a success rate of 74%. That's more than a 10x improvement over chain of thought. Think of it as two things: the breadth, which is how many different scenarios it starts out with, and the depth, which is how far down each scenario it thinks. The key here is that they can look forward and backtrack and save information in various thoughts that can be used elsewhere.
So how did they come up with this Tree of Thoughts? They're referencing the work of Newell and Simon from 1972.
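The self-consistency idea described above (sample several answers to the same query and keep the one that appears most often) can be sketched in a few lines. This is a minimal illustration, not the paper's code: `self_consistent_answer` and `make_fake_sampler` are hypothetical names, and the "model" is a canned list of answers standing in for repeated LM calls.

```python
from collections import Counter


def self_consistent_answer(sample_answer, question, k=5):
    """Ask the same question k times and return the most common answer.

    `sample_answer` is any callable that returns one answer per call
    (an assumption here; in practice it would wrap an LM call that ends
    a chain-of-thought response with a final answer).
    """
    answers = [sample_answer(question) for _ in range(k)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes


def make_fake_sampler(canned_answers):
    """Hypothetical stand-in for an LM: replays canned answers in order."""
    replay = iter(canned_answers)
    return lambda question: next(replay)


if __name__ == "__main__":
    sampler = make_fake_sampler(["24", "18", "24", "24", "12"])
    print(self_consistent_answer(sampler, "Use 4 9 10 13 to make 24", k=5))
```

The vote only looks at final answers, so it cannot revisit or repair an intermediate step, which is the gap the tree search in the video is meant to fill.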
Newell and Simon were pioneers in the field of AI. They studied how humans solve problems, proposing that our brains work like computers: we take in information, process it, and output solutions. Research on how humans solve problems indicates that people look through a large number of possible solutions. You can think of these solutions like a tree, where each point or node is a halfway solution and the connections or branches are actions that can change these halfway solutions. People decide which action to take based on guidelines, or heuristics, that help them navigate through all these possible solutions and guide them towards finding an answer.
There are two shortcomings that our current prompting models have. One, locally, they do not explore different continuations within a thought process, a.k.a. the branches of the tree. And two, globally, they do not incorporate any type of planning, lookahead, or backtracking to help evaluate these different options. To address these shortcomings, we introduce Tree of Thoughts, a paradigm that allows LMs to explore multiple reasoning paths over thoughts.
Now, this paper goes pretty deep into the math and specifics of how they did this. I won't go into all that here, but I'll link the study in the show notes so you can peruse it at your leisure. Let's skip to the results. So the first thing they did was make it play a game called 24, Game of 24.
It's somewhat similar to Sudoku. Game of 24 is a mathematical reasoning challenge where the goal is to use four numbers and basic arithmetic operations to obtain 24. How they set it up is they took over 1,300 games and sorted them from easy to hard by human solving time. They start with the standard input-output (IO) prompt with five in-context examples. They also did chain-of-thought prompting, and a refinement approach where, at each iteration, the language model is conditioned on all the previous history to "reflect on your mistakes and generate a refined answer" if the output isn't correct. Note that it uses ground-truth feedback signals about equation correctness. Ground truth is basically the real answers, the answers that we know to be correct, for the AI to compare its results against.
In the Tree of Thoughts approach, ChatGPT is asked to think through the possible combinations, and then they prompt the language model to evaluate each thought candidate as "sure," "maybe," or "impossible" with regard to being able to reach 24, the answer we need. Then they perform a breadth-first search in Tree of Thoughts, where at each step they keep the best five candidates. So in the breadth-first approach, they expand the number of starting points before going deeper. In the results, the lowercase b refers to breadth, or how many columns wide they decide to go before digging deeper.
And here are the results. The basic input-output (IO) prompt is 7.3%. Chain of thought is 4%. Then they use the oracle setup with k = 100 samples. In simpler terms, they're testing how well this works by trying it many times, up to 100, and then taking the best results to evaluate its performance in the most favorable conditions. So it's basically running a bunch of times and seeing the best outputs. That comes out with a 9% success rate. Then we have our Tree of Thoughts with a breadth of 1, and the result is 45%. That's a massive leap. Then you run it again with b = 5, meaning there are more starting points, five starting points that it thinks through, and the results go up to 74%. So it's bigger by far than any of the other methods used, even these three other methods here where they rerun the results and pick the best of 100, for example. So Tree of Thoughts absolutely crushes everything else. Notice it's almost 10x the results of just asking GPT-4 to solve the problem in one prompt. So this absolutely crushes Game of 24.
Next we're going to look at creative writing. Is this approach better for creative writing? "Next, we invent a creative writing task where the input is four random sentences and the output should be a coherent passage with four paragraphs that end in the four input sentences respectively." Here's basically how that looks. For example, it's given the task: "Write a coherent passage of four short paragraphs. The end sentence of each paragraph must be:" and these are the four sentences that were randomly generated. One: "It isn't difficult to do a handstand if you just stand on your hands." That's true. Two: "It caught him off guard that space smelled of seared steak." Okay. Three: "When she didn't like a guy who was trying to pick her up, she started using sign language." Four: "Each person who knows you has a different perception of who you are."
So this would be kind of difficult to put together in a coherent storyline. It generates several plans, and in plan two it's able to wrap it up and present it in a self-help context. So the handstand is part of self-help, sort of as a metaphor for embracing challenges; then astronauts embracing challenges, including the smell of space; then a woman's clever tactic, so again challenges; and then contemplating how different perceptions of oneself can shape one's identity. That's pretty smart, I've got to say. It connects the paragraphs with a theme of self-improvement and embracing challenges, making for a coherent passage. So it takes the inputs, it plans out multiple different plans that it could follow, it votes on which plan is the best, and only then does it actually produce the final results.
As you can see here, Tree of Thoughts is definitely the best one. It's a great improvement over the standard input-output, with chain of thought sort of a halfway point. But if you're using that refine function, where you're basically asking it, okay, refine it, make it better, over and over until it determines that it's perfectly coherent, well, then the input-output method works almost as well, I would say, as Tree of Thoughts. And humans, of course, are grading the Tree of Thoughts passage as the most coherent one.
Next we're looking at mini crosswords. "In Game of 24 and Creative Writing, ToT is relatively shallow: at most three thought steps are needed to reach the final output. Here we explore 5x5 mini crosswords as a harder search problem involving natural language." In the ToT setup, "we leverage a depth-first approach that keeps exploring the most promising subsequent word clue until the state is no longer promising, then backtrack to the parent state to explore alternative thoughts." So, the results: as shown in Table 3, the IO and CoT prompting methods (input-output and chain-of-thought prompting) perform poorly, with a word-level success rate of less than 16%, while Tree of Thoughts significantly improves all metrics, achieving a word-level success rate of 60% and solving 4 out of 20 games. And here's that Table 3. As you can see, ToT again absolutely crushes everything else.
Next they quickly dive into limitations. Search methods like ToT require more resources, so if you're on the GPT-4 API, it's going to cost more to run something that's wide and deep like this. It's cheaper to use sampling methods to improve task performance, but the modular flexibility of ToT allows users to customize such performance trade-offs, and ongoing open-source efforts should readily reduce such costs in the near future. By the way, if you haven't heard how open source is absolutely crushing AI development, more so than the best-funded companies on the planet, check out the "We Have No Moat" video. I'll link it in the top right of the screen or in the show notes down below.
This broader impact section is very interesting to me. "ToT is a framework that empowers LMs to more autonomously and intelligently make decisions and solve problems. While current tasks are limited to reasoning and search problems, future applications involving interaction with external environments or humans could bring potential danger, e.g. facilitating harmful uses of LMs." That is the warning: people might be able to extract much more power, much more intelligence, out of this than we realize at this point. "On the other hand, ToT also improves the interpretability of model decisions and the opportunity for human alignment, as the resulting representations are readable, high-level language reasoning instead of implicit, low-level token values." That means that using these sorts of chains and having it output its thoughts each time helps us better understand what it's thinking. We don't really know what it's thinking on the super deep level; when it spits something out, we don't fully understand how it comes to those decisions. But asking it to show its work, to spit out its reasoning in English or natural language at each step of the process, allows us to see where it's going with this.
And the conclusion is: "The associative System 1 of LMs can be beneficially augmented by a System 2 based on searching a tree of possible paths to the solution to a problem." What they're saying is that GPT-4 is strong out of the box, but this way you can increase it even further just by asking it to look down multiple paths and see which solution lies at the end of which path. "The Tree of Thoughts framework provides a way to translate classical insights about problem solving into actionable methods for contemporary LMs."
I want to know what you think. Leave me a comment. If you want daily AI news, I have a free newsletter at natural20.com. Check it out, and hope to speak with you again.
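The Game of 24 search described in the transcript (propose next steps, rate each candidate, keep the best b, repeat) can be sketched roughly as follows. This is a minimal sketch, not the paper's implementation. The names `propose`, `exact_rating`, and `tree_of_thoughts_bfs` are hypothetical, and the rating step is an assumption: where the paper prompts the language model to answer "sure," "maybe," or "impossible," this sketch swaps in an exact brute-force check so it runs without any model.

```python
from fractions import Fraction
from itertools import combinations

# Rating vocabulary from the video; "maybe" is what an LM evaluator could
# also return, although the exact stand-in below never does.
SCORES = {"sure": 2, "maybe": 1, "impossible": 0}


def propose(state):
    """Generate every next thought: combine two remaining numbers with one operation."""
    numbers, steps = state
    children = []
    for i, j in combinations(range(len(numbers)), 2):
        a, b = numbers[i], numbers[j]
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        options = [(a + b, f"{a} + {b}"), (a * b, f"{a} * {b}"),
                   (a - b, f"{a} - {b}"), (b - a, f"{b} - {a}")]
        if b != 0:
            options.append((a / b, f"{a} / {b}"))
        if a != 0:
            options.append((b / a, f"{b} / {a}"))
        for value, text in options:
            children.append((tuple(rest + [value]), steps + [f"{text} = {value}"]))
    return children


def can_reach_24(numbers):
    """Brute-force check: can the remaining numbers still be combined into 24?"""
    if len(numbers) == 1:
        return numbers[0] == 24
    return any(can_reach_24(child) for child, _ in propose((numbers, [])))


def exact_rating(state):
    """ASSUMPTION: stand-in for the LM's sure/maybe/impossible judgment."""
    numbers, _ = state
    return "sure" if can_reach_24(numbers) else "impossible"


def tree_of_thoughts_bfs(puzzle, b=5, evaluate=exact_rating):
    """Breadth-first search that keeps only the b best-rated states per step."""
    frontier = [(tuple(Fraction(n) for n in puzzle), [])]
    for _ in range(len(puzzle) - 1):
        candidates = [child for state in frontier for child in propose(state)]
        rated = [(SCORES[evaluate(c)], c) for c in candidates]
        viable = [pair for pair in rated if pair[0] > 0]      # prune "impossible"
        viable.sort(key=lambda pair: pair[0], reverse=True)   # best ratings first
        frontier = [c for _, c in viable[:b]]
    for numbers, steps in frontier:
        if numbers == (Fraction(24),):
            return steps
    return None


if __name__ == "__main__":
    print(tree_of_thoughts_bfs((4, 9, 10, 13), b=5))
```

Because the stand-in evaluator is exact, even b = 1 finds a solution whenever one exists. With a noisy LM evaluator, a wider b gives the search more chances to recover from a bad rating, which fits the jump from 45% to 74% reported in the video.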
Original Description
🔥 Get my A.I. + Business Newsletter (free):
https://natural20.com/
#ai #gpt #chatgpt
So a new A.I. paper is released:
“Tree of Thoughts: Deliberate Problem Solving with Large Language Models”
It’s by researchers at Princeton University and Google DeepMind.
It shows how to increase the ability of GPT-4 to autonomously solve complex problems... but it comes with a warning.
Paper:
https://arxiv.org/abs/2305.10601
PDF:
https://arxiv.org/pdf/2305.10601.pdf
TIMELINE
00:00 Tree of Thoughts (plus a warning)
01:00 What is it?
03:48 Human Problem Solving
05:11 Game of 24
07:58 Creative Writing
10:08 Crosswords
11:03 Impact of Findings
Uploads from Wes Roth · Wes Roth · 29 of 60