Superhuman AI Cracked An Impossible Game! | DeepNash, Explained

Underfitted · Beginner ·🧬 Deep Learning ·3y ago

Key Takeaways

The video discusses DeepMind's DeepNash, an AI agent that has mastered Stratego, a game more complex than chess and Go. It explains how DeepNash does so by learning a Nash equilibrium policy, bluffing, and training through self-play without human data.

Full Transcript

[Music] How old is chess? I'm trying to read here the history of chess, and the bottom line is that apparently it's kind of old. Now, this is not a video about chess, but I wanted to bring it up because for a long time chess was a high bar for artificial intelligence. By the mid-1980s, computer chess programs began challenging and occasionally beating grandmasters. It remained unclear whether they could ever defeat the world's best. Remember IBM's Deep Blue beating the world champion, Garry Kasparov? That was in 1997, 12 years after the development of Deep Blue started.

And our fascination with chess did not end there. Fast forward 20 years, and in 2017 DeepMind released AlphaZero, but this time the system could play chess, Go, and shogi at a superhuman level. Huge accomplishment. Still is, five years later. But we just blew past that. The story that I want to tell you is not about chess, not about Go. This is much bigger. This is about artificial intelligence mastering the impossible.

"This is Stratego. Not a place, not a time, but a battle of wit and skill and strategy." That was just the beginning of a 1983 commercial about Stratego. Now, my wife was this close to buying the game for my son, but in the end we decided not to do it. But it doesn't matter. Here is how it works: Stratego is a two-player board game where you have 40 pieces that move around, and the goal is to capture your opponent's flag.

Now, two specific characteristics make Stratego way more challenging for artificial intelligence than either chess or Go. The first thing we need to consider is the complexity of the game: the number of valid states of each one of these games. Chess is very complex. It has 10 to the power of 123 possible valid states. To put this in context, we estimate there are 10 to the power of 22 grains of sand on Earth and 10 to the power of 25 drops of water in the ocean. That's nothing compared to this number here. The sheer amount of possible states in chess is one of the reasons it took so long for AI to master it.

Go, however, is on a totally different planet: 10 to the power of 360 possible states. Much, much harder than chess. Beating a professional player at Go was a long-standing grand challenge of AI research. Okay, we solved chess, we solved Go. It's time for a new challenge. So how about Stratego? Well, 10 to the power of 535 possible states. That's a number beyond anything we could ever imagine. In comparison, chess and Go are both nothing.

Now, this is just one of the reasons that make Stratego more challenging. There is something else: Stratego is an imperfect information game. "The key thing to understand about why imperfect information makes things difficult is that you have to worry not just about which actions to play, but the probability that you're going to play those actions."

In a perfect information game like chess or Go, you see everything that's happening during the game. There's nothing hidden from you. You can see every piece, every play, everything. We designed AlphaZero to master perfect information games, but AlphaZero doesn't work with games where players don't have the full picture. And when you think about the real world, we usually have to make decisions with partial information. If we want to get closer to artificial intelligence that can help solve the problems we face every day, we need to go beyond AlphaZero.

Think about poker, for example. You don't see your opponent's cards. They are completely hidden from you. Like Noam Brown mentioned in his conversation with Lex Fridman, there is an additional layer in an imperfect information game: it's not only about the actions you take, but the success probability of those actions. AlphaZero did not solve this. In fact, imperfect information games have been tough for artificial intelligence to crack. Until now.

A few days ago, on December 1st, DeepMind published a new paper in Science talking about their new AI agent, DeepNash. Here is their blog post. Not the paper; you can read that one later if you want. "Stratego, the classic board game that's more complex than chess and Go, and craftier than poker, has now been mastered."

If I start talking about every cool thing about DeepNash, we will be here the whole day, so let me focus on a couple of details, starting with the most important idea: DeepNash's goal is to learn a Nash equilibrium policy. I should probably make a separate video about Nash equilibrium, but this is what you need to know: in a two-player zero-sum game like chess, Go, poker, or Stratego, a Nash equilibrium guarantees that DeepNash will do very well even when playing against the best opponents. Now, Stratego is hard. Remember, some of the information is hidden. So DeepNash aims to find that Nash equilibrium. Not perfect, but still good enough to win more than 97 percent of games against the best Stratego bots out there and 84 percent against top expert human players.

Now, speaking about hidden information, bluffing is a big part of Stratego. Sometimes you want to deceive the other player, maybe lure them into a trap, make them think you're stronger than you really are. It's part of the game. But deceiving your opponent is a mental state that we have. We shouldn't expect it from an artificial intelligence system, right? Well, I'm sure you know where I'm going with this: DeepNash bluffs. If you go and check the paper, you will find links to a bunch of sample games where DeepNash clearly deceives its opponents to take advantage of them. It's incredible. Not only that, but DeepNash can make non-trivial trade-offs where it shows how much it values information, and that's something unexpected.

Finally, there is something I find fascinating: DeepNash learns Stratego from scratch. Have you ever wondered what the meaning of the word "zero" in AlphaZero is? "AlphaGo Zero doesn't use any human data whatsoever. Instead, what it has to do is learn for itself, completely from self-play." Zero means no human knowledge in the loop. DeepNash works the same way. It learns exclusively from playing itself, and this is such a beautiful and powerful idea. "So it starts off extremely naive. It starts off with completely random play, and yet at every step of the learning process it has an opponent that is exactly calibrated to its current level of performance."

So DeepNash is not about collecting more human data or having better data. DeepNash is not about data at all. If you think about it, this is great. DeepNash is not biased by the way we play the game. It is not trying to copy us. Instead, it builds its own strategies, its own playing style, and we can use that. We can find different tactics and unconventional ways to play just by looking at DeepNash, and that is a big part of the value of these systems. We can learn a ton from them.

And by the way, Stratego is just a game, but the ultimate goal here is to apply these algorithms to real-life situations: traffic modeling, smart grids, auction design. There are many problems with similar characteristics. That's why DeepNash is so important. All of a sudden we have a chance against large-scale imperfect information problems with a huge state space. Things that were impossible before are now this close.

If you like this type of content, subscribe. Awesome. Hey.
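
The self-play idea described in the transcript can be illustrated on a far smaller game. The sketch below is not DeepNash's algorithm (the paper calls its method Regularized Nash Dynamics and pairs it with deep neural networks). It swaps in regret matching, a classic technique, on rock-paper-scissors, a tiny two-player zero-sum game where the opponent's move is hidden when you choose yours. The agent plays against a copy of itself, starts from a naive always-rock policy (an assumption made for this example, whereas the video describes random initial play), and its average strategy moves toward the Nash equilibrium of one third per action. All function names are hypothetical.

```python
# Payoff to the player choosing the row action against the column action.
# Actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def action_values(opponent):
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]


def regret_matching_strategy(regrets, default):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return list(default)  # no positive regret yet: fall back to the naive policy


def self_play(iterations=20000):
    """Train one agent against a copy of itself and return its average strategy."""
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    naive_start = [1.0, 0.0, 0.0]  # assumption for this sketch: always rock
    for _ in range(iterations):
        strategy = regret_matching_strategy(regrets, naive_start)
        values = action_values(strategy)  # the opponent is the agent itself
        expected = sum(s * v for s, v in zip(strategy, values))
        for a in range(3):
            regrets[a] += values[a] - expected  # how much better action a would have been
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


def exploitability(strategy):
    """Payoff per round that a best-responding opponent earns against `strategy`."""
    return max(action_values(strategy))


average = self_play()
print("average strategy:", [round(p, 3) for p in average])
print("exploitability:", round(exploitability(average), 4))
```

Here `exploitability` is 1.0 for the always-rock policy and shrinks toward 0 as self-play continues. That is the sense in which an equilibrium policy "does very well even against the best opponents": in a zero-sum game, no opponent can beat it in expectation. Note that it is the average strategy over training, not the last one played, that approaches the equilibrium in this method.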

Original Description

An explanation of DeepMind's DeepNash and what it means for us.

🔔 Subscribe for more stories: https://www.youtube.com/@underfitted?sub_confirmation=1

📚 My 3 favorite Machine Learning books:
• Deep Learning With Python, Second Edition: https://amzn.to/3xA3bVI
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: https://amzn.to/3BOX3LP
• Machine Learning with PyTorch and Scikit-Learn: https://amzn.to/3f7dAC8

Twitter: https://twitter.com/svpino

Disclaimer: Some of the links included in this description are affiliate links where I'll earn a small commission if you purchase something. There's no cost to you.

DeepNash, an AI agent, has mastered Stratego, a complex game with imperfect information, by learning a Nash equilibrium policy through self-play without human data. Its applications go beyond games to real-life problems.

Key Takeaways
  1. Understand the basics of Stratego and its complexity
  2. Learn about Nash Equilibrium and its application to AI
  3. Study how DeepNash learns through self-play without human data
  4. Analyze the applications of DeepNash beyond games
💡 DeepNash's ability to learn a Nash equilibrium policy through self-play without human data makes it a powerful tool for solving complex problems with imperfect information.
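
The complexity figures quoted in the video are easier to compare as exponents. Below is a minimal sketch that uses only the numbers from the transcript; the dictionary and function names are made up for illustration. The video calls these counts "possible states", while DeepMind's write-up describes them as game-tree sizes, so treat them as rough orders of magnitude.

```python
# Powers of ten quoted in the transcript (exponent only).
LOG10_COUNT = {
    "grains of sand on Earth": 22,
    "drops of water in the ocean": 25,
    "chess": 123,
    "Go": 360,
    "Stratego": 535,
}


def orders_of_magnitude_apart(smaller, larger):
    """Return k such that `larger` is 10**k times as big as `smaller`."""
    return LOG10_COUNT[larger] - LOG10_COUNT[smaller]


comparisons = [
    ("drops of water in the ocean", "chess"),
    ("chess", "Go"),
    ("Go", "Stratego"),
]
for smaller, larger in comparisons:
    k = orders_of_magnitude_apart(smaller, larger)
    print(f"{larger} is 10^{k} times larger than {smaller}")

# Python integers have arbitrary precision, so the ratio can be checked exactly.
assert 10**535 // 10**360 == 10**175
```

The gap between Go and Stratego alone (10 to the power of 175) is larger than the whole chess figure, which is why the video treats Stratego as a different class of problem.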
