Superhuman AI Cracked An Impossible Game! | DeepNash, Explained
Key Takeaways
The video discusses DeepMind's DeepNash, an AI agent that has mastered Stratego, a game more complex than chess and Go. It covers how DeepNash learns a Nash equilibrium policy, how it bluffs, and how it trains through self-play without human data.
Full Transcript
[Music]

How old is chess? I'm trying to read here the history of chess, and the bottom line is that apparently it's kind of old. Now, this is not a video about chess, but I wanted to bring it up because for a long time chess was a high bar for artificial intelligence. By the mid-1980s, computer chess programs began challenging and occasionally beating grandmasters. It remained unclear whether they could ever defeat the world's best. Remember IBM's Deep Blue beating the world champion Garry Kasparov? That was in 1997, 12 years after the development of Deep Blue started.

And our fascination with chess did not end there. Fast forward 20 years, and in 2017 DeepMind released AlphaZero, but this time the system could play chess, Go, and shogi at a superhuman level. Huge accomplishment. Still is, five years later. But we just blew past that. The story that I want to tell you is not about chess, not about Go. This is much bigger. This is about artificial intelligence mastering the impossible.

"This is Stratego. Not a place, not a time, but a battle of wit and skill and strategy." That was just the beginning of a 1983 commercial about Stratego. Now, my wife was this close to buying the game for my son, but in the end we decided not to do it. But it doesn't matter. Here is how it works: Stratego is a two-player board game where you have 40 pieces that move around, and the goal is to capture your opponent's flag.

Now, two specific characteristics make Stratego way more challenging for artificial intelligence than either chess or Go. The first thing we need to consider is the complexity of the game: the number of valid states of each one of these games. Now, chess is very complex. It has 10 to the power of 123 possible valid states. To put this in context, we estimate there are 10 to the power of 22 grains of sand on Earth and 10 to the power of 25 drops of water in the ocean. That's nothing compared to this number here. The sheer amount of possible states in chess is one of the reasons it took so long for AI to master it.
Go, however, is on a totally different planet: 10 to the power of 360 possible states. Much, much harder than chess. Beating a professional player at Go is a long-standing grand challenge of AI research. OK, we solved chess, we solved Go. It's time for a new challenge. So how about Stratego? Well, 10 to the power of 535 possible states. That's a number beyond anything we could ever imagine. In comparison, chess and Go are both nothing.

Now, this is just one of the reasons that make Stratego more challenging. There is something else: Stratego is an imperfect-information game. The key thing to understand about why imperfect information makes things difficult is that you have to worry not just about which actions to play, but the probability that you're going to play those actions.

In a perfect-information game like chess or Go, you see everything that's happening during the game. There's nothing hidden from you. You can see every piece, every play, everything. We designed AlphaZero to master perfect-information games, but AlphaZero doesn't work with games where players don't have the full picture. And when you think about the real world, we usually have to make decisions with partial information. If we want to get closer to artificial intelligence that can help solve the problems we face every day, we need to go beyond AlphaZero.

Think about poker, for example. You don't see your opponent's cards. They are completely hidden from you. Like Noam mentioned in his conversation with Lex Fridman, there is an additional layer in an imperfect-information game: it's not only about the actions you take, but the success probability of those actions. AlphaZero did not solve this. In fact, imperfect-information games have been tough for artificial intelligence to crack. Until now.

A few days ago, on December 1st, DeepMind published a new paper in Science about their new AI agent, DeepNash. Here is their blog post, not the paper (you can read that one later if you want): Stratego, the classic board game that's more
complex than chess and Go, and craftier than poker, has now been mastered.

If I start talking about every cool thing about DeepNash, we will be here the whole day, so let me focus on a couple of details, starting with the most important idea: DeepNash's goal is to learn a Nash equilibrium policy. I should probably make a separate video about Nash equilibrium, but this is what you need to know: in a two-player zero-sum game like chess, Go, poker, or Stratego, a Nash equilibrium guarantees that DeepNash will do very well even when playing against the best opponents. Now, Stratego is hard. Remember, some of the information is hidden, so DeepNash aims to find that Nash equilibrium. Not perfect, but still good enough to win more than 97 percent of games against the best Stratego bots out there and 84 percent against top expert human players.

Now, speaking about hidden information, bluffing is a big part of Stratego. Sometimes you want to deceive the other player, maybe lure them into a trap, make them think you're stronger than you really are. It's part of the game. But deceiving your opponent is a mental state that we have. We shouldn't expect it from an artificial intelligence system, right? Well, I'm sure you know where I'm going with this: DeepNash bluffs. If you go and check the paper, you will find links to a bunch of sample games where DeepNash clearly deceives its opponents to take advantage of them. It's incredible. Not only that, but DeepNash can make non-trivial trade-offs where it shows how much it values information, and that's something unexpected.

Finally, there is something I find fascinating: DeepNash learns Stratego from scratch. Have you ever wondered what the meaning of the word "zero" in AlphaZero is? AlphaGo Zero doesn't use any human data whatsoever. Instead, what it has to do is learn for itself, completely from self-play. Zero means no human knowledge in the loop. DeepNash works the same way: it learns exclusively from playing itself, and this is such a beautiful and powerful idea. So it starts off
extremely naive. It starts off with completely random play, and yet at every step of the learning process it has an opponent that's exactly calibrated to its current level of performance. So DeepNash is not about collecting more human data or having better data. DeepNash is not about data at all.

If you think about it, this is great. DeepNash is not biased by the way we play the game. It's not trying to copy us. Instead, it builds its own strategies, its own playing style, and we can use that. We can find different tactics and unconventional ways to play just by looking at DeepNash, and that is a big part of the value of these systems: we can learn a ton from them.

And by the way, Stratego is just a game, but the ultimate goal here is to apply these algorithms to real-life situations: traffic modeling, smart grids, auction design. There are many problems with similar characteristics. That's why DeepNash is so important. All of a sudden, we have a chance against large-scale imperfect-information problems with a huge state space. Things that were impossible before are now this close.

If you like this type of content, subscribe. Awesome. Hey.
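
The two ideas the transcript leans on, a Nash equilibrium in a two-player zero-sum game and learning by self-play from a naive start, can be tried out on a toy example. The sketch below is not DeepNash's training method: it swaps in regret matching, a textbook technique, on a hypothetical 2x2 payoff matrix chosen only for illustration. The matrix and all function names are assumptions. Both players start from uniform random play and only ever face each other. For this matrix the equilibrium can be worked out by hand: each player picks the first action 40% of the time, and the row player is guaranteed an average payoff of 0.2.

```python
# Regret-matching self-play on a tiny two-player zero-sum matrix game.
# Illustrative assumptions throughout: the payoff matrix, the names, and the
# algorithm itself are NOT taken from DeepNash.

# Row player's payoff; the column player receives the negative of it.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        # No positive regret yet: fall back to uniform random play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(iterations=20000):
    n_rows, n_cols = len(PAYOFF), len(PAYOFF[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        row = strategy_from_regrets(row_regret)
        col = strategy_from_regrets(col_regret)

        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(PAYOFF[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(PAYOFF[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_expected = sum(row[i] * row_values[i] for i in range(n_rows))
        col_expected = sum(col[j] * col_values[j] for j in range(n_cols))

        # Regret: how much better an action would have done than the mix played.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_expected
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_expected
            col_sum[j] += col[j]

    # The average strategy over all iterations is what approaches equilibrium.
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


def worst_case_value(row_strategy):
    """Row player's expected payoff when the opponent picks its best reply."""
    return min(sum(PAYOFF[i][j] * row_strategy[i] for i in range(len(PAYOFF)))
               for j in range(len(PAYOFF[0])))


row_avg, col_avg = self_play()
print("row mix:", [round(p, 3) for p in row_avg])
print("col mix:", [round(p, 3) for p in col_avg])
print("guaranteed value:", round(worst_case_value(row_avg), 3))
```

The printed mixes should settle close to the hand-computed 40/60 split. The point of `worst_case_value` is the guarantee mentioned in the video: a player following an equilibrium mix does about as well against the best possible reply as against any other opponent. Note that the strategy played in any single iteration keeps swinging; it is the running average that converges, which is why the function returns averages.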
Original Description
An explanation of DeepMind's DeepNash and what it means for us.
🔔 Subscribe for more stories: https://www.youtube.com/@underfitted?sub_confirmation=1
📚 My 3 favorite Machine Learning books:
• Deep Learning With Python, Second Edition — https://amzn.to/3xA3bVI
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow — https://amzn.to/3BOX3LP
• Machine Learning with PyTorch and Scikit-Learn — https://amzn.to/3f7dAC8
Twitter: https://twitter.com/svpino
Disclaimer: Some of the links included in this description are affiliate links where I'll earn a small commission if you purchase something. There's no cost to you.