Reinforcement Learning: AlphaGo

Graphics in 5 Minutes · Advanced ·🧒 Coding for Kids ·2y ago

Skills: RL Foundations 90%, Neural Network Basics 70%

Key Takeaways

The video explains how AlphaGo works using Reinforcement Learning, covering topics such as analyzing expert games, training an expert policy, value functions, search trees, and self-play to improve its policy.

Full Transcript

Welcome back. In this video, we'll use reinforcement learning to master the game of Go. To play Go, you take turns placing stones on the board. The goal is to surround your opponent and control territory. It looks simple, but the space of possibilities makes it even harder to master than chess.

In 2016, a computer program called AlphaGo beat the top human Go player in the world. This was a major milestone in artificial intelligence. AlphaGo is based on two big ideas. First, it learns from human experts by analyzing thousands of prior games. Second, it plays millions of games against itself, using reinforcement learning to get better and better.

We'll start from the beginning: where should we place the first stone? Let's consult the experts. Here are three games from some of the best Go players in history. You'll notice they all start in the upper right corner. But that's just three games. Let's download some others, and a few more, and here are 10,000 games played by Japanese professional players since 1941. Let's take a look at their opening moves. More than half of these games started with this move, and most of the rest started here. We'll show less popular moves as more transparent.

Now suppose we chose to start here. Of the 5,514 games that began with that first move, here's the distribution of where White played next. Let's again pick the most popular one. We can use the same strategy to choose the next few moves. Cool, we've created a Go-playing program: all you do is mimic what the majority of experts did in the exact same situation. But there's a big problem. After a dozen moves, we're down to a single matching game out of the original ten thousand. It turns out Go games are a bit like snowflakes: no two are exactly alike. So memorizing prior games won't help you beyond the first few moves.

We need a better approach to predict good moves. Let's train a policy that takes the state of the board as input and outputs probabilities for each of the 361 possible actions. Each of these actions corresponds to
placing a stone somewhere on the board. AlphaGo uses a neural network that has 13 layers and trains on millions of moves from human experts. You can use this neural net to play complete games, and it's pretty decent: it can already beat most amateur players.

That's cool, but to beat experts, the program needs to think more like an expert. Professional Go players can plan dozens of moves into the future. Consider this scenario. If you're White, you might think you should move here to keep Black from surrounding you. But then Black will go here, and then you'll have to go here to save your two stones. And if you play this out, eventually you'll lose all this territory. This pattern is known as a ladder, and it looks bad, right? But how bad is it? Well, suppose we simulate a hundred different games forward from this state. White loses every single one. We say the value of the state is zero for White.

An expert can plan ahead to avoid bad outcomes like this. To see how this works, let's roll back to this point in time, at the beginning of the ladder. Is this a good or a bad state for White? Well, if you simulate a hundred games forward from this state, White wins 57: a value of 0.57. That's pretty decent. This value function gives a way to score states, and it's really useful for planning moves.

Programs like AlphaGo plan ahead using search trees. This node represents the current state of the board. White moves next, and there's a branch for each possible move, from the top left to the bottom right. Black goes next and introduces more branches, and so on. Each path down the tree represents one sequence of moves. We only have time to explore a few of these paths. Which ones should we choose? AlphaGo uses two strategies to prune the search. First, it uses the expert policy we just trained to prioritize the most promising branches. These moves all have high probabilities. Second, it uses the value function to limit the depth of the search. The value function provides a likelihood of winning without simulating the entire game.
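The two pruning ideas just described can be sketched on a toy game. This is a simplified depth-limited search, not AlphaGo's actual search procedure. The game (take 1 to 3 stones from a pile, and taking the last stone wins), the uniform `toy_policy`, and the constant `toy_value` are all assumptions made up for illustration; in AlphaGo, neural networks play those two roles.

```python
from typing import Dict, List, Optional, Tuple

# Toy game (an assumption for illustration, not Go): a pile of stones,
# players alternate taking 1-3 stones, and taking the last stone wins.

def legal_moves(stones: int) -> List[int]:
    return [m for m in (1, 2, 3) if m <= stones]

def toy_policy(stones: int) -> Dict[int, float]:
    """Hypothetical stand-in for a policy network: uniform prior over legal moves."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}

def toy_value(stones: int) -> float:
    """Hypothetical stand-in for a value network: win probability for the player to move."""
    return 0.5  # deliberately uninformative, so the search has to do the work

def search(stones: int, depth: int, top_k: int) -> Tuple[float, Optional[int]]:
    """Return (estimated win probability for the player to move, best move found)."""
    if stones == 0:
        return 0.0, None  # the opponent took the last stone, so the player to move lost
    if depth == 0:
        return toy_value(stones), None  # value function limits the depth of the search
    priors = toy_policy(stones)
    # Policy limits the breadth: only expand the top_k most probable moves.
    candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]
    best_value, best_move = -1.0, None
    for move in candidates:
        opponent_value, _ = search(stones - move, depth - 1, top_k)
        value = 1.0 - opponent_value  # the opponent's win probability is our loss probability
        if value > best_value:
            best_value, best_move = value, move
    return best_value, best_move

value, move = search(5, depth=4, top_k=3)
print(f"best move: take {move}, estimated win probability {value}")
```

With a deeper search the result depends less on the crude value estimate, which is the trade-off between search depth and value accuracy.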
You might ask, why search at all? Just choose the move with the best value function from the root. The answer is that the value function is only an approximation. It may not be accurate, so you're better off doing as much search as you can afford during the game itself.

Value functions are super important. The more accurate they are, the more often your program will win. In these simulations, I use GNU Go, an open source Go program. AlphaGo uses a much better approach for value estimation. AlphaGo starts with the move policy we trained from human experts. It uses this policy for only one player, White in this case, and a different frozen policy for Black, whose weights are fixed. Now let's refine White's policy using reinforcement learning. This is just like we did for Pong in part one. Use the current policy to play an entire game. If it's a win, all White's moves are reinforced. If it's a loss, the moves are penalized. AlphaGo repeats this approach, playing thousands of games against itself to improve its policy. And it will try out different opponent policies to gain experience playing against a variety of adversaries.

Now that we've simulated a bunch of wins and losses, we can use this information to calculate the value function. For example, this state had two winning games and two losing games go through it, so we estimate the probability of winning from this state is one over two, whereas this state had two out of three winning games. But you'll notice most states aren't visited at all. How do we estimate values for unseen states? Well, we can do a lot more simulations. AlphaGo does 30 million. That sounds like a lot, but the 19 by 19 Go board has 10 to the 170th valid states. That's an astronomically large number, and it's part of what makes the game of Go so challenging. To generalize to unseen states, we'll use our old friend, neural networks. The 30 million self-play games give us a lot of training data to train our value network.

This is pretty cool. Think about it: professionals gain experience by playing other
people. You might play a few thousand games over the course of a lifetime. AlphaGo gets similar experience in just a few days by playing against itself over and over. This is the magic of reinforcement learning.

To sum it all up, AlphaGo works in two phases. First, it automatically learns from human experts. Then it plays millions of games against itself, using reinforcement learning to get better and better. And just like we did with Pong in part one, it's possible to drop the first step entirely and rely only on reinforcement learning. That's AlphaZero, a newer approach from Google DeepMind that's even better. Stay tuned for part three to learn how reinforcement learning was used to train ChatGPT.
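The win-counting value estimate from the self-play section (two wins out of four games through a state gives one half) can be sketched as a simple tally. This is only an illustration: the state labels and game records below are made up, and a real system would train a value network on such data rather than store a table.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# A game record: the states White passed through, and whether White won.
Game = Tuple[List[Hashable], bool]

def estimate_values(games: Iterable[Game]) -> Dict[Hashable, float]:
    """Estimate each visited state's value as the fraction of games through it that White won."""
    wins: Dict[Hashable, int] = defaultdict(int)
    visits: Dict[Hashable, int] = defaultdict(int)
    for states, white_won in games:
        for state in set(states):  # count each state at most once per game
            visits[state] += 1
            if white_won:
                wins[state] += 1
    return {state: wins[state] / visits[state] for state in visits}

# Hypothetical self-play records with made-up state labels.
games = [
    (["root", "A", "B"], True),
    (["root", "A", "B"], True),
    (["root", "A", "C"], False),
    (["root", "A", "B"], False),
]

values = estimate_values(games)
print(values["A"])    # two wins out of four games
print(values["B"])    # two wins out of three games
print("D" in values)  # unseen states have no estimate at all
```

The last line shows the gap the transcript points out: a table says nothing about states that were never visited, which is why a neural network is used to generalize.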

Original Description

How AlphaGo works, based on Reinforcement Learning. Part 2 of RL from scratch series. https://youtu.be/vXtfdGphr3c

0:00 - intro
0:06 - how to play Go
0:21 - introducing alphaGo
0:46 - analyzing expert games
2:17 - training an expert policy
2:47 - value functions
4:05 - search trees
5:42 - reinforcement learning
6:17 - alphaGo's value function
7:47 - alphaZero


The video explains how AlphaGo uses reinforcement learning to master the game of Go, covering topics such as analyzing expert games, training an expert policy, value functions, search trees, and self-play to improve its policy. The video also touches on AlphaZero, a newer approach from Google DeepMind that relies only on reinforcement learning.

Key Takeaways

Analyze expert games to learn opening moves
Train an expert policy using neural networks
Implement value functions to estimate winning probabilities
Use search trees to plan ahead
Refine policy using self-play and reinforcement learning
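
The first takeaway, picking the move most experts played in the same situation, can be sketched as a frequency count over game records. The move labels and the four sample games are invented for illustration. The sketch also shows the limitation the video describes: once no recorded game matches the moves played so far, the lookup has nothing to offer.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Move = str  # hypothetical move labels such as "Q16"

def most_popular_next_move(
    games: Sequence[Sequence[Move]], prefix: Sequence[Move]
) -> Optional[Tuple[Move, int, int]]:
    """Among games that start with `prefix`, return
    (most common next move, its count, number of matching games), or None."""
    n = len(prefix)
    matching = [g for g in games if len(g) > n and list(g[:n]) == list(prefix)]
    if not matching:
        return None  # no expert game reached this exact position
    counts = Counter(g[n] for g in matching)
    move, count = counts.most_common(1)[0]
    return move, count, len(matching)

# Made-up game records; a real database would hold thousands of full games.
games = [
    ["Q16", "D4", "Q3"],
    ["Q16", "D4", "C16"],
    ["Q16", "D16", "Q3"],
    ["R16", "D4", "Q3"],
]

print(most_popular_next_move(games, []))                    # opening move
print(most_popular_next_move(games, ["Q16"]))               # reply to that opening
print(most_popular_next_move(games, ["Q16", "D4", "Q3"]))   # no game continues further
```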

💡 The video highlights the importance of self-play and reinforcement learning in improving game-playing policies, allowing AlphaGo to gain experience and improve its policy in a short amount of time.

