Pong AI with Policy Gradients

Andrej Karpathy · Beginner · 📰 AI News & Updates · 9y ago
Trained for ~8000 episodes, where each episode is ~30 games. Updates were applied in batches of 10 episodes, for ~800 updates in total. The policy network is a 2-layer neural net with 200 hidden units, fed raw pixels. It was trained with RMSProp at a learning rate of 1e-4. The final agent does not consistently beat the hard-coded AI, but it holds its own. It should be trained longer, with ConvNets, and on a GPU. This is the Atari 2600 version of Pong, run through OpenAI Gym.
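As a rough illustration of the setup described above, here is a minimal NumPy sketch of a 2-layer policy network with 200 hidden units and an RMSProp update at learning rate 1e-4. It is not the code used in the video. Several details are assumptions not stated in the description: the 80x80 flattened input, the ReLU hidden layer, the sigmoid output read as the probability of moving UP, the discount factor of 0.99, and the RMSProp decay rate of 0.99. The function names are hypothetical as well.

```python
import numpy as np

D = 80 * 80           # ASSUMPTION: flattened, downsampled 80x80 frame
H = 200               # hidden units (from the description)
LEARNING_RATE = 1e-4  # from the description
DECAY_RATE = 0.99     # ASSUMPTION: RMSProp decay
GAMMA = 0.99          # ASSUMPTION: reward discount factor

rng = np.random.default_rng(0)
model = {
    "W1": rng.standard_normal((H, D)) / np.sqrt(D),
    "W2": rng.standard_normal(H) / np.sqrt(H),
}
rmsprop_cache = {k: np.zeros_like(v) for k, v in model.items()}


def policy_forward(x):
    """Return P(action = UP) and the hidden activations for one frame x."""
    h = np.maximum(0.0, model["W1"] @ x)  # ReLU hidden layer (assumed)
    logit = model["W2"] @ h
    p_up = 1.0 / (1.0 + np.exp(-logit))   # sigmoid output (assumed)
    return p_up, h


def discount_rewards(rewards, gamma=GAMMA):
    """Discounted returns, reset whenever a point is scored (nonzero reward)."""
    out = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0  # a scored point ends one game within the episode
        running = running * gamma + rewards[t]
        out[t] = running
    return out


def policy_backward(xs, hs, dlogits):
    """Policy-gradient estimates for W1 and W2.

    xs: (T, D) frames, hs: (T, H) hidden activations,
    dlogits: (T,) values of (action - p_up) * discounted return.
    """
    dW2 = hs.T @ dlogits                 # shape (H,)
    dh = np.outer(dlogits, model["W2"])  # shape (T, H)
    dh[hs <= 0] = 0.0                    # backprop through ReLU
    dW1 = dh.T @ xs                      # shape (H, D)
    return {"W1": dW1, "W2": dW2}


def rmsprop_step(grads):
    """One RMSProp ascent step on the accumulated gradients."""
    for k in model:
        g = grads[k]
        rmsprop_cache[k] = DECAY_RATE * rmsprop_cache[k] + (1 - DECAY_RATE) * g**2
        model[k] += LEARNING_RATE * g / (np.sqrt(rmsprop_cache[k]) + 1e-5)
```

To mirror the batching mentioned in the description, the gradients from policy_backward would be summed over 10 episodes before a single call to rmsprop_step, which gives roughly 800 updates over 8000 episodes.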