Pong AI with Policy Gradients
Trained for ~8000 episodes, where each episode is ~30 games. Updates were applied in batches of 10 episodes, so ~800 updates in total. The policy network is a 2-layer neural net with 200 hidden units, fed raw pixels directly. It was trained with RMSProp at a learning rate of 1e-4. The final agent does not consistently beat the hard-coded AI, but it holds its own. It should be trained longer, with ConvNets, and on a GPU.
This is the Atari 2600 version of Pong, run through OpenAI Gym.
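For readers who want to see what such a setup could look like, here is a minimal NumPy sketch of a 2-layer policy network with 200 hidden units and an RMSProp update at learning rate 1e-4. It is not the code used for this agent. The input size (an 80x80 flattened frame), the ReLU and sigmoid nonlinearities, the RMSProp decay rate, the discount factor, and the reset of the discounted return at each scored point are all assumptions, and every function name is hypothetical.

```python
import numpy as np

# Values taken from the description.
HIDDEN_UNITS = 200
LEARNING_RATE = 1e-4
BATCH_EPISODES = 10

# Assumptions, not stated in the description.
INPUT_DIM = 80 * 80   # hypothetical flattened, downsampled frame
DECAY_RATE = 0.99     # hypothetical RMSProp decay
GAMMA = 0.99          # hypothetical discount factor
EPS = 1e-5            # numerical stabilizer for RMSProp


def init_params(rng):
    """Two weight matrices: pixels -> hidden, hidden -> single logit."""
    return {
        "W1": rng.standard_normal((HIDDEN_UNITS, INPUT_DIM)) / np.sqrt(INPUT_DIM),
        "W2": rng.standard_normal(HIDDEN_UNITS) / np.sqrt(HIDDEN_UNITS),
    }


def policy_forward(params, x):
    """Return the probability of moving up and the hidden activations."""
    h = np.maximum(0.0, params["W1"] @ x)      # ReLU hidden layer (assumed)
    p_up = 1.0 / (1.0 + np.exp(-(params["W2"] @ h)))
    return p_up, h


def discount_rewards(rewards, gamma=GAMMA):
    """Discounted returns, reset at every nonzero reward (assumed: one point = one game)."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0
        running = running * gamma + rewards[t]
        out[t] = running
    return out


def policy_backward(params, xs, hs, dlogits):
    """Backprop through both layers. xs: (T, D), hs: (T, H), dlogits: (T,)."""
    dW2 = hs.T @ dlogits
    dh = np.outer(dlogits, params["W2"])
    dh[hs <= 0] = 0.0                          # ReLU gradient
    dW1 = dh.T @ xs
    return {"W1": dW1, "W2": dW2}


def rmsprop_step(params, grads, cache):
    """In-place RMSProp ascent step on the accumulated batch gradient."""
    for k in params:
        cache[k] = DECAY_RATE * cache[k] + (1 - DECAY_RATE) * grads[k] ** 2
        params[k] += LEARNING_RATE * grads[k] / (np.sqrt(cache[k]) + EPS)
```

In this sketch, `dlogits` would be `(action - p_up) * discounted_return` for each time step, gradients would be summed over `BATCH_EPISODES` episodes, and `rmsprop_step` would then be called once per batch, which for ~8000 episodes works out to the ~800 updates mentioned above.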