Learning Montezuma’s Revenge from a single demonstration

📰 OpenAI News

An OpenAI agent learns to play Montezuma's Revenge from a single human demonstration, using the PPO (Proximal Policy Optimization) reinforcement learning algorithm

Level: advanced | Published 4 Jul 2018

Action Steps

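Start with a single recorded human demonstration of the game
Reset the agent to states taken from the demonstration, so it does not have to discover the whole route through exploration
Begin from a state near the end of the demonstration and use PPO to optimize the game score from there
Move the starting state backward through the demonstration each time the agent learns to match or beat the demonstrator's score from the current start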
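The action steps amount to a backward curriculum over demonstration states. The sketch below illustrates that idea on a hypothetical toy problem rather than on Montezuma's Revenge: a chain environment in which exactly one action per state is correct, with a simple tabular Monte Carlo learner standing in for PPO. The names (`make_demo`, `run_episode`, `backward_curriculum`), the batch size of 20, and the 20% success threshold for moving the start back are illustrative assumptions, not values taken from OpenAI's implementation.

```python
import random

# Hypothetical toy stand-in for a hard-exploration game (not Atari):
# a chain of N_STATES states. In each state exactly one of N_ACTIONS actions
# moves the agent forward; any other action ends the episode with reward 0.
# Reaching the end of the chain yields reward 1.
N_STATES = 10
N_ACTIONS = 4


def make_demo(rng):
    """Hypothetical 'human demonstration': the correct action in every state."""
    return [rng.randrange(N_ACTIONS) for _ in range(N_STATES)]


def run_episode(q, demo, start, rng, epsilon=0.1, lr=0.1):
    """Play one episode from demonstration state `start`, then update q.

    The tabular Monte Carlo update below stands in for PPO; it is NOT PPO.
    """
    visited = []
    state = start
    while state < N_STATES:
        if rng.random() < epsilon:
            action = rng.randrange(N_ACTIONS)  # explore
        else:
            best = max(q[state])  # greedy, breaking ties at random
            action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
        visited.append((state, action))
        if action != demo[state]:
            break  # wrong action: the episode ends with no reward
        state += 1
    ret = 1.0 if state == N_STATES else 0.0
    for s, a in visited:
        q[s][a] += lr * (ret - q[s][a])  # move each estimate toward the return
    return ret


def backward_curriculum(demo, rng, batch=20, threshold=0.2, max_batches=10_000):
    """Reset to demonstration states, starting near the end and moving back.

    The start moves one step earlier whenever at least `threshold` of a
    batch of episodes from the current start reach the goal.
    """
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    start = N_STATES - 1  # begin one step before the end of the demonstration
    for _ in range(max_batches):
        wins = sum(run_episode(q, demo, start, rng) for _ in range(batch))
        if wins / batch >= threshold:
            if start == 0:
                break  # the agent now succeeds from the true initial state
            start -= 1
    return q, start


def greedy_actions(q):
    """Best-valued action in each state."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    rng = random.Random(0)
    demo = make_demo(rng)
    q, start = backward_curriculum(demo, rng)
    print("final start index:", start)
    print("greedy policy matches demo:", greedy_actions(q) == demo)
```

Because each new starting state is only one step earlier than states the agent already handles, exploration only ever has to find one correct action at a time, instead of a full sequence of `N_STATES` correct actions from the true start.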

Who Needs to Know This

This research is relevant to AI engineers and researchers working on reinforcement learning and game-playing agents: it shows a simple way to reduce the exploration burden in games where rewards are sparse and far apart

Key Insight

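💡 Starting episodes from demonstration states can sidestep much of the exploration problem in reinforcement learning: the agent only has to learn the short stretch between its starting state and the next reward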
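A back-of-the-envelope calculation shows why this helps. In a hypothetical model (an assumption for illustration, not from the source) where every step offers `n_actions` choices and only one keeps the agent on track, a purely random explorer starting `k` steps from a reward succeeds with probability (1/n_actions)^k per episode, so the expected number of episodes before the first success grows exponentially in `k`. Starting from a demonstration state close to the reward keeps `k` small.

```python
# Hypothetical toy model, not taken from the source: a reward sits k steps
# away, each step has n_actions choices, and exactly one choice per step
# keeps the agent on track.


def random_success_prob(k: int, n_actions: int) -> float:
    """Chance that uniformly random actions make k consecutive correct choices."""
    return (1.0 / n_actions) ** k


def expected_episodes(k: int, n_actions: int) -> float:
    """Expected episodes until the first success (mean of a geometric distribution)."""
    return 1.0 / random_success_prob(k, n_actions)


for k in (1, 5, 10, 20):
    print(f"{k:2d} steps from reward: ~{expected_episodes(k, 4):,.0f} episodes")
```

With four actions, a reward one step away is found after about 4 episodes on average, while one 20 steps away needs about 2^40 (roughly 1.1 trillion) episodes.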