Understanding RL Vision
📰 Distill.pub
Researchers apply interpretability techniques to a reinforcement learning model to understand its decision-making process in a video game environment
Action Steps
- Apply attribution techniques to reinforcement learning models to understand their decision-making process
- Use dimensionality reduction to identify key features that influence the model's value function and policy
- Analyze model failures to identify areas for improvement
- Edit model weights to modify its behavior and validate analysis results
Who Needs to Know This
AI engineers and researchers can use this article to better understand how reinforcement learning models make decisions and how to interpret them. Data scientists can apply the same techniques to analyze and improve model performance.
Key Insight
💡 Interpretability techniques can be used to analyze and improve reinforcement learning models in complex environments
Full Article
# Understanding RL Vision
With diverse environments, we can analyze, diagnose and edit deep reinforcement learning models using attribution.
Figure: three panels showing the observation (video game still), positive attribution (good news) and negative attribution (bad news). Attribution from a hidden layer to the value function, showing what features of the observation (left) are used to predict success (middle) and failure (right). Applying dimensionality reduction (NMF) yields features that detect various in-game objects. Legend: Coin, Enemy, Buzzsaw.
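The idea of attributing the value function to hidden-layer activations, split into positive and negative parts, can be illustrated with a small sketch. This is not the article's implementation: it uses gradient × activation, a simple attribution method chosen here for brevity, and the tanh `value_head`, the weights and the activations are all hypothetical stand-ins for a real network.

```python
import math


def value_head(acts, weights, bias=0.0):
    """Hypothetical value head: tanh of a weighted sum over all activations.

    acts and weights are lists of rows, one row per spatial position and
    one entry per channel.
    """
    total = bias
    for act_row, weight_row in zip(acts, weights):
        for a, w in zip(act_row, weight_row):
            total += a * w
    return math.tanh(total)


def gradient_times_activation(acts, value_fn, eps=1e-5):
    """Attribute value_fn(acts) to each activation as gradient * activation.

    The gradient is estimated with central finite differences so that the
    sketch needs no autodiff library.
    """
    attributions = []
    for i, row in enumerate(acts):
        attr_row = []
        for j, a in enumerate(row):
            up = [r[:] for r in acts]
            down = [r[:] for r in acts]
            up[i][j] = a + eps
            down[i][j] = a - eps
            grad = (value_fn(up) - value_fn(down)) / (2 * eps)
            attr_row.append(grad * a)
        attributions.append(attr_row)
    return attributions


def split_by_sign(attributions):
    """Sum positive and negative attribution separately at each position."""
    positive = [sum(v for v in row if v > 0) for row in attributions]
    negative = [sum(v for v in row if v < 0) for row in attributions]
    return positive, negative


# Toy example: 2 spatial positions, 2 channels (all numbers are made up).
acts = [[1.0, 0.0], [0.5, 2.0]]
weights = [[0.2, 0.3], [-0.4, 0.1]]
attributions = gradient_times_activation(acts, lambda a: value_head(a, weights))
positive, negative = split_by_sign(attributions)
print("positive (good news) per position:", positive)
print("negative (bad news) per position:", negative)
```

Keeping the two signs separate, rather than summing them, is what lets a position show up as both good news and bad news when different channels disagree.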
| Authors | Affiliations |
| --- | --- |
| [Jacob Hilton](https://www.jacobh.co.uk/) | [OpenAI](https://openai.com/) |
| [Nick Cammarata](http://nickcammarata.com/) | [OpenAI](https://openai.com/) |
| [Shan Carter](http://shancarter.com/) | [Observable](http://observablehq.com/) |
| [Gabriel Goh](http://gabgoh.github.io/) | [OpenAI](https://openai.com/) |
| [Chris Olah](https://colah.github.io/) | [OpenAI](https://openai.com/) |
### Published
Nov. 17, 2020
### DOI
[10.23915/distill.00029](https://doi.org/10.23915/distill.00029)
### Contents
[Introduction](https://distill.pub/2020/understanding-rl-vision#introduction)
[Our CoinRun model](https://distill.pub/2020/understanding-rl-vision#coinrun)
[Model analysis](https://distill.pub/2020/understanding-rl-vision#analysis)
* [Dissecting failure](https://distill.pub/2020/understanding-rl-vision#dissecting-failure)
* [Hallucinations](https://distill.pub/2020/understanding-rl-vision#hallucinations)
* [Model editing](https://distill.pub/2020/understanding-rl-vision#model-editing)
[The diversity hypothesis](https://distill.pub/2020/understanding-rl-vision#diversity-hypothesis)
[Feature visualization](https://distill.pub/2020/understanding-rl-vision#feature-visualization)
[Attribution](https://distill.pub/2020/understanding-rl-vision#attribution)
[Questions for further research](https://distill.pub/2020/understanding-rl-vision#questions)
In this article, we apply interpretability techniques to a reinforcement learning (RL) model trained to play the video game CoinRun. Using attribution combined with dimensionality reduction, as in prior work, we build an interface for exploring the objects detected by the model and how they influence its value function and policy. We leverage this interface in several ways.
* **[Dissecting failure](https://distill.pub/2020/understanding-rl-vision#dissecting-failure).** We perform a step-by-step analysis of the agent’s behavior in cases where it failed to achieve the maximum reward, allowing us to understand what went wrong, and why. For example, one case of failure was caused by an obstacle being temporarily obscured from view.
* **[Hallucinations](https://distill.pub/2020/understanding-rl-vision#hallucinations).** We find situations when the model “hallucinated” a feature not present in the observation, thereby explaining inaccuracies in the model’s value function. These were brief enough that they did not affect the agent’s behavior.
* **[Model editing](https://distill.pub/2020/understanding-rl-vision#model-editing).** We hand-edit the weights of the model to blind the agent to certain hazards, without otherwise changing the agent’s behavior. We verify the effects of these edits by checking which hazards cause the new agents to fail. Such editing is only made possible by our previous analysis, and thus provides a quantitative validation of this analysis.
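To make the model-editing idea above concrete, here is a minimal sketch of one way to blind a layer to a single feature: projecting a feature direction out of the weights that read from it. This is an assumption for illustration, not a description of the article's actual edit. The names `project_out` and `apply_layer`, the weight matrix and the direction vector are all hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def project_out(weights, direction):
    """Return edited weights that ignore activations along `direction`.

    weights: list of rows, one row per output unit, one entry per input channel.
    direction: a vector in input-channel space, e.g. a feature found by
    dimensionality reduction (hypothetical here).
    Each row w becomes w - (w . d / d . d) * d, so the edited row has zero
    dot product with d.
    """
    norm_sq = dot(direction, direction)
    if norm_sq == 0:
        raise ValueError("direction must be non-zero")
    edited = []
    for row in weights:
        coeff = dot(row, direction) / norm_sq
        edited.append([w - coeff * d for w, d in zip(row, direction)])
    return edited


def apply_layer(weights, acts):
    """Linear layer without bias: one output per weight row."""
    return [dot(row, acts) for row in weights]


# Toy example (made-up numbers): 2 output units reading 3 input channels.
weights = [[1.0, 2.0, 0.5], [-1.0, 0.0, 3.0]]
hazard_direction = [1.0, 1.0, 0.0]
edited = project_out(weights, hazard_direction)

# Activations along the hazard direction no longer affect the output.
print(apply_layer(edited, hazard_direction))

# Activations orthogonal to the direction produce the same output as before.
other_acts = [1.0, -1.0, 2.0]
print(apply_layer(weights, other_acts), apply_layer(edited, other_acts))
```

In this sketch, inputs orthogonal to the removed direction pass through unchanged, which mirrors the goal of blinding the agent to one hazard without otherwise changing its behavior. In a real network, features are rarely perfectly orthogonal, so an edit like this would still need to be checked empirically.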
Our results depend on levels in CoinRun being procedurally-generated, leading us to formulate a [diversity hypothesis](https://distill.pub/2020/understanding-rl-vision#diversity-hypothesis) for interpretability. If it is correct, th