Understanding RL Vision

📰 Distill.pub

Researchers apply interpretability techniques to a reinforcement learning model trained to play the video game CoinRun, in order to understand how it makes decisions.

advanced Published 17 Nov 2020

Action Steps

* Apply attribution techniques to reinforcement learning models to understand their decision-making process
* Use dimensionality reduction to identify key features that influence the model's value function and policy
* Analyze model failures to identify areas for improvement
* Edit model weights to modify its behavior and validate analysis results
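
The dimensionality-reduction step above can be illustrated with a small non-negative matrix factorization (NMF). This is a minimal sketch, not the authors' implementation: it assumes the input is a non-negative matrix whose rows are spatial positions and whose columns are channels (for example post-ReLU activations), and it uses the standard multiplicative-update rule. The names `nmf`, `toy_activations`, `spatial_factors` and `channel_factors` are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def nmf(v: Matrix, rank: int, steps: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor a non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(steps):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy data (assumed): 4 spatial positions x 3 channels, built from 2 features.
toy_activations = [[1.0, 0.0, 0.0],
                   [2.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0],
                   [0.0, 3.0, 3.0]]
spatial_factors, channel_factors = nmf(toy_activations, rank=2)
```

Each row of `channel_factors` is a direction in channel space, and the matching column of `spatial_factors` says where in the image that direction is active. In practice a library implementation would replace this hand-rolled loop.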

Who Needs to Know This

AI engineers and researchers on a team can benefit from this article to improve their understanding of reinforcement learning models and their interpretability, while data scientists can apply these techniques to analyze and improve model performance

Key Insight

💡 Interpretability techniques can be used to analyze and improve reinforcement learning models in complex environments

Full Article

# Understanding RL Vision

With diverse environments, we can analyze, diagnose and edit deep reinforcement learning models using attribution.

[Figure: three panels showing, from left to right, Observation (video game still), Positive attribution (good news), Negative attribution (bad news)]

Attribution from a hidden layer to the value function, showing what features of the observation (left) are used to predict success (middle) and failure (right). Applying dimensionality reduction (NMF) yields features that detect various in-game objects.

![Image 1](https://distill.pub/2020/images/hero/coin.png) Coin ![Image 2](https://distill.pub/2020/images/hero/enemy.png) Enemy ![Image 3](https://distill.pub/2020/images/hero/saw.png) Buzzsaw
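
The split into positive and negative attribution described above can be sketched in a few lines. This is a simplified illustration under a strong assumption: the value is treated as a linear function of the hidden activations, with one weight per channel shared across spatial positions, so each activation contributes `activation * weight`. For a real nonlinear network the weights would be replaced by gradients or a path-integrated attribution. The names `attribute_to_value`, `toy_grid` and `toy_weights` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[List[float]]]  # indexed as [row][col][channel]


def attribute_to_value(
    activations: Grid, value_weights: List[float]
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return per-position (positive, negative) attribution maps.

    Both maps hold non-negative magnitudes: "good news" and "bad news".
    """
    positive, negative = [], []
    for row in activations:
        pos_row, neg_row = [], []
        for cell in row:
            contributions = [a * w for a, w in zip(cell, value_weights)]
            pos_row.append(sum(c for c in contributions if c > 0))
            neg_row.append(sum(-c for c in contributions if c < 0))
        positive.append(pos_row)
        negative.append(neg_row)
    return positive, negative


# Toy data (assumed): a 1 x 2 grid with 2 channels per position.
toy_grid = [[[1.0, 0.0], [0.0, 2.0]]]
toy_weights = [0.5, -1.0]  # channel 0 raises the value, channel 1 lowers it
good_news, bad_news = attribute_to_value(toy_grid, toy_weights)
```

In this toy case the left position carries only good news and the right position only bad news, which is the kind of separation the middle and right panels visualize.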

### Authors

| Author | Affiliation |
| --- | --- |
| [Jacob Hilton](https://www.jacobh.co.uk/) | [OpenAI](https://openai.com/) |
| [Nick Cammarata](http://nickcammarata.com/) | [OpenAI](https://openai.com/) |
| [Shan Carter](http://shancarter.com/) | [Observable](http://observablehq.com/) |
| [Gabriel Goh](http://gabgoh.github.io/) | [OpenAI](https://openai.com/) |
| [Chris Olah](https://colah.github.io/) | [OpenAI](https://openai.com/) |

### Published

Nov. 17, 2020

### DOI

[10.23915/distill.00029](https://doi.org/10.23915/distill.00029)

### Contents

[Introduction](https://distill.pub/2020/understanding-rl-vision#introduction)

[Our CoinRun model](https://distill.pub/2020/understanding-rl-vision#coinrun)

[Model analysis](https://distill.pub/2020/understanding-rl-vision#analysis)

* [Dissecting failure](https://distill.pub/2020/understanding-rl-vision#dissecting-failure)
* [Hallucinations](https://distill.pub/2020/understanding-rl-vision#hallucinations)
* [Model editing](https://distill.pub/2020/understanding-rl-vision#model-editing)

[The diversity hypothesis](https://distill.pub/2020/understanding-rl-vision#diversity-hypothesis)

[Feature visualization](https://distill.pub/2020/understanding-rl-vision#feature-visualization)

[Attribution](https://distill.pub/2020/understanding-rl-vision#attribution)

[Questions for further research](https://distill.pub/2020/understanding-rl-vision#questions)

In this article, we apply interpretability techniques to a reinforcement learning (RL) model trained to play the video game CoinRun. Using attribution combined with dimensionality reduction, as in prior work, we build an interface for exploring the objects detected by the model, and how they influence its value function and policy. We leverage this interface in several ways.

* **[Dissecting failure](https://distill.pub/2020/understanding-rl-vision#dissecting-failure).** We perform a step-by-step analysis of the agent’s behavior in cases where it failed to achieve the maximum reward, allowing us to understand what went wrong, and why. For example, one case of failure was caused by an obstacle being temporarily obscured from view.
* **[Hallucinations](https://distill.pub/2020/understanding-rl-vision#hallucinations).** We find situations when the model “hallucinated” a feature not present in the observation, thereby explaining inaccuracies in the model’s value function. These were brief enough that they did not affect the agent’s behavior.
* **[Model editing](https://distill.pub/2020/understanding-rl-vision#model-editing).** We hand-edit the weights of the model to blind the agent to certain hazards, without otherwise changing the agent’s behavior. We verify the effects of these edits by checking which hazards cause the new agents to fail. Such editing is only made possible by our previous analysis, and thus provides a quantitative validation of this analysis.
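
The excerpt above does not spell out how the weights are edited, so the following is only one plausible sketch of "blinding" a layer to a feature. It assumes the hazard is represented by a single direction in the layer's input channel space (for example, a row of an NMF factor) and removes that direction from every row of the next layer's weight matrix by orthogonal projection. The function name `blind_to_direction` and the toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def blind_to_direction(weights: Matrix, direction: List[float]) -> Matrix:
    """Project each weight row onto the complement of `direction`.

    After the edit, adding any multiple of `direction` to the layer's
    input leaves the layer's output unchanged.
    """
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        raise ValueError("direction must be non-zero")
    edited = []
    for row in weights:
        coeff = sum(w * d for w, d in zip(row, direction)) / norm_sq
        edited.append([w - coeff * d for w, d in zip(row, direction)])
    return edited


def apply_layer(weights: Matrix, inputs: List[float]) -> List[float]:
    """Linear layer without bias: one dot product per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Toy data (assumed): 2 output units, 2 input channels.
toy_weights = [[1.0, 2.0], [3.0, 4.0]]
hazard_direction = [1.0, 0.0]  # pretend channel 0 encodes the hazard
edited_weights = blind_to_direction(toy_weights, hazard_direction)
```

One caveat of this design: if feature directions overlap, projecting out one of them also weakens the others, so an edit like this should be checked by observing which hazards actually cause the edited agent to fail, as the authors do.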

Our results depend on levels in CoinRun being procedurally-generated, leading us to formulate a [diversity hypothesis](https://distill.pub/2020/understanding-rl-vision#diversity-hypothesis) for interpretability. If it is correct, th
