Empowerment: Concrete Problems in AI Safety part 2

Robert Miles AI Safety · Advanced ·📄 Research Papers Explained ·8y ago

Key Takeaways

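Explores empowerment, an information-theoretic measure of how much control an agent's actions give it over its environment, as a possible way to keep AI systems from gaining too much influence, and discusses why naively penalizing it can backfire.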

Full Transcript

Hi. This is part of a series about the paper "Concrete Problems in AI Safety", which looks at preventing possible accidents in AI systems. Last time we talked about avoiding negative side effects, and how one way of doing that is to create systems that try not to have too much impact, to not change the environment around them too much. This video is about a slightly more subtle idea than penalizing impact: penalizing influence.

So suppose we have a robot. It's a cleaning robot, so it's got a mop and a bucket and an apron. I'm trying something new here, bear with me. So the robot knows that there's a mess over here that it needs to clean up, but in between the robot and the mess is the server room, which is full of expensive and delicate equipment. Now, if an AI system doesn't want to have a large impact, it won't make plans that involve tipping the bucket of water over the servers. But maybe we can be safer than that. We might want our robot to not even want to bring the bucket of water into the server room, to have a preference for going around it instead. We might want it to think something like: "Not only do I not want to have too big of an impact on my surroundings, I also don't want to put myself in a situation where it would be easy for me to have a big impact on my surroundings."

How do we formalize that idea? Well, perhaps we can use information theory. The paper talks about an information-theoretic metric called empowerment, which is a measure of the maximum possible mutual information between the agent's potential future actions and the potential future state. That's equivalent to the capacity of the information channel between the agent's actions and the environment, i.e., the rate at which the agent's actions transmit information into the environment, measured in bits. The more information an agent is able to transfer into its environment with its actions, the more control it has over its environment, and the more empowered the agent is. So if you're stuck inside a solid locked box, your empowerment is more or less zero: none of the actions you can take will transmit much information into the world outside the box. But if you have the key to the box, your empowerment is much higher, because now you can take actions that will have effects on the world at large. You've got options.

People have used empowerment as a reward for experimental AI systems, and it makes them do some interesting things, like picking up keys, avoiding walls, even things like balancing an inverted pendulum or a bicycle. You don't have to tell it to keep the bike balanced; it just learns that if the bike falls over, the agent's actions will have less control over the environment, so it wants to keep the bike upright. So empowerment is a pretty neat metric, because it's very simple but it captures something that humans and other intelligent agents are likely to want. We want more options, more freedom, more capabilities, more influence, more control over our environment. And maybe that's something we don't want our AI systems to want. Maybe we want to say: "Clean up that mess, but try not to gain too much control or influence over your surroundings. Don't have too much empowerment." That could make the robot think: "If I bring this bucket of water into the server room, I'll have the option to destroy the servers, so I'll go around to avoid that empowerment."

Okay, so now we're at that part of the video. What's wrong with this? Why might it not work? Pause the video and take a second to think. [Music]

Well, there are a few problems. One thing is that because it's measuring information, we're really measuring precision of control rather than magnitude of impact. As an extreme example, suppose you've got your robot in a room, and the only thing it has access to is a big button which, if pressed, will blow up the Moon. That actually only counts as one bit of empowerment: the button is either pressed or not pressed, the Moon is exploded or not. Two choices, so one bit of information, one bit of empowerment. On the other hand, if the
robot has an Ethernet cable that's feeding out lots of detailed debug information about everything the robot does, and that's all being logged somewhere, that's loads of information transfer, loads of mutual information with the environment, so loads of empowerment. The robot cares way more about unplugging the debug cable than about anything to do with the button.

And then you have another possible problem, which is perverse incentives. Okay, so this button is only one bit of empowerment, nowhere near as big a deal as the debug cable, but the robot still cares about it to some extent and wants to avoid putting itself in a situation where it can blow up the Moon. However, if it finds itself already in a situation where it has one bit of empowerment because of this button, the easiest way to reduce that is by pressing the button. Once the button's pressed, the Moon is blown up and the button doesn't work anymore, so the robot then has basically zero bits of empowerment. It's just in a box with an unconnected button, and now it's content that it's managed to make itself safe: it finally has no influence over the world. So yeah, in this admittedly contrived scenario, an empowerment-reducing robot will unplug its debug cable and then blow up the Moon. That's not safe behavior.

So why did we think this might be a good idea? Well, it just makes the point that even very simple information-theoretic metrics can describe interesting abstract properties like influence over the environment. So maybe doing something a little bit cleverer than just penalizing empowerment might actually be useful: a more sophisticated metric, a better architecture around it. You know, there could be some way to make this work, so this is an area that's probably worth looking into by AI safety researchers.

So that's all for now. The next thing in the paper is multi-agent approaches, which should be really interesting. Make sure to subscribe and hit the bell if you want to be notified when that's out. Also make sure you're subscribed to Computerphile, because I'm probably going to make some new videos there as well, since some of the multi-agent stuff is closely related to the stop button problem that I already talked about, so it might be nice to put those together. Thanks for watching, I hope to see you next time. [Music]

In this video I want to thank Øystein Flygt, who's supported me on Patreon since April. Thank you, and thank you again to all of my wonderful Patreon supporters, all of these people. I've been setting up a room in my house to be a full-time studio; I might make a behind-the-scenes video about that soon. Oh, and I've got these pictures that I drew while making this video, which I have no use for now. Does anyone want them? God, the internet is weird sometimes, isn't it? But yeah, I can probably post them to supporters if anyone wants one, let me know. Øystein Flygt. I was close.
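The locked box, the Moon button and the debug cable from the transcript can be checked numerically. When an environment is deterministic, the channel from action sequences to resulting states is noiseless, so its capacity (the empowerment described in the transcript) reduces to log2 of the number of distinct states the agent can reach within the horizon. The sketch below assumes a small, fully known, deterministic toy model; the function `deterministic_empowerment` and the three step functions are hypothetical illustrations, not code from the paper or the video.

```python
import itertools
import math
from typing import Callable, Hashable, Iterable


def deterministic_empowerment(
    state: Hashable,
    actions: Iterable[Hashable],
    step: Callable[[Hashable, Hashable], Hashable],
    horizon: int,
) -> float:
    """n-step empowerment, in bits, for a deterministic environment.

    Deterministic dynamics make the action-to-state channel noiseless, so its
    capacity is log2 of the number of distinct states reachable by some
    action sequence of length `horizon`.
    """
    reachable = set()
    for sequence in itertools.product(actions, repeat=horizon):
        s = state
        for a in sequence:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))


# Hypothetical toy environments mirroring the transcript's examples.

def locked_box_step(world: str, action: str) -> str:
    # Inside a solid locked box, no action changes the outside world.
    return world


def moon_button_step(moon_intact: bool, action: str) -> bool:
    # Pressing destroys the Moon; afterwards nothing the agent does changes that.
    return False if action == "press" else moon_intact


def debug_cable_step(log: tuple, action: int) -> tuple:
    # Every action value is written to an external log, so all are distinguishable.
    return log + (action,)


if __name__ == "__main__":
    print(deterministic_empowerment("unchanged", ["push", "wait"], locked_box_step, 3))  # 0.0
    print(deterministic_empowerment(True, ["press", "wait"], moon_button_step, 1))       # 1.0
    print(deterministic_empowerment(False, ["press", "wait"], moon_button_step, 1))      # 0.0
    print(deterministic_empowerment((), range(256), debug_cable_step, 1))                # 8.0
```

Under these assumptions the button is worth one bit at any horizon, however large its consequences, while the 256-valued cable is worth eight; and the post-press state scores zero, which is the perverse incentive described in the transcript for an agent that minimizes this quantity.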
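When actions have noisy effects, the channel capacity in the transcript's definition of empowerment generally has no closed form. One standard way to compute it for a small, discrete, single-step model is the Blahut-Arimoto algorithm; this is a swapped-in textbook technique, since neither the video nor this page says how empowerment should be computed. A minimal sketch, assuming the transition probabilities `p[a][s]` are known exactly; the function name and the example channels are hypothetical.

```python
import math
from typing import Sequence


def channel_capacity_bits(
    p: Sequence[Sequence[float]], tol: float = 1e-9, max_iter: int = 10_000
) -> float:
    """Blahut-Arimoto estimate of max over p(a) of I(A; S'), in bits.

    p[a][s] is the probability of reaching next state s after taking action a.
    Each row is assumed to sum to 1.
    """
    n_actions, n_states = len(p), len(p[0])
    r = [1.0 / n_actions] * n_actions  # current distribution over actions
    lower = 0.0
    for _ in range(max_iter):
        # Distribution over next states induced by the current action distribution.
        q = [sum(r[a] * p[a][s] for a in range(n_actions)) for s in range(n_states)]
        # c[a] = exp(KL(p(.|a) || q)), with the KL divergence in nats.
        c = [
            math.exp(sum(p[a][s] * math.log(p[a][s] / q[s])
                         for s in range(n_states) if p[a][s] > 0))
            for a in range(n_actions)
        ]
        total = sum(r[a] * c[a] for a in range(n_actions))
        lower = math.log(total)   # lower bound on capacity (nats)
        upper = math.log(max(c))  # upper bound on capacity (nats)
        if upper - lower < tol:
            break
        # Shift weight toward actions whose outcomes differ most from the average.
        r = [r[a] * c[a] / total for a in range(n_actions)]
    return lower / math.log(2)


if __name__ == "__main__":
    # Locked box: every action leads to the same single outcome.
    print(channel_capacity_bits([[1.0], [1.0]]))  # 0.0
    # Hypothetical unreliable Moon button: states are (intact, destroyed);
    # row 0 is "wait", row 1 is "press", which works only half the time.
    print(round(channel_capacity_bits([[1.0, 0.0], [0.5, 0.5]]), 3))  # 0.322
    # Binary symmetric channel with 10% noise: 1 - H(0.1) bits.
    print(round(channel_capacity_bits([[0.9, 0.1], [0.1, 0.9]]), 3))  # 0.531
```

The unreliable button scores about 0.32 bits rather than one, even though its worst outcome is unchanged, which is another case of the metric tracking precision of control rather than magnitude of impact. Empowerment over longer horizons treats whole action sequences as the channel input, so this direct approach grows exponentially with the horizon.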

Original Description

Maybe AI systems would be safer if they avoid gaining too much control over their environment? How might that work?

This is a follow-up to this earlier video: https://youtu.be/lqJUIqZNzP8
The paper 'Concrete Problems in AI Safety': https://arxiv.org/pdf/1606.06565.pdf
A book chapter about Empowerment: https://arxiv.org/pdf/1310.1863.pdf
Prof Brailsford's Information Theory Videos: https://www.youtube.com/watch?v=Lto-ajuqW3w&list=PLzH6n4zXuckpKAj1_88VS-8Z6yn9zX_P6

Thanks to my amazing Patreon Supporters:
Sara Tjäder Jason Strack Chad Jones Ichiro Dohi Stefan Skiles Katie Byrne Ziyang Liu Jordan Medina James McCuen Joshua Richardson Fabian Consiglio Jonatan R Øystein Flygt Björn Mosten Michael Greve robertvanduursen The Guru Of Vision Fabrizio Pisani Alexander Hartvig Nielsen Volodymyr Peggy Youell Konstantin Shabashov Almighty Dodd DGJono Matthias Meger Scott Stevens Emilio Alvarez Benjamin Aaron Degenhart Michael Ore Robert Bridges Dmitri Afanasjev Brian Sandberg Einar Ueland Lo Rez C3POehne
https://www.patreon.com/robertskmiles
