Empowerment: Concrete Problems in AI Safety part 2
Key Takeaways
Explores empowerment, an information-theoretic measure of how much control an agent has over its environment, and asks whether penalizing it could stop AI systems from gaining too much influence over their surroundings.
Full Transcript
Hi, this is part of a series about the paper "Concrete Problems in AI Safety", which looks at preventing possible accidents in AI systems. Last time we talked about avoiding negative side effects, and how one way of doing that is to create systems that try not to have too much impact, to not change the environment around them too much. This video is about a slightly more subtle idea than penalizing impact: penalizing influence.
So suppose we have a robot. It's a cleaning robot, so it's got a mop and a bucket and an apron. I'm trying something new here, bear with me. The robot knows that there's a mess over here that it needs to clean up, but in between the robot and the mess is the server room, which is full of expensive and delicate equipment. Now, if an AI system doesn't want to have a large impact, it won't make plans that involve tipping the bucket of water over the servers. But maybe we can be safer than that. We might want our robot to not even want to bring the bucket of water into the server room, to have a preference for going around it instead. We might want it to think something like: "Not only do I not want to have too big of an impact on my surroundings, I also don't want to put myself in a situation where it would be easy for me to have a big impact on my surroundings."
How do we formalize that idea? Well, perhaps we can use information theory. The paper talks about an information-theoretic metric called empowerment, which is a measure of the maximum possible mutual information between the agent's potential future actions and its potential future state. That's equivalent to the capacity of the information channel between the agent's actions and the environment, i.e. the rate at which the agent's actions transmit information into the environment, measured in bits. The more information an agent is able to transfer into its environment with its actions, the more control it has over its environment, and the more empowered the agent is. So if you're stuck inside a solid locked box, your empowerment is more or less zero: none of the actions you can take will transmit much information into the world outside the box. But if you have the key to the box, your empowerment is much higher, because now you can take actions that will have effects on the world at large. You've got options.
People have used empowerment as a reward for experimental AI systems, and it makes them do some interesting things, like picking up keys, avoiding walls, even things like balancing an inverted pendulum or a bicycle. You don't have to tell it to keep the bike balanced. It just learns that if the bike falls over, the agent's actions will have less control over the environment, so it wants to keep the bike upright. So empowerment is a pretty neat metric, because it's very simple but it captures something that humans and other intelligent agents are likely to want. We want more options, more freedom, more capabilities, more influence, more control over our environment.
And maybe that's something we don't want our AI systems to want. Maybe we want to say: "Clean up that mess, but try not to gain too much control or influence over your surroundings. Don't have too much empowerment." That could make the robot think: "If I bring this bucket of water into the server room, I'll have the option to destroy the servers, so I'll go around to avoid that empowerment."
OK, so now we're at that part of the video: what's wrong with this? Why might it not work? Pause the video and take a second to think. [Music]
Well, there are a few problems. One thing is that, because it's measuring information, we're really measuring precision of control rather than magnitude of impact. As an extreme example, suppose you've got your robot in a room and the only thing it has access to is a big button which, if pressed, will blow up the Moon. That actually only counts as one bit of empowerment. The button is either pressed or not pressed, the Moon is exploded or not: two choices, so one bit of information, one bit of empowerment. On the other hand, if the robot has an Ethernet cable that's feeding out lots of detailed debug information about everything the robot does, and that's all being logged somewhere, that's loads of information transfer, loads of mutual information with the environment, so loads of empowerment. The robot cares way more about unplugging the debug cable than anything to do with the button.
And then you have another possible problem, which is perverse incentives. OK, so this button is only one bit of empowerment, nowhere near as big a deal as the debug cable, but the robot still cares about it to some extent and wants to avoid putting itself in this situation where it can blow up the Moon. However, if it finds itself already in a situation where it has one bit of empowerment because of this button, the easiest way to reduce that is by pressing the button. Once the button's pressed, the Moon is blown up and the button doesn't work anymore, so the robot then has basically zero bits of empowerment. It's just in a box with an unconnected button, and now it's content that it's managed to make itself safe. It finally has no influence over the world. So yeah, in this admittedly contrived scenario, an empowerment-reducing robot will unplug its debug cable and then blow up the Moon. That's not safe behavior.
Why did we think this might be a good idea? Well, it just makes the point that even very simple information-theoretic metrics can describe interesting abstract properties like influence over the environment. So maybe doing something a little bit cleverer than just penalizing empowerment might actually be useful: a more sophisticated metric, a better architecture around it. You know, there could be some way to make this work, so this is an area that's probably worth looking into by AI safety researchers.
So that's all for now. The next thing in the paper is multi-agent approaches, which should be really interesting. Make sure to subscribe and hit the bell if you want to be notified when that's out. Also make sure you're subscribed to Computerphile, because I'm probably going to make some new videos there as well, since some of the multi-agent stuff is closely related to the stop button problem that I already talked about, so it might be nice to put those together. Thanks for watching, I hope to see you next time. [Music]
In this video I want to thank Øystein Flygt, who's supported me on Patreon since April. Thank you, and thank you again to all of my wonderful Patreon supporters, all of these people. I've been setting up a room in my house to be a full-time studio, and I might make a behind-the-scenes video about that soon. Oh, and I've got these pictures that I drew while making this video, which I have no use for now. Does anyone want them? God, the internet is weird sometimes, isn't it? But yeah, I can probably post them to supporters if anyone wants one, let me know. [unclear] Flygt. I was close.
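As a rough illustration of the Moon button and debug cable comparison in the transcript, here is a minimal Python sketch. It is not from the video or the paper. It assumes every action has a deterministic outcome, in which case the channel capacity from actions to next states reduces to log2 of the number of distinct states the agent can reach. The function names and the choice of 256 possible debug messages per step are hypothetical, picked only to make the numbers easy to read.

```python
import math


def deterministic_empowerment(transition, actions):
    """One-step empowerment in bits, assuming deterministic outcomes.

    When each action leads to exactly one next state, the maximum mutual
    information between action and next state is log2 of the number of
    distinct reachable states.
    """
    reachable = {transition(action) for action in actions}
    return math.log2(len(reachable))


def moon_button(action):
    # Hypothetical environment: a working button wired to the Moon.
    return "moon exploded" if action == "press" else "moon intact"


def debug_cable(message):
    # Hypothetical environment: every distinct message is logged faithfully.
    return ("logged", message)


def dead_button(action):
    # Hypothetical environment: the button after it has already been used.
    return "moon already gone"


button_bits = deterministic_empowerment(moon_button, ["press", "wait"])
cable_bits = deterministic_empowerment(debug_cable, range(256))  # assumed 256 messages
after_bits = deterministic_empowerment(dead_button, ["press", "wait"])

print(f"moon button: {button_bits} bits")
print(f"debug cable: {cable_bits} bits")
print(f"after pressing: {after_bits} bits")
```

Under these assumptions the button is worth 1 bit, the cable 8 bits, and the used button 0 bits, so an agent that only minimizes this number would rank unplugging the cable above everything else and would see pressing the button as a way to reach zero. Real empowerment calculations deal with noisy outcomes and multi-step action sequences, which this sketch leaves out.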
Original Description
Maybe AI systems would be safer if they avoid gaining too much control over their environment? How might that work?
This is a follow-up to this earlier video: https://youtu.be/lqJUIqZNzP8
The paper 'Concrete Problems in AI Safety': https://arxiv.org/pdf/1606.06565.pdf
A book chapter about Empowerment: https://arxiv.org/pdf/1310.1863.pdf
Prof Brailsford's Information Theory Videos: https://www.youtube.com/watch?v=Lto-ajuqW3w&list=PLzH6n4zXuckpKAj1_88VS-8Z6yn9zX_P6
Thanks to my amazing Patreon Supporters:
Sara Tjäder
Jason Strack
Chad Jones
Ichiro Dohi
Stefan Skiles
Katie Byrne
Ziyang Liu
Jordan Medina
James McCuen
Joshua Richardson
Fabian Consiglio
Jonatan R
Øystein Flygt
Björn Mosten
Michael Greve
robertvanduursen
The Guru Of Vision
Fabrizio Pisani
Alexander Hartvig Nielsen
Volodymyr
Peggy Youell
Konstantin Shabashov
Almighty Dodd
DGJono
Matthias Meger
Scott Stevens
Emilio Alvarez
Benjamin Aaron Degenhart
Michael Ore
Robert Bridges
Dmitri Afanasjev
Brian Sandberg
Einar Ueland
Lo Rez
C3POehne
https://www.patreon.com/robertskmiles