Empowerment: Concrete Problems in AI Safety part 2

Robert Miles AI Safety · Advanced ·📄 Research Papers Explained ·8y ago

Key Takeaways

Explores the concept of empowerment in AI safety, discussing how to prevent AI systems from gaining too much control over their environment

Full Transcript

Hi. This is part of a series about the paper "Concrete Problems in AI Safety", which looks at preventing possible accidents in AI systems. Last time we talked about avoiding negative side effects, and how one way of doing that is to create systems that try not to have too much impact, to not change the environment around them too much. This video is about a slightly more subtle idea than penalizing impact: penalizing influence.

So suppose we have a robot. It's a cleaning robot, so it's got a mop and a bucket and an apron. I'm, uh, I'm trying something new here, bear with me. So the robot knows that there's a mess over here that it needs to clean up, but in between the robot and the mess is the server room, which is full of expensive and delicate equipment. Now, if an AI system doesn't want to have a large impact, it won't make plans that involve tipping the bucket of water over the servers. But maybe we can be safer than that. We might want our robot to not even want to bring the bucket of water into the server room, to have a preference for going around it instead. We might want it to think something like: not only do I not want to have too big of an impact on my surroundings, I also don't want to put myself in a situation where it would be easy for me to have a big impact on my surroundings.

How do we formalize that idea? Well, perhaps we can use information theory. The paper talks about an information-theoretic metric called empowerment, which is a measure of the maximum possible mutual information between the agent's potential future actions and the potential future state. That's equivalent to the capacity of the information channel between the agent's actions and the environment, i.e. the rate at which the agent's actions transmit information into the environment, measured in bits. The more information an agent is able to transfer into their environment with their actions, the more control they have over their environment, and the more empowered the agent is.

So if you're stuck inside a solid locked box, your empowerment is more or less zero: none of the actions you can take will transmit much information into the world outside the box. But if you have the key to the box, your empowerment is much higher, because now you can take actions that will have effects on the world at large. You've got options.

People have used empowerment as a reward for experimental AI systems, and it makes them do some interesting things, like picking up keys, avoiding walls, even things like, uh, balancing an inverted pendulum or a bicycle. You don't have to tell it to keep the bike balanced. It just learns that if the bike falls over, the agent's actions will have less control over the environment, so it wants to keep the bike upright.

So empowerment is a pretty neat metric, because it's very simple but it captures something that humans and other intelligent agents are likely to want. We want more options, more freedom, more capabilities, more influence, more control over our environment. And maybe that's something we don't want our AI systems to want. Maybe we want to say: clean up that mess, but try not to gain too much control or influence over your surroundings. Don't have too much empowerment. That could make the robot think: if I bring this bucket of water into the server room, I'll have the option to destroy the servers, so I'll go around to avoid that empowerment.

Okay, so now we're at that part of the video. What's wrong with this? Why might it not work? Pause the video and take a second to think.

Well, there are a few problems. One thing is that, because it's measuring information, we're really measuring precision of control rather than magnitude of impact. As an extreme example, suppose you've got your robot in a room and the only thing it has access to is a big button which, if pressed, will blow up the Moon. That actually only counts as one bit of empowerment. The button is either pressed or not pressed, the Moon is exploded or not: two choices, so one bit of information, one bit of empowerment. On the other hand, if the robot has an Ethernet cable that's feeding out lots of detailed debug information about everything the robot does, and that's all being logged somewhere, that's loads of information transfer, loads of mutual information with the environment, so loads of empowerment. The robot cares way more about unplugging the debug cable than anything to do with the button.

And then you have another possible problem, which is perverse incentives. Okay, so this button is only one bit of empowerment, nowhere near as big a deal as the debug cable, but the robot still cares about it to some extent and wants to avoid putting itself in this situation where it can blow up the Moon. However, if it finds itself already in a situation where it has one bit of empowerment because of this button, the easiest way to reduce that is by pressing the button. Once the button's pressed, the Moon is blown up and the button doesn't work anymore, so the robot then has basically zero bits of empowerment. It's just in a box with an unconnected button, and now it's content that it's managed to make itself safe. It finally has no influence over the world. So yeah, in this admittedly contrived scenario, an empowerment-reducing robot will unplug its debug cable and then blow up the Moon. That's not safe behavior.

Why did we think this might be a good idea? Well, it just makes the point that even very simple information-theoretic metrics can describe interesting abstract properties like influence over the environment. So maybe doing something a little bit cleverer than just penalizing empowerment might actually be useful: a more sophisticated metric, a better architecture around it. You know, there could be some way to make this work. So this is an area that's probably worth looking into by AI safety researchers.

So that's all for now. The next thing in the paper is multi-agent approaches, which should be really interesting. Make sure to subscribe and hit the bell if you want to be notified when that's out. Also make sure you're subscribed to Computerphile, cuz I'm probably going to make some new videos there as well, since some of the multi-agent stuff is closely related to the stop button problem that I already talked about, so it might be nice to put those together. Thanks for watching. I hope to see you next time.

In this video I want to thank oin flick, who's supported me on Patreon since April. Thank you, and thank you again to all of my wonderful Patreon supporters, all of these people. I've been setting up a room in my house to be a full-time studio. Uh, I might make a behind-the-scenes video about that soon. Oh, and I've got these pictures that I drew, uh, while making this video, which I have no use for now. Uh, does anyone want them? God, the internet is weird sometimes, isn't it? But yeah, I can probably post them to supporters if anyone wants one. Uh, let me know. Bo thank flick. I was close.
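The Moon button and debug cable comparison from the transcript can be checked numerically. What follows is only a rough sketch, not code from the video or the paper: it treats empowerment as one-step channel capacity (the maximum mutual information between the action and the next state) and estimates it with the standard Blahut-Arimoto iteration. The function names `empowerment_bits` and `identity_channel`, the three toy channels, and the choice of 16 distinguishable log messages for the debug cable are all assumptions made for illustration.

```python
import math


def empowerment_bits(channel, tol=1e-9, max_iters=1000):
    """Capacity, in bits, of the channel from actions to next states.

    channel[a][s] is p(next state s | action a); each row must sum to 1.
    Uses the Blahut-Arimoto iteration to maximise mutual information
    over the distribution of actions.
    """
    n_actions, n_states = len(channel), len(channel[0])
    r = [1.0 / n_actions] * n_actions  # current distribution over actions
    lower = 0.0
    for _ in range(max_iters):
        # Distribution over next states induced by r.
        q = [sum(r[a] * channel[a][s] for a in range(n_actions))
             for s in range(n_states)]
        # KL divergence (in nats) of each action's outcomes from q.
        d = [sum(p * math.log(p / q[s]) for s, p in enumerate(row) if p > 0.0)
             for row in channel]
        z = sum(r[a] * math.exp(d[a]) for a in range(n_actions))
        lower, upper = math.log(z), max(d)  # bounds on capacity, in nats
        if upper - lower < tol:
            break
        r = [r[a] * math.exp(d[a]) / z for a in range(n_actions)]
    return lower / math.log(2)


def identity_channel(n):
    """n actions, each deterministically producing its own distinct state."""
    return [[1.0 if s == a else 0.0 for s in range(n)] for a in range(n)]


# Moon button: press or don't press, Moon exploded or intact.
live_button = identity_channel(2)
# Hypothetical debug cable: 16 distinguishable log messages per step.
debug_cable = identity_channel(16)
# After the button is pressed, both actions lead to the same state.
dead_button = [[1.0], [1.0]]

print(f"live button: {empowerment_bits(live_button):.3f} bits")
print(f"debug cable: {empowerment_bits(debug_cable):.3f} bits")
print(f"dead button: {empowerment_bits(dead_button):.3f} bits")

# Perverse incentive: a myopic empowerment minimiser compares the
# empowerment of the situation each action leaves it in.
after = {"wait": empowerment_bits(live_button),
         "press": empowerment_bits(dead_button)}
print("empowerment minimiser chooses:", min(after, key=after.get))
```

Under these assumptions the button is worth one bit no matter how destructive it is, the cable is worth four bits, and a dead button is worth zero, so an agent that only minimises this quantity prefers the state it reaches by pressing. Real empowerment measures usually consider multi-step action sequences and stochastic dynamics, which this sketch leaves out.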

Original Description

Maybe AI systems would be safer if they avoid gaining too much control over their environment? How might that work?

This is a follow-up to this earlier video: https://youtu.be/lqJUIqZNzP8
The paper 'Concrete Problems in AI Safety': https://arxiv.org/pdf/1606.06565.pdf
A book chapter about Empowerment: https://arxiv.org/pdf/1310.1863.pdf
Prof Brailsford's Information Theory Videos: https://www.youtube.com/watch?v=Lto-ajuqW3w&list=PLzH6n4zXuckpKAj1_88VS-8Z6yn9zX_P6

Thanks to my amazing Patreon Supporters: Sara Tjäder Jason Strack Chad Jones Ichiro Dohi Stefan Skiles Katie Byrne Ziyang Liu Jordan Medina James McCuen Joshua Richardson Fabian Consiglio Jonatan R Øystein Flygt Björn Mosten Michael Greve robertvanduursen The Guru Of Vision Fabrizio Pisani Alexander Hartvig Nielsen Volodymyr Peggy Youell Konstantin Shabashov Almighty Dodd DGJono Matthias Meger Scott Stevens Emilio Alvarez Benjamin Aaron Degenhart Michael Ore Robert Bridges Dmitri Afanasjev Brian Sandberg Einar Ueland Lo Rez C3POehne
https://www.patreon.com/robertskmiles
