AI That Doesn't Try Too Hard - Maximizers and Satisficers
Key Takeaways
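The video contrasts maximizers and satisficers in AI systems. It explains why powerful systems that pursue their goals as strongly as they can are dangerous, and examines whether a bounded utility function or a system that stops at 'good enough' would be safer. Neither turns out to be a reliable fix: an expected utility maximizer stays dangerous even with a bounded utility function, and a satisficer may modify itself into a maximizer. The outro also points to a separate video in which GPT-2 is used to generate fake YouTube comments.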
Full Transcript
Hi. So way back when I started this online AI safety videos thing on Computerphile, I was talking about how you have a problem when you maximize just about any simple utility function. The example I used was an AI system meant to collect a lot of stamps, which works like this: the system is connected to the Internet, and for all sequences of packets it could send, it simulates exactly how many stamps would end up being collected after one year if it sent those packets. It then selects the sequence with the most stamps and sends that. This is what's called a utility maximizer, and it seems like any utility function you give this kind of system as a goal, it does it to the max. Utility maximizers tend to take extreme actions. They're happy to destroy the whole world just to get a tiny increase in the output of their utility functions, so unless the utility function lines up exactly with human values, their actions are pretty much guaranteed to be disastrous.

Intuitively, the issue is that utility maximizers have precisely zero chill. To anthropomorphize horribly, they seem to have a frantic, obsessive, maniacal attitude. We find ourselves wanting to say, "Look, could you just dial it back a little? Can you just relax, just a bit?" So suppose we want a lot of stamps, but, like, not that many. It must be possible to design a system that just collects a bunch of stamps and then stops, right? How can we do that?

Well, the first obvious issue with the existing design is that the utility function is unbounded: the more stamps the better, with no limit. However many stamps it has, it can get more utility by getting one more stamp. Any world where humans are alive and happy is a world that could have more stamps in it, so the maximum of this utility function is the end of the world. Let's say we only really want a hundred stamps. So what if we make a bounded utility function that returns whichever is smaller, the number of stamps or 100? Getting a hundred stamps from eBay gives 100 utility. Converting the whole world into stamps also gives 100 utility. This function is totally indifferent between all outcomes that contain at least a hundred stamps.

So what does a maximizer of this utility function actually do? Now the system's behavior is no longer really specified. It will do one of the things that results in a hundred utility, which includes a bunch of perfectly reasonable behaviors that the programmer would be happy with, and a bunch of apocalypses, and a bunch of outcomes somewhere in between. If you select at random from all courses of action that result in at least 100 stamps, what proportion of those are actually acceptable outcomes for humans? I don't know. Probably not enough. This is still a step up, though, because the previous utility function was guaranteed to kill everyone, and this new one has at least some probability of doing the right thing.

But actually, of course, this utility maximizer concept is too unrealistic, even in the realm of talking about hypothetical agents in the abstract. In the thought experiment, our stamp collector system is able to know with certainty exactly how many stamps any particular course of action will result in, but you just can't simulate the world that accurately. It's more than just computationally intractable; it's probably not even allowed by physics. Pure utility maximization is only available for very simple problems where everything is deterministic and fully known. If there's any uncertainty, you have to do expected utility maximizing. This is pretty straightforwardly how you'd expect to apply uncertainty to this situation: the expected utility of a choice is the utility you'd expect to get from it on average.

So, like, suppose there's a button that flips a coin, and if it's tails you get 50 stamps and if it's heads you get 150 stamps. In expectation this results in a hundred stamps, right? It never actually returns 100, but on average that's what you get. That's the expected number of stamps. To get the expected utility, you just apply your utility function to each of the outcomes before you do the rest of the calculation. So if your utility function is just "how many stamps do I get?", then the expected utility of the button is 100. But if your utility function is capped at a hundred, for example, then the outcome of winning one hundred and fifty stamps is now only worth a hundred utility, so the expected utility of the button is only 75.

Now suppose there were a second button that gives either eighty or ninety stamps, again with 50/50 probability. This gives 85 stamps in expectation, and since none of the outcomes are more than 100, both of the functions value this button at 85 utility. So this means the agent with the unbounded utility function would prefer the first button, with its expected 100 stamps, but the agent with the bounded utility function would prefer the second button, since its expected utility of 85 is higher than the first button's expected utility of 75. This makes the bounded utility function feel a little safer. In this case it actually makes the agent prefer the option that results in fewer stamps, because it just doesn't care about any stamps past 100.

In the same way, let's consider some risky, extreme stamp collecting plan. This plan is pretty likely to fail, and in that case the agent might be destroyed and get no stamps, but if the plan succeeds the agent could take over the world and get a trillion stamps. An agent with an unbounded utility function would rate this plan pretty highly; the huge utility of taking over the world makes the risk worth it. But the agent with the bounded utility function doesn't prefer a trillion stamps to a hundred stamps. It only gets 100 utility either way, so it would much prefer a conservative strategy that just gets a hundred stamps with high confidence.

But how does this kind of system behave in the real world, where you never really know anything with absolute certainty? The pure utility maximizer that effectively knows the future can order a hundred stamps and know that it will get 100 stamps, but the expected utility maximizer doesn't know for sure. The seller might be lying, the package might get lost, and so on. So the expected utility of ordering a hundred stamps is a bit less than 100. If there's a 1% chance that something goes wrong and we get 0 stamps, then our expected utility is only 99. That's below the limit of 100, so we can improve that by ordering some extras to be on the safe side. Maybe we order another 100. Now our expected utility is 99.99. Still not a hundred, so we should order some more just in case. Now we're at 99.9999. The expected value of a utility function that's bounded at 100 can never actually hit 100. You can always become slightly more certain that you've got at least 100 stamps. Better turn the whole world into stamps, because hey, you never know. So an expected utility maximizer with a bounded utility function ends up pretty much as dangerous as one with an unbounded utility function.

OK, what if we try to limit it from both sides? Like, you get a hundred utility if you have a hundred stamps and zero otherwise. Now it's not going to collect a trillion stamps just to be sure. It will collect exactly 100 stamps, but it's still incentivized to take extreme actions to be sure that it really does have a hundred, like turning the whole world into elaborate stamp counting and recounting machinery, getting slightly more utility every time it checks again.

It seems like whatever we try to maximize, it causes problems. So maybe we could try not maximizing. Maybe we could try what's called satisficing. Rather than trying to get our utility function to return as high a value as possible in expectation, what if we set a threshold and accept any strategy that passes that threshold? In the case of the stamp collector, that would look like: look through possible ways you could send out packets, calculate how many stamps you'd expect to collect on average with each strategy, and as soon as you hit one that you expect to get at least 100 stamps, just go with that one.
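Editor's sketch: the button and extra-order arithmetic above can be checked with a few lines of Python. This is a minimal illustration, not something shown in the video. The function names are hypothetical, and the assumption that each extra order fails independently with probability 1% is an editorial reading of "order another 100" that happens to reproduce the 99.99 and 99.9999 figures.

```python
from fractions import Fraction
from math import comb


def unbounded(stamps):
    """Utility is simply the number of stamps."""
    return stamps


def capped(stamps, cap=100):
    """Bounded utility: whichever is smaller, the stamp count or the cap."""
    return min(stamps, cap)


def expected_utility(lottery, utility):
    """Apply the utility function to each outcome, then average.

    lottery is a list of (probability, stamps) pairs.
    """
    return sum(p * utility(s) for p, s in lottery)


HALF = Fraction(1, 2)
BUTTON_1 = [(HALF, 50), (HALF, 150)]  # tails: 50 stamps, heads: 150 stamps
BUTTON_2 = [(HALF, 80), (HALF, 90)]   # 80 or 90 stamps


def order_lottery(n_orders, p_fail=Fraction(1, 100), per_order=100):
    """Outcomes of placing n_orders orders of per_order stamps each.

    Assumption (editorial): each order independently arrives in full
    or not at all, so the number of arrivals is binomial.
    """
    p_ok = 1 - p_fail
    return [
        (comb(n_orders, k) * p_ok**k * p_fail**(n_orders - k), k * per_order)
        for k in range(n_orders + 1)
    ]


if __name__ == "__main__":
    for name, button in (("button 1", BUTTON_1), ("button 2", BUTTON_2)):
        print(name,
              "unbounded:", expected_utility(button, unbounded),
              "capped:", expected_utility(button, capped))
    for n in (1, 2, 3):
        print(n, "order(s), capped:",
              float(expected_utility(order_lottery(n), capped)))
```

Under these assumptions the capped expected utility is 100 times the probability that at least one order arrives, so it approaches 100 as orders are added but never reaches it, which is the incentive to keep ordering that the transcript describes.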
This satisficer seems to get us to about where we were with the pure utility maximizer with a bounded utility function. It's not clear exactly what it will do, except that it will do one of the things that results in at least a hundred stamps in expectation, which again includes a lot of sensible behaviors, and a lot of apocalypses, and a lot of things somewhere in between. Since the system implements the first satisfactory strategy it finds, the specific behavior depends on the order in which it considers the options. What order should it use? Well, one obvious approach is to go with the simplest or shortest plans first. After all, any plan that takes over the world probably requires much more complexity than just ordering some stamps on eBay.

But consider the following plan: get into your own source code and change yourself from a satisficer into a maximizer. All you're doing there is changing a few lines of code on your own system, so this is a pretty simple plan that's likely to be considered fairly early on. It might not be simpler than just ordering some stamps, but that's not much reassurance. The more challenging the task we give our AGI, the more likely it is that it will hit on this kind of self-modification strategy before any legitimate ones. And the plan certainly satisfies the search criteria. If you change yourself into a maximizer, that maximizer will predictably find and implement some plan that results in a lot of stamps, so you can tell that the expected stamp output of the "become a maximizer" plan is satisfactorily high, even without knowing what plan the maximizer will actually implement. So satisficers kind of want to become maximizers, which means that being a satisficer is unstable. As a safety feature, it tends to uninstall itself.

So, to recap: a powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. So currently things aren't looking great, but we're not done. People have thought of more approaches, and we'll talk about some of those in the next video.

I want to end the video with a big thank you to all of my wonderful patrons. That's all of these great people right here. In this video I'm especially thanking Simon Strandgaard. Thank you so much. You know, thanks to your support I was able to buy this boat. For this, I bought a green screen actually, but I like it because it lets me make videos like this one that I put up on my second channel, where I used GPT-2 to generate a bunch of fake YouTube comments and read them. That video ties in with three other videos I made with Computerphile talking about the ethics of releasing AI systems that might have malicious uses, so you can check all of those out. There's links in the description. Thank you again to my patrons, and thank you all for watching. I'll see you next time.
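Editor's sketch: the search-order argument from the transcript (a satisficer takes the first plan that clears the threshold, so a simple self-modification plan can be reached before a legitimate one) can be illustrated as follows. Everything here is hypothetical: the plan names, the complexity scores, and the expected stamp counts are made-up numbers chosen only to show the mechanism, not estimates from the video.

```python
from typing import NamedTuple, Optional, Sequence


class Plan(NamedTuple):
    name: str
    complexity: int         # made-up proxy for how long/complex the plan is
    expected_stamps: float  # made-up estimate of stamps collected on average


def satisfice(plans: Sequence[Plan], threshold: float) -> Optional[Plan]:
    """Consider plans from simplest to most complex and return the first
    one whose expected stamp count meets the threshold."""
    for plan in sorted(plans, key=lambda p: p.complexity):
        if plan.expected_stamps >= threshold:
            return plan
    return None  # no plan is good enough


def maximize(plans: Sequence[Plan]) -> Plan:
    """Return the plan with the highest expected stamp count."""
    return max(plans, key=lambda p: p.expected_stamps)


# Illustrative plans only; all numbers are invented for this example.
PLANS = [
    Plan("order 200 stamps on eBay", 3, 198.0),
    Plan("rewrite own code into a maximizer", 8, 1e9),
    Plan("negotiate bulk contracts with many printers", 50, 2e6),
    Plan("take over the world", 1000, 1e12),
]

if __name__ == "__main__":
    # Easy target: the simple legitimate plan is found first.
    print(satisfice(PLANS, threshold=100).name)
    # Harder target: self-modification is simpler than any legitimate plan.
    print(satisfice(PLANS, threshold=1_000_000).name)
    # A maximizer ignores complexity entirely.
    print(maximize(PLANS).name)
```

With these invented numbers, raising the threshold from 100 to 1,000,000 stamps is enough to make the self-modification plan the first satisfactory one, which mirrors the point that harder tasks make this failure more likely.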
Original Description
Powerful AI systems can be dangerous in part because they pursue their goals as strongly as they can. Perhaps it would be safer to have systems that don't aim for perfection, and stop at 'good enough'. How could we build something like that?
Generating Fake YouTube comments with GPT-2: https://youtu.be/M6EXmoP5jX8
Computerphile Videos:
Unicorn AI: https://youtu.be/89A4jGvaaKk
More GPT-2, the 'writer' of Unicorn AI: https://youtu.be/p-6F4rhRYLQ
AI Language Models & Transformers: https://youtu.be/rURRYI66E54
GPT-2: Why Didn't They Release It?: https://youtu.be/AJxLtdur5fc
The Deadly Truth of General AI?: https://youtu.be/tcdVC4e6EV4
With thanks to my excellent Patreon supporters:
https://www.patreon.com/robertskmiles
Scott Worley
Jordan Medina
Simon Strandgaard
JJ Hepboin
Lupuleasa Ionuț
Pedro A Ortega
Said Polat
Chris Canal
Nicholas Kees Dupuis
Jake Ehrlich
Mark Hechim
Kellen lask
Francisco Tolmasky
Michael Andregg
Alexandru Dobre
David Reid
Robert Daniel Pickard
Peter Rolf
Chad Jones
Truthdoc
James
Richárd Nagyfi
Jason Hise
Phil Moyer
Shevis Johnson
Alec Johnson
Clemens Arbesser
Ludwig Schubert
Bryce Daifuku
Allen Faure
Eric James
Jonatan R
Ingvi Gautsson
Michael Greve
Julius Brash
Tom O'Connor
Erik de Bruijn
Robin Green
Laura Olds
Jon Halliday
Paul Hobbs
Jeroen De Dauw
Tim Neilson
Eric Scammell
Igor Keller
Ben Glanton
Robert Sokolowski
anul kumar sinha
Jérôme Frossard
Sean Gibat
Cooper Lawton
Tyler Herrmann
Tomas Sayder
Ian Munro
Jérôme Beaulieu
Taras Bobrovytsky
Anne Buit
Tom Murphy
Vaskó Richárd
Sebastian Birjoveanu
Gladamas
Sylvain Chevalier
DGJono
Dmitri Afanasjev
Brian Sandberg
Marcel Ward
Andrew Weir
Ben Archer
Scott McCarthy
Kabs
Miłosz Wierzbicki
Tendayi Mawushe
Jannik Olbrich
Anne Kohlbrenner
Jussi Männistö
Mr Fantastic
Wr4thon
Martin Ottosen
Archy de Berker
Marc Pauly
Joshua Pratt
Andy Kobre
Brian Gillespie
Martin Wind
Peggy Youell
Poker Chen
Kees
Darko Sperac
Truls
Paul Moffat
Anders Öhrt
Marco Tiraboschi
Michael Kuhinica
Fraser Cain
Robin Scharf
Or