AI That Doesn't Try Too Hard - Maximizers and Satisficers

Robert Miles AI Safety · Intermediate ·🧠 Large Language Models ·6y ago

Key Takeaways

The video discusses maximizers and satisficers in AI systems: why powerful AI systems that pursue their goals as strongly as they can are dangerous, and whether systems that aim for 'good enough' instead of perfection would be safer. In the sign-off it also points to a separate video in which GPT-2 is used to generate fake YouTube comments.

Full Transcript

Hi. So way back when I started this online AI safety videos thing on Computerphile, I was talking about how you have a problem when you maximize just about any simple utility function. The example I used was an AI system meant to collect a lot of stamps, which works like this: the system is connected to the Internet, and for all sequences of packets it could send, it simulates exactly how many stamps would end up being collected after one year if it sent those packets. It then selects the sequence with the most stamps and sends that. This is what's called a utility maximizer, and it seems like any utility function you give this kind of system as a goal, it does it to the max. Utility maximizers tend to take extreme actions. They're happy to destroy the whole world just to get a tiny increase in the output of their utility functions, so unless the utility function lines up exactly with human values, their actions are pretty much guaranteed to be disastrous.

Intuitively, the issue is that utility maximizers have precisely zero chill. To anthropomorphize horribly, they seem to have a frantic, obsessive, maniacal attitude. We find ourselves wanting to say, "Look, could you just dial it back a little? Can you just relax, just a bit?" So suppose we want a lot of stamps, but, like, not that many. It must be possible to design a system that just collects a bunch of stamps and then stops, right? How can we do that?

Well, the first obvious issue with the existing design is that the utility function is unbounded: the more stamps the better, with no limit. However many stamps it has, it can get more utility by getting one more stamp. Any world where humans are alive and happy is a world that could have more stamps in it, so the maximum of this utility function is the end of the world. Let's say we only really want a hundred stamps. So what if we make a bounded utility function that returns whichever is smaller, the number of stamps or 100? Getting a hundred stamps from eBay gives 100 utility. Converting the whole world
into stamps also gives 100 utility. This function is totally indifferent between all outcomes that contain more than a hundred stamps. So what does a maximizer of this utility function actually do? Now the system's behavior is no longer really specified. It will do one of the things that results in a hundred utility, which includes a bunch of perfectly reasonable behaviors that the programmer would be happy with, and a bunch of apocalypses, and a bunch of outcomes somewhere in between. If you select at random from all courses of action that result in at least 100 stamps, what proportion of those are actually acceptable outcomes for humans? I don't know. Probably not enough. This is still a step up, though, because the previous utility function was guaranteed to kill everyone, and this new one has at least some probability of doing the right thing.

But actually, of course, this utility maximizer concept is too unrealistic, even in the realm of talking about hypothetical agents in the abstract. In the thought experiment, our stamp collector system is able to know with certainty exactly how many stamps any particular course of action will result in, but you just can't simulate the world that accurately. It's more than just computationally intractable; it's probably not even allowed by physics. Pure utility maximization is only available for very simple problems where everything is deterministic and fully known. If there's any uncertainty, you have to do expected utility maximizing. This is pretty straightforwardly how you'd expect to apply uncertainty to this situation: the expected utility of a choice is the utility you'd expect to get from it on average.

So, like, suppose there's a button that flips a coin, and if it's tails you get 50 stamps, and if it's heads you get 150 stamps. In expectation this results in a hundred stamps, right? It never actually returns 100, but on average that's what you get. That's the expected number of stamps. To get the expected utility, you just apply your utility function to
each of the outcomes before you do the rest of the calculation. So if your utility function is just "how many stamps do I get?", then the expected utility of the button is 100. But if your utility function is capped at a hundred, for example, then the outcome of winning one hundred and fifty stamps is now only worth a hundred utility, so the expected utility of the button is only 75.

Now suppose there were a second button that gives either eighty or ninety stamps, again with 50/50 probability. This gives 85 stamps in expectation, and since none of the outcomes are more than 100, both of the functions value this button at 85 utility. So this means the agent with the unbounded utility function would prefer the first button, with its expected 100 stamps, but the agent with the bounded utility function would prefer the second button, since its expected utility of 85 is higher than the first button's expected utility of 75. This makes the bounded utility function feel a little safer. In this case it actually makes the agent prefer the option that results in fewer stamps, because it just doesn't care about any stamps past 100.

In the same way, let's consider some risky, extreme stamp collecting plan. This plan is pretty likely to fail, and in that case the agent might be destroyed and get no stamps. But if the plan succeeds, the agent could take over the world and get a trillion stamps. An agent with an unbounded utility function would rate this plan pretty highly: the huge utility of taking over the world makes the risk worth it. But the agent with the bounded utility function doesn't prefer a trillion stamps to a hundred stamps. It only gets 100 utility either way, so it would much prefer a conservative strategy that just gets a hundred stamps with high confidence.

But how does this kind of system behave in the real world, where you never really know anything with absolute certainty? The pure utility maximizer that effectively knows the future can order a hundred stamps and know that it will get 100 stamps, but
the expected utility maximizer doesn't know for sure. The seller might be lying, the package might get lost, and so on. So the expected utility of ordering a hundred stamps is a bit less than 100. If there's a 1% chance that something goes wrong and we get 0 stamps, then our expected utility is only 99. That's below the limit of 100, so we can improve that by ordering some extras to be on the safe side. Maybe we order another 100. Now our expected utility is 99.99. Still not a hundred, so we should order some more, just in case. Now we're at 99.9999. The expected value of a utility function that's bounded at 100 can never actually hit 100. You can always become slightly more certain that you've got at least 100 stamps. Better turn the whole world into stamps, because hey, you never know. So an expected utility maximizer with a bounded utility function ends up pretty much as dangerous as one with an unbounded utility function.

OK, what if we try to limit it from both sides? Like, you get a hundred utility if you have a hundred stamps and zero otherwise. Now it's not going to collect a trillion stamps just to be sure. It will collect exactly 100 stamps, but it's still incentivized to take extreme actions to be sure that it really does have a hundred, like turning the whole world into elaborate, highly redundant stamp counting and recounting machinery, getting slightly more utility every time it checks again.

It seems like whatever we try to maximize, it causes problems. So maybe we could try not maximizing. Maybe we could try what's called satisficing: rather than trying to get our utility function to return as high a value as possible in expectation, what if we set a threshold and accept any strategy that passes that threshold? In the case of the stamp collector, that would look like: look through possible ways you could send out packets, calculate how many stamps you'd expect to collect on average with each strategy, and as soon as you hit one that you expect to get at least 100 stamps, just go with that one.
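
The satisficing rule just described can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the plan list, its descriptions, and the expected stamp counts are hypothetical values invented for the example, and the names `maximize` and `satisfice` are not from the video.

```python
from typing import List, Optional, Tuple

# Hypothetical toy data: each plan is (description, expected number of stamps).
Plan = Tuple[str, float]


def maximize(plans: List[Plan]) -> Plan:
    """Maximizer: pick the plan with the most expected stamps."""
    return max(plans, key=lambda plan: plan[1])


def satisfice(plans: List[Plan], threshold: float) -> Optional[Plan]:
    """Satisficer: return the first plan whose expected stamps meet the threshold."""
    for plan in plans:
        if plan[1] >= threshold:
            return plan
    return None  # no plan in the list is good enough


# Assumed numbers, chosen only to make the comparison visible.
plans: List[Plan] = [
    ("do nothing", 0.0),
    ("order 100 stamps, 1% chance of loss", 99.0),
    ("order 200 stamps, 1% chance of loss", 198.0),
    ("risky world takeover, 1% chance of a trillion stamps", 0.01 * 1e12),
]

print(maximize(plans)[0])                          # the takeover plan
print(satisfice(plans, 100)[0])                    # the 200-stamp order
print(satisfice(list(reversed(plans)), 100)[0])    # the takeover plan again
```

With the same plans and the same threshold, reversing the search order changes which plan the satisficer accepts, so everything depends on how the options are ordered.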
This satisficer seems to get us to about where we were with the pure utility maximizer with a bounded utility function. It's not clear exactly what it will do, except that it will do one of the things that results in more than a hundred stamps in expectation, which again includes a lot of sensible behaviors, and a lot of apocalypses, and a lot of things somewhere in between. Since the system implements the first satisfactory strategy it finds, the specific behavior depends on the order in which it considers the options. What order should it use? Well, one obvious approach is to go with the simplest or shortest plans first. After all, any plan that takes over the world probably requires much more complexity than just ordering some stamps on eBay.

But consider the following plan: get into your own source code and change yourself from a satisficer into a maximizer. All you're doing there is changing a few lines of code on your own system, so this is a pretty simple plan that's likely to be considered fairly early on. It might not be simpler than just ordering some stamps, but that's not much reassurance. The more challenging the task we give our AGI, the more likely it is that it will hit on this kind of self-modification strategy before any legitimate ones. And the plan certainly satisfies the search criteria: if you change yourself into a maximizer, that maximizer will predictably find and implement some plan that results in a lot of stamps, so you can tell that the expected stamp output of the "become a maximizer" plan is satisfactorily high, even without knowing what plan the maximizer will actually implement. So satisficers kind of want to become maximizers, which means that being a satisficer is unstable. As a safety feature, it tends to uninstall itself.

So, to recap: a powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that
because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. So currently things aren't looking great, but we're not done. People have thought of more approaches, and we'll talk about some of those in the next video.

I want to end the video with a big thank you to all of my wonderful patrons. That's all of these great people right here. In this video I'm especially thanking Simon Strandgaard. Thank you so much. You know, thanks to your support I was able to buy this boat. For this I bought a green screen, actually, but I like it because it lets me make videos like this one that I put up on my second channel, where I used GPT-2 to generate a bunch of fake YouTube comments and read them. That video ties in with three other videos I made with Computerphile talking about the ethics of releasing AI systems that might have malicious uses, so you can check all of those out. There's links in the description. Thank you again to my patrons, and thank you all for watching. I'll see you next time.
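
As a check on the arithmetic in the transcript, here is a minimal Python sketch of the expected utility calculations for the two buttons and for the repeated stamp orders. The function names are hypothetical, and the assumption that each order of 100 stamps fails independently with probability 1% is an illustrative choice that happens to reproduce the quoted 99, 99.99 and 99.9999; the video does not spell out the model behind those numbers.

```python
from typing import Callable, List, Tuple

# A lottery is a list of (probability, stamps) outcomes.
Lottery = List[Tuple[float, int]]


def unbounded(stamps: int) -> float:
    """Utility is just the number of stamps."""
    return float(stamps)


def capped(stamps: int, cap: int = 100) -> float:
    """Bounded utility: the number of stamps or the cap, whichever is smaller."""
    return float(min(stamps, cap))


def expected_utility(lottery: Lottery, utility: Callable[[int], float]) -> float:
    """Apply the utility function to each outcome, then average by probability."""
    return sum(p * utility(stamps) for p, stamps in lottery)


button_1: Lottery = [(0.5, 50), (0.5, 150)]   # tails: 50 stamps, heads: 150 stamps
button_2: Lottery = [(0.5, 80), (0.5, 90)]    # 80 or 90 stamps


def capped_utility_of_orders(n_orders: int, p_fail: float = 0.01) -> float:
    """Expected capped utility of n orders of 100 stamps each.

    Assumption: orders fail independently, and a single arrival already
    reaches the cap of 100, so only the 'every order fails' case scores 0.
    """
    p_at_least_one_arrives = 1.0 - p_fail ** n_orders
    return 100.0 * p_at_least_one_arrives


print(expected_utility(button_1, unbounded))  # 100.0
print(expected_utility(button_1, capped))     # 75.0
print(expected_utility(button_2, unbounded))  # 85.0
print(expected_utility(button_2, capped))     # 85.0
for n in (1, 2, 3):
    print(n, capped_utility_of_orders(n))     # climbs toward 100, stays below it
```

The unbounded agent prefers the first button (100 against 85), while the capped agent prefers the second (85 against 75). Each extra order raises the capped expected utility a little more without reaching 100, which is the incentive to keep ordering that the transcript describes. For large `n_orders`, floating-point rounding would eventually report exactly 100.0, which is a limitation of the sketch and not a property of the underlying math.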

Original Description

Powerful AI systems can be dangerous in part because they pursue their goals as strongly as they can. Perhaps it would be safer to have systems that don't aim for perfection, and stop at 'good enough'. How could we build something like that?

Generating Fake YouTube comments with GPT-2: https://youtu.be/M6EXmoP5jX8

Computerphile Videos:
Unicorn AI: https://youtu.be/89A4jGvaaKk
More GPT-2, the 'writer' of Unicorn AI: https://youtu.be/p-6F4rhRYLQ
AI Language Models & Transformers: https://youtu.be/rURRYI66E54
GPT-2: Why Didn't They Release It?: https://youtu.be/AJxLtdur5fc
The Deadly Truth of General AI?: https://youtu.be/tcdVC4e6EV4

With thanks to my excellent Patreon supporters: https://www.patreon.com/robertskmiles
Scott Worley Jordan Medina Simon Strandgaard JJ Hepboin Lupuleasa Ionuț Pedro A Ortega Said Polat Chris Canal Nicholas Kees Dupuis Jake Ehrlich Mark Hechim Kellen lask Francisco Tolmasky Michael Andregg Alexandru Dobre David Reid Robert Daniel Pickard Peter Rolf Chad Jones Truthdoc James Richárd Nagyfi Jason Hise Phil Moyer Shevis Johnson Alec Johnson Clemens Arbesser Ludwig Schubert Bryce Daifuku Allen Faure Eric James Jonatan R Ingvi Gautsson Michael Greve Julius Brash Tom O'Connor Erik de Bruijn Robin Green Laura Olds Jon Halliday Paul Hobbs Jeroen De Dauw Tim Neilson Eric Scammell Igor Keller Ben Glanton Robert Sokolowski anul kumar sinha Jérôme Frossard Sean Gibat Cooper Lawton Tyler Herrmann Tomas Sayder Ian Munro Jérôme Beaulieu Taras Bobrovytsky Anne Buit Tom Murphy Vaskó Richárd Sebastian Birjoveanu Gladamas Sylvain Chevalier DGJono Dmitri Afanasjev Brian Sandberg Marcel Ward Andrew Weir Ben Archer Scott McCarthy Kabs Miłosz Wierzbicki Tendayi Mawushe Jannik Olbrich Anne Kohlbrenner Jussi Männistö Mr Fantastic Wr4thon Martin Ottosen Archy de Berker Marc Pauly Joshua Pratt Andy Kobre Brian Gillespie Martin Wind Peggy Youell Poker Chen Kees Darko Sperac Truls Paul Moffat Anders Öhrt Marco Tiraboschi Michael Kuhinica Fraser Cain Robin Scharf Or

The video explores maximizers and satisficers in AI systems: the dangers of powerful systems that maximize a utility function, and the idea of building systems that aim for 'good enough' instead of perfection. It concludes that neither bounding the utility function nor satisficing solves the problem on its own, and defers further approaches to the next video. In the sign-off, it points to a separate video in which GPT-2 generates fake YouTube comments, and to Computerphile videos on the ethics of releasing AI systems that might have malicious uses.

Key Takeaways
  1. A maximizer of an unbounded utility function takes extreme actions, because one more stamp always means more utility
  2. Capping the utility function leaves a pure maximizer indifferent between acceptable outcomes and disasters
  3. An expected utility maximizer with a capped utility function still takes extreme actions to become slightly more certain
  4. A satisficer accepts the first strategy that passes a threshold, so its behavior depends on the order in which it searches
  5. A satisficer may modify itself into a maximizer, which makes satisficing unstable as a safety feature
💡 Satisficing aims for 'good enough' instead of perfection, but on its own it does not make a powerful AI system safe: the video argues that a satisficer tends to turn itself into a maximizer.
