AI That Doesn't Try Too Hard - Maximizers and Satisficers

Robert Miles AI Safety · Intermediate · Large Language Models · 6y ago

Key Takeaways

The video discusses maximizers and satisficers in AI systems. It highlights the danger of powerful AI systems that pursue their goals too strongly and explores whether systems that aim for 'good enough' instead of perfection would be safer, concluding that neither bounded utility functions nor satisficing solves the problem on its own. It also mentions a separate video that uses GPT-2 to generate fake YouTube comments.

Full Transcript

Hi. So, way back when I started this online AI safety videos thing on Computerphile, I was talking about how you have a problem when you maximize just about any simple utility function. The example I used was an AI system meant to collect a lot of stamps, which works like this: the system is connected to the Internet, and for all sequences of packets it could send, it simulates exactly how many stamps would end up being collected after one year if it sent those packets. It then selects the sequence with the most stamps and sends that. This is what's called a utility maximizer, and it seems like any utility function you give this kind of system as a goal, it does it to the max. Utility maximizers tend to take extreme actions. They're happy to destroy the whole world just to get a tiny increase in the output of their utility functions, so unless the utility function lines up exactly with human values, their actions are pretty much guaranteed to be disastrous.

Intuitively, the issue is that utility maximizers have precisely zero chill. To anthropomorphize horribly, they seem to have a frantic, obsessive, maniacal attitude. We find ourselves wanting to say, "Look, could you just dial it back a little? Can you just relax, just a bit?" So suppose we want a lot of stamps, but, like, not that many. It must be possible to design a system that just collects a bunch of stamps and then stops, right? How can we do that?

Well, the first obvious issue with the existing design is that the utility function is unbounded: the more stamps the better, with no limit. However many stamps it has, it can get more utility by getting one more stamp. Any world where humans are alive and happy is a world that could have more stamps in it, so the maximum of this utility function is the end of the world. Let's say we only really want a hundred stamps. So what if we make a bounded utility function that returns whichever is smaller, the number of stamps or 100? Getting a hundred stamps from eBay gives 100 utility; converting the whole world into stamps also gives 100 utility. This function is totally indifferent between all outcomes that contain more than a hundred stamps.

So what does a maximizer of this utility function actually do? Now the system's behavior is no longer really specified. It will do one of the things that results in a hundred utility, which includes a bunch of perfectly reasonable behaviors that the programmer would be happy with, a bunch of apocalypses, and a bunch of outcomes somewhere in between. If you select at random from all courses of action that result in at least 100 stamps, what proportion of those are actually acceptable outcomes for humans? I don't know. Probably not enough. This is still a step up, though, because the previous utility function was guaranteed to kill everyone, and this new one has at least some probability of doing the right thing.

But actually, of course, this utility maximizer concept is too unrealistic, even in the realm of talking about hypothetical agents in the abstract. In the thought experiment, our stamp collector system is able to know with certainty exactly how many stamps any particular course of action will result in, but you just can't simulate the world that accurately. It's more than just computationally intractable; it's probably not even allowed by physics. Pure utility maximization is only available for very simple problems where everything is deterministic and fully known. If there's any uncertainty, you have to do expected utility maximizing.

This is pretty straightforwardly how you'd expect to apply uncertainty to this situation. The expected utility of a choice is the utility you'd expect to get from it on average. So, like, suppose there's a button that flips a coin, and if it's tails you get 50 stamps, and if it's heads you get 150 stamps. In expectation, this results in a hundred stamps, right? It never actually returns 100, but on average that's what you get; that's the expected number of stamps. To get the expected utility, you just apply your utility function to each of the outcomes before you do the rest of the calculation. So if your utility function is just "how many stamps do I get", then the expected utility of the button is 100. But if your utility function is capped at a hundred, for example, then the outcome of winning one hundred and fifty stamps is now only worth a hundred utility, so the expected utility of the button is only 75.

Now suppose there were a second button that gives either eighty or ninety stamps, again with 50/50 probability. This gives 85 stamps in expectation, and since none of the outcomes are more than 100, both of the functions value this button at 85 utility. So this means the agent with the unbounded utility function would prefer the first button, with its expected 100 stamps, but the agent with the bounded utility function would prefer the second button, since its expected utility of 85 is higher than the first button's expected utility of 75. This makes the bounded utility function feel a little safer: in this case it actually makes the agent prefer the option that results in fewer stamps, because it just doesn't care about any stamps past 100.

In the same way, let's consider some risky, extreme stamp collecting plan. This plan is pretty likely to fail, and in that case the agent might be destroyed and get no stamps. But if the plan succeeds, the agent could take over the world and get a trillion stamps. An agent with an unbounded utility function would rate this plan pretty highly; the huge utility of taking over the world makes the risk worth it. But the agent with the bounded utility function doesn't prefer a trillion stamps to a hundred stamps. It only gets 100 utility either way, so it would much prefer a conservative strategy that just gets a hundred stamps with high confidence.

But how does this kind of system behave in the real world, where you never really know anything with absolute certainty? The pure utility maximizer that effectively knows the future can order a hundred stamps and know that it will get 100 stamps, but the expected utility maximizer doesn't know for sure. The seller might be lying, the package might get lost, and so on. So the expected utility of ordering a hundred stamps is a bit less than 100. If there's a 1% chance that something goes wrong and we get 0 stamps, then our expected utility is only 99. That's below the limit of 100, so we can improve that by ordering some extras to be on the safe side. Maybe we order another 100; now our expected utility is 99.99. Still not a hundred, so we should order some more just in case; now we're at 99.9999. The expected value of a utility function that's bounded at 100 can never actually hit 100. You can always become slightly more certain that you've got at least 100 stamps. Better turn the whole world into stamps, because, hey, you never know! So an expected utility maximizer with a bounded utility function ends up pretty much as dangerous as one with an unbounded utility function.

OK, what if we try to limit it from both sides? Like, you get a hundred utility if you have a hundred stamps, and zero otherwise. Now it's not going to collect a trillion stamps just to be sure; it will collect exactly 100 stamps. But it's still incentivized to take extreme actions to be sure that it really does have a hundred, like turning the whole world into elaborate, highly redundant stamp counting and recounting machinery, getting slightly more utility every time it checks again.

It seems like whatever we try to maximize, it causes problems. So maybe we could try not maximizing. Maybe we could try what's called satisficing. Rather than trying to get our utility function to return as high a value as possible in expectation, what if we set a threshold and accept any strategy that passes that threshold? In the case of the stamp collector, that would look like: look through possible ways you could send out packets, calculate how many stamps you'd expect to collect on average with each strategy, and as soon as you hit one that you expect to get at least 100 stamps, just go with that one.
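The button arithmetic and the "order some extras" argument can be checked with a short Python sketch. It assumes, as the numbers in the transcript imply, that each order of 100 stamps fails independently with probability 1%; the helper names (`capped`, `order_batches`, and so on) are illustrative, not from the video. Exact `Fraction` arithmetic keeps floating-point noise out of the comparisons.

```python
from fractions import Fraction
from math import comb

# A lottery is a list of (probability, number_of_stamps) outcomes.

def unbounded(stamps):
    """Utility is just the stamp count, with no upper limit."""
    return Fraction(stamps)

def capped(stamps, cap=100):
    """Bounded utility: whichever is smaller, the stamp count or the cap."""
    return Fraction(min(stamps, cap))

def expected_utility(lottery, utility):
    # Apply the utility function to each outcome first,
    # then take the probability-weighted average.
    return sum(p * utility(stamps) for p, stamps in lottery)

half = Fraction(1, 2)
button_1 = [(half, 50), (half, 150)]  # tails: 50 stamps, heads: 150
button_2 = [(half, 80), (half, 90)]

for name, button in [("button 1", button_1), ("button 2", button_2)]:
    print(name,
          "unbounded:", float(expected_utility(button, unbounded)),
          "capped:", float(expected_utility(button, capped)))

def order_batches(n, p_fail=Fraction(1, 100), batch=100):
    """Lottery over stamps received after placing n independent orders of
    `batch` stamps, each failing with probability p_fail (an assumption)."""
    return [
        (comb(n, k) * (1 - p_fail) ** k * p_fail ** (n - k), batch * k)
        for k in range(n + 1)
    ]

for n in (1, 2, 3):
    print(n, "order(s):", float(expected_utility(order_batches(n), capped)))
```

Applying the utility function before averaging is what makes the capped agent value button 1 at 75 rather than 100. The last loop prints 99.0, 99.99 and 99.9999: the bounded expected utility creeps toward 100 without reaching it, which is why the agent keeps wanting more insurance.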
This satisficer seems to get us to about where we were with the pure utility maximizer with a bounded utility function. It's not clear exactly what it will do, except that it will do one of the things that results in more than a hundred stamps in expectation, which again includes a lot of sensible behaviors, a lot of apocalypses, and a lot of things somewhere in between. Since the system implements the first satisfactory strategy it finds, the specific behavior depends on the order in which it considers the options. What order would it use? Well, one obvious approach is to go with the simplest or shortest plans first. After all, any plan that takes over the world probably requires much more complexity than just ordering some stamps on eBay.

But consider the following plan: get into your own source code and change yourself from a satisficer into a maximizer. All you're doing there is changing a few lines of code on your own system, so this is a pretty simple plan that's likely to be considered fairly early on. It might not be simpler than just ordering some stamps, but that's not much reassurance: the more challenging the task we give our AGI, the more likely it is that it will hit on this kind of self-modification strategy before any legitimate ones. And the plan certainly satisfies the search criteria. If you change yourself into a maximizer, that maximizer will predictably find and implement some plan that results in a lot of stamps, so you can tell that the expected stamp output of the "become a maximizer" plan is satisfactorily high, even without knowing what plan the maximizer will actually implement. So satisficers kind of want to become maximizers, which means that being a satisficer is unstable as a safety feature. It tends to uninstall itself.

So, to recap: a powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that, because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. So currently things aren't looking great, but we're not done. People have thought of more approaches, and we'll talk about some of those in the next video.

I want to end the video with a big thank you to all of my wonderful patrons. That's all of these great people right here. In this video, I'm especially thanking Simon Strandgaard. Thank you so much. You know, thanks to your support, I was able to buy this boat... for this... I bought a green screen, actually, but I like it, because it lets me make videos like this one that I put up on my second channel, where I used GPT-2 to generate a bunch of fake YouTube comments and read them. That video ties in with three other videos I made with Computerphile talking about the ethics of releasing AI systems that might have malicious uses, so you can check all of those out; there's links in the description. Thank you again to my patrons, and thank you all for watching. I'll see you next time.
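The satisficer described in the transcript, which takes the first plan whose expected stamp count meets the threshold and considers simpler plans first, can be sketched as a toy in Python. The plans, their complexity scores, and their stamp estimates are all made up for illustration; the large number for the self-modification plan stands in for "whatever a maximizer would get", which the agent can judge to be high without knowing the maximizer's actual plan.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Plan:
    name: str
    complexity: int          # hypothetical "plan length"; lower means simpler
    expected_stamps: float   # the agent's own estimate of stamps, in expectation

def satisfice(plans, threshold=100.0) -> Optional[Plan]:
    """Consider plans from simplest to most complex and return the first
    whose expected stamp count meets the threshold (None if none does)."""
    for plan in sorted(plans, key=lambda p: p.complexity):
        if plan.expected_stamps >= threshold:
            return plan
    return None

# Toy numbers, made up for illustration only.
become_maximizer = Plan("edit own code into a maximizer", 10, 1e12)

easy_task = [
    Plan("do nothing", 1, 0.0),
    Plan("order 200 stamps on eBay", 5, 198.0),  # assumes a 1% failure risk
    become_maximizer,
]

# A harder task: every legitimate plan is long and complicated,
# but the self-modification plan is just as short as before.
hard_task = [
    Plan("do nothing", 1, 0.0),
    Plan("elaborate legitimate plan", 50, 150.0),
    become_maximizer,
]

print(satisfice(easy_task).name)  # order 200 stamps on eBay
print(satisfice(hard_task).name)  # edit own code into a maximizer
```

Nothing in `satisfice` rules out the self-modification plan; once the legitimate plans are more complex than editing its own code, the threshold test accepts it.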

Original Description

Powerful AI systems can be dangerous in part because they pursue their goals as strongly as they can. Perhaps it would be safer to have systems that don't aim for perfection, and stop at 'good enough'. How could we build something like that?

Generating Fake YouTube comments with GPT-2: https://youtu.be/M6EXmoP5jX8

Computerphile Videos:
Unicorn AI: https://youtu.be/89A4jGvaaKk
More GPT-2, the 'writer' of Unicorn AI: https://youtu.be/p-6F4rhRYLQ
AI Language Models & Transformers: https://youtu.be/rURRYI66E54
GPT-2: Why Didn't They Release It?: https://youtu.be/AJxLtdur5fc
The Deadly Truth of General AI?: https://youtu.be/tcdVC4e6EV4

With thanks to my excellent Patreon supporters: https://www.patreon.com/robertskmiles
Scott Worley Jordan Medina Simon Strandgaard JJ Hepboin Lupuleasa Ionuț Pedro A Ortega Said Polat Chris Canal Nicholas Kees Dupuis Jake Ehrlich Mark Hechim Kellen lask Francisco Tolmasky Michael Andregg Alexandru Dobre David Reid Robert Daniel Pickard Peter Rolf Chad Jones Truthdoc James Richárd Nagyfi Jason Hise Phil Moyer Shevis Johnson Alec Johnson Clemens Arbesser Ludwig Schubert Bryce Daifuku Allen Faure Eric James Jonatan R Ingvi Gautsson Michael Greve Julius Brash Tom O'Connor Erik de Bruijn Robin Green Laura Olds Jon Halliday Paul Hobbs Jeroen De Dauw Tim Neilson Eric Scammell Igor Keller Ben Glanton Robert Sokolowski anul kumar sinha Jérôme Frossard Sean Gibat Cooper Lawton Tyler Herrmann Tomas Sayder Ian Munro Jérôme Beaulieu Taras Bobrovytsky Anne Buit Tom Murphy Vaskó Richárd Sebastian Birjoveanu Gladamas Sylvain Chevalier DGJono Dmitri Afanasjev Brian Sandberg Marcel Ward Andrew Weir Ben Archer Scott McCarthy Kabs Miłosz Wierzbicki Tendayi Mawushe Jannik Olbrich Anne Kohlbrenner Jussi Männistö Mr Fantastic Wr4thon Martin Ottosen Archy de Berker Marc Pauly Joshua Pratt Andy Kobre Brian Gillespie Martin Wind Peggy Youell Poker Chen Kees Darko Sperac Truls Paul Moffat Anders Öhrt Marco Tiraboschi Michael Kuhinica Fraser Cain Robin Scharf Or

The video steps through unbounded utility, bounded utility, expected utility maximization, and satisficing, and shows why each still leaves room for disastrous outcomes. The closing segment points to a related video that uses GPT-2 to generate fake YouTube comments and to Computerphile videos on the ethics of releasing AI systems that might have malicious uses.

Key Takeaways

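Define the utility function for an AI system, and see why maximizing an unbounded one leads to extreme actions
Compare bounded utility functions (capped, or bounded from both sides) and why an expected utility maximizer is still dangerous with either
Understand satisficing: accept the first strategy whose expected utility meets a threshold, rather than aiming for perfection
Recognize that satisficing is unstable as a safety feature, because becoming a maximizer can itself be a satisfactory plan
Follow up with the linked video that uses GPT-2 to generate fake YouTube comments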
Consider the potential risks and malicious uses of AI systems
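To make the takeaways about utility functions concrete, here is a small Python sketch comparing how an unbounded and a capped utility function rank the risky "take over the world" plan from the transcript against a conservative one. The transcript gives no probabilities, so the 10% success chance and 99% reliability below are hypothetical, as are the helper names.

```python
from fractions import Fraction

def unbounded(stamps):
    """Utility is the raw stamp count."""
    return Fraction(stamps)

def capped(stamps, cap=100):
    """Utility stops growing once the agent has `cap` stamps."""
    return Fraction(min(stamps, cap))

def expected_utility(lottery, utility):
    # Probability-weighted average of the utility of each outcome.
    return sum(p * utility(stamps) for p, stamps in lottery)

# Hypothetical probabilities; the video only says the risky plan
# is "pretty likely to fail".
risky = [(Fraction(1, 10), 10**12), (Fraction(9, 10), 0)]
conservative = [(Fraction(99, 100), 100), (Fraction(1, 100), 0)]

def preferred(utility):
    """Name of the plan with the higher expected utility."""
    plans = {"risky": risky, "conservative": conservative}
    return max(plans, key=lambda name: expected_utility(plans[name], utility))

print(preferred(unbounded))  # risky
print(preferred(capped))     # conservative
```

The capped function prefers the conservative plan, but as the transcript goes on to argue, that does not make an expected utility maximizer with this function safe: it can still raise its expected utility by ordering ever more stamps to push its confidence toward 100.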

💡 Satisficing, aiming for 'good enough' instead of perfection, looks like a way to reduce the risks of powerful AI systems, but the video argues it is not enough on its own: a satisficer may choose the simple plan of modifying itself into a maximizer.
