Quantilizers: AI That Doesn't Try Too Hard
Key Takeaways
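The video discusses quantilizers, an agent design that combines human imitation with expected utility maximization. A quantilizer samples from the highest-utility actions a human might plausibly take, aiming to outperform a pure human imitation without adopting extreme utility-maximizing strategies. The video asks whether this is a safer alternative to expected utility maximizers and satisficers, and notes that a quantilizer might still end up building an unsafe maximizer.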
Full Transcript
Hi. So, way back in the Before Time, I made a video about maximizers and satisficers. The plan was that it was going to be the first half of a two-parter. Now, I did script out that second video and shoot it and even start to edit it, and then certain events transpired and I never finished that video. So that's what this is: part two of a video that I started ages ago, which I think most people have forgotten about. I do recommend going back and watching that video if you haven't already, or even re-watching it to remind yourself. I'll put a link to it in the description. And with that, here's part two. Take it away, past me.

Hi. In the previous video we looked at utility maximizers, expected utility maximizers, and satisficers, using unbounded and bounded utility functions. A powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that, because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. The situation doesn't look great.

So let's try looking at something completely different. Let's try to get away from this utility function stuff that seems so dangerous. What if we just tried to directly imitate humans? If we can get enough data about human behavior, maybe we can train a model that, for any given situation, predicts what a human being would do in that scenario. If the model's good enough, you've basically got a human-level AGI, right? It's able to do a wide range of cognitive tasks just like a human can, because it's just exactly copying humans. That kind of system won't do a lot of the dangerous, counterproductive things that a maximizer would do, simply because a human wouldn't do them.

But I wouldn't exactly call it safe, because a perfect imitation of a human isn't safer than the human it's perfectly imitating, and humans aren't really safe. In principle, a truly safe AGI could be given just about any level of power and responsibility and it would tend to produce good outcomes, but the same can't really be said for humans. And an imperfect human imitation would almost certainly be even worse. I mean, what are the chances that introducing random errors and inaccuracies to the imitation would just happen to make it more safe rather than less? Still, it does seem like it would be safer than a utility maximizer. At least we're out of guaranteed-apocalypse territory.

The other thing that makes this kind of approach unsatisfactory is that a human imitation can't exceed human capabilities by much, because it's just copying them. A big part of why we want AGI in the first place is to get it to solve problems that we can't. You might be able to run the thing faster to allow it more thinking time, or something like that, but that's a pretty limited form of superintelligence. And you have to be very careful with anything along those lines, because it means putting the system in a situation that's very different from anything any human being has ever experienced. Your model might not generalize well to a situation so different from anything in its training data, which could lead to unpredictable and potentially dangerous behavior.

Relatively recently, a new approach was proposed called quantilizing. The idea is that it lets you combine human imitation and expected utility maximization, to hopefully get some of the advantages of both without all of the downsides. It works like this. You have your human imitation model. Given a situation, it can give you a probability distribution over actions: for each of the possible actions you could take in this situation, how likely is it that a human would take that action? In our stamp collecting example, that would be: if a human were trying to collect a lot of stamps, how likely would they be to do this action? Then you have whatever system you'd use for a utility maximizer, which is able to figure out the expected utility of different actions according to some utility function. For any given action, it can tell you how much utility you'd expect to get if you did that. In our example, that's how many stamps you would expect this action to result in. So for every action you have these two numbers: the human probability and the expected utility.

Quantilizing sort of mixes these together, and you get to choose how they're mixed with a variable that we'll call q. If q is zero, the system acts like an expected utility maximizer; if it's one, the system acts like a human imitation. By setting it somewhere in between, we can hopefully get a quantilizer that's more effective than the human imitation but not as dangerous as the utility maximizer.

So what exactly is a quantilizer? Let's look at the definition in the paper: "A q-quantilizer is an agent that, when faced with a decision problem, returns a random action in the top q proportion of some base distribution over actions, sorted by the expected utility achieved if that action is executed." Let's break this down and go through how it works step by step.

First, we pick a value for q, the variable that determines how we're going to mix imitation and utility maximization. Let's set it to 0.1 for this example, so 10%. Now we take all of the available actions and sort them by expected utility. On one end you've got the actions that kick off all of the crazy, extreme utility-maximizing strategies, you know, killing everyone and turning the whole world into stamps, all the way down through the moderate strategies like buying some stamps, and down to all of the strategies that do nothing and collect no stamps at all.

Then we look at our base distribution over actions. What is that? In the version I'm talking about, we're using the human imitation system's probability distribution over actions, so our base distribution is how likely a human is to do each action. That might look something like this. No human is ever going to try the wacky, extreme maximizing strategies, so our human imitator gives them a probability of basically zero. Then there are some really good strategies that humans probably won't think of, but they might if they're really smart or lucky. Then there's a big bump of normal strategies that humans are quite likely to use, which tend to do okay. Then it tails off into less and less good strategies, and eventually stupider and stupider mistakes that humans are less and less likely to make.

Then we find the point in our action list such that 10% of the probability mass is on the high-expected-utility side. That's what q is really changing: it's where we make this cutoff. Note that it's not 10% of the actions that end up on that side, it's 10% of the probability mass. Then we throw away everything below the cutoff, all the stupid and useless choices: we set them to zero and keep the top 10%. Now, this is no longer a valid probability distribution, because it only sums to 0.1, so we multiply all of these by 10 so that the whole thing sums to 1 again. And that's our final probability distribution, which we sample from to get our chosen action.

So let's look at some different actions here and see how they do. Consider something like: misremember your credit card details, keep trying to order stamps with the wrong number, and never figure out why it's not working. A human is reasonably likely to do that. Well, not very likely, but we've all met people who... The point is, a pure human imitation might do that, but the expected utility maximizer can see that this results in very few stamps, so it ends up low on the list and doesn't make the 10% cutoff. So there are lots of mistakes that a human imitation might make that a quantilizer won't. And note that for our stamp collecting utility function the worst case is zero stamps, but you could imagine that, with other utility functions, a human imitator could make arbitrarily bad mistakes that a quantilizer would be able to avoid.

Now, the most common, boring human strategies that the human imitator is very likely to use also don't make the cutoff. A 50% quantilizer would have a decent chance of going with one of them, but a 10% quantilizer aims higher than that. The bulk of the probability mass for the 10% quantilizer is in strategies that a human might try that work significantly better than average. So the quantilizer is kind of like a human on a really good day: it uses the power of the expected utility calculation to be more effective than a pure imitation of a human.

Is it safe, though? After all, many of the insane maximizing strategies are still in our distribution, with hopefully small but still non-zero probabilities, and in fact we multiplied them all by 10 when we renormalized. If there's some chance that a human would go for an extreme utility-maximizing strategy, the 10% quantilizer is 10 times more likely than that. But the probability will still be small, and unless you've chosen a very small value for q, your quantilizer is much more likely to go for one of the reasonably high-performing, human-plausible strategies.

And what about stability? Satisficers tend to want to turn themselves into maximizers; does a quantilizer have that problem? Well, the human model should give that kind of strategy a very low probability. A human is extremely unlikely to try to modify themselves into an expected utility maximizer to better pursue their goals; humans can't really self-modify like that anyway. But a human might try to build an expected utility maximizer rather than trying to become one. That's kind of worrying, since it's a plan that a human definitely might try, and it would result in extremely high expected utility. So although a quantilizer might seem like a relatively safe system, it still might end up building an unsafe one. So how's our safety meter looking? Well, it's progress. Let's keep working on it.

Some of you may have noticed your questions in the YouTube comments being answered by a mysterious bot named Stampy. The way that works is that Stampy cross-posts YouTube questions to the Rob Miles AI Discord, where a bunch of patrons and I discuss them and write replies. Oh yeah, there's a Discord now for patrons. Thank you to everyone on the Discord who helps reply to comments, and thank you to all of my patrons, all of these amazing people. In this video I'm especially thanking Timothy Lillicrap. Thank you so much for your support, and thank you all for watching. I'll see you next time.
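The step-by-step procedure described in the transcript (sort actions by expected utility, keep the top q of the base distribution's probability mass, divide by q to renormalize, then sample) can be sketched in Python for a finite set of actions. This is a minimal illustration under stated assumptions, not code from the paper: the function names quantilize and sample_action, the action labels, and all of the probabilities and stamp counts are hypothetical. It also makes one assumption the transcript glosses over: when the cutoff falls partway through an action's probability mass, only the portion of that action's mass that fits above the cutoff is kept, so the kept mass is exactly q before renormalizing.

```python
import random
from fractions import Fraction
from numbers import Real
from typing import Dict, Hashable, Mapping


def quantilize(
    base_prob: Mapping[Hashable, Real],
    expected_utility: Mapping[Hashable, Real],
    q: Real,
) -> Dict[Hashable, Real]:
    """Return a q-quantilizer's probability for every action (hypothetical helper).

    base_prob: how likely a human is to take each action (should sum to 1).
    expected_utility: expected utility of each action, e.g. expected stamps.
    q: share of the base probability mass to keep, with 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("q must satisfy 0 < q <= 1")
    # Sort from highest to lowest expected utility. sorted() is stable, so
    # actions with equal utility keep their input order.
    ranked = sorted(base_prob, key=lambda a: expected_utility[a], reverse=True)
    remaining = q  # base probability mass still allowed above the cutoff
    result = {}
    for action in ranked:
        # Keep this action's base mass up to the cutoff. An action straddling
        # the cutoff keeps only the part of its mass that fits (an assumption).
        kept = min(base_prob[action], remaining)
        remaining -= kept
        # The kept mass sums to q, so dividing by q renormalizes it to 1.
        result[action] = kept / q
    return result


def sample_action(dist: Mapping[Hashable, Real], rng=random) -> Hashable:
    """Draw one action from the quantilizer's distribution (hypothetical helper)."""
    actions = list(dist)
    weights = [float(dist[a]) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical human probabilities and expected stamps, for illustration only.
    human = {
        "turn the world into stamps": Fraction(1, 1000),
        "clever stamp-trading scheme": Fraction(49, 1000),
        "buy stamps in bulk": Fraction(150, 1000),
        "buy some stamps": Fraction(500, 1000),
        "do nothing": Fraction(200, 1000),
        "mistype card number repeatedly": Fraction(100, 1000),
    }
    stamps = {
        "turn the world into stamps": 10**12,
        "clever stamp-trading scheme": 500,
        "buy stamps in bulk": 100,
        "buy some stamps": 20,
        "do nothing": 0,
        "mistype card number repeatedly": 1,
    }
    dist = quantilize(human, stamps, Fraction(1, 10))
    for action, p in dist.items():
        print(f"{action}: {p}")
    print("chosen:", sample_action(dist, random.Random(0)))
```

With q = 1/10, the script prints 1/100, 49/100 and 1/2 for the three highest-utility actions and 0 for the other three: the common "buy some stamps" strategy and the mistakes are cut, and the extreme action's probability rises from 1/1000 to 1/100, the 1/q amplification the transcript describes. Passing q = 1 returns the base distribution unchanged (pure imitation), while q = 1/1000 puts all of the weight on the extreme action, the highest-utility action the hypothetical human model gives nonzero probability; this is why shrinking q moves the quantilizer toward a maximizer. Fractions keep the arithmetic exact here; floats also work, but rounding can leave tiny nonzero weights just past the cutoff.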
Original Description
How do you get an AI system that does better than a human could, without doing anything a human wouldn't?
A follow-up to "Maximizers and Satisficers": https://youtu.be/Ao4jwLwT36M
Links:
The Paper: https://intelligence.org/files/QuantilizersSaferAlternative.pdf
More about this area of research: https://www.alignmentforum.org/tag/mild-optimization
Quantilizers: https://aisafety.info/questions/6380/
Softer quantilization: https://aisafety.info/questions/6449/
Satisficers: https://aisafety.info/questions/NK44/
What’s an optimizer: https://aisafety.info/questions/8C7W/
With thanks to my excellent Patreon supporters:
https://www.patreon.com/robertskmiles
Timothy Lillicrap
Gladamas
James
Scott Worley
Chad Jones
Shevis Johnson
JJ Hepboin
Pedro A Ortega
Said Polat
Chris Canal
Jake Ehrlich
Kellen lask
Francisco Tolmasky
Michael Andregg
David Reid
Peter Rolf
Teague Lasser
Andrew Blackledge
Frank Marsman
Brad Brookshire
Cam MacFarlane
Vivek Nayak
Jason Hise
Phil Moyer
Erik de Bruijn
Alec Johnson
Clemens Arbesser
Ludwig Schubert
Allen Faure
Eric James
Matheson Bayley
Qeith Wreid
jugettje dutchking
Owen Campbell-Moore
Atzin Espino-Murnane
Johnny Vaughan
Jacob Van Buren
Jonatan R
Ingvi Gautsson
Michael Greve
Tom O'Connor
Laura Olds
Jon Halliday
Paul Hobbs
Jeroen De Dauw
Lupuleasa Ionuț
Cooper Lawton
Tim Neilson
Eric Scammell
Igor Keller
Ben Glanton
anul kumar sinha
Duncan Orr
Will Glynn
Tyler Herrmann
Tomas Sayder
Ian Munro
Jérôme Beaulieu
Nathan Fish
Taras Bobrovytsky
Jeremy
Vaskó Richárd
Benjamin Watkin
Sebastian Birjoveanu
Andrew Harcourt
Luc Ritchie
Nicholas Guyett
James Hinchcliffe
12tone
Chris Beacham
Zachary Gidwitz
Nikita Kiriy
Parker
Andrew Schreiber
Steve Trambert
Mario Lois
Abigail Novick
heino hulsey-vincent
Fionn
Dmitri Afanasjev
Marcel Ward
Richárd Nagyfi
Andrew Weir
Kabs
Miłosz Wierzbicki
Tendayi Mawushe
Jannik Olbrich
Jake Fish
Wr4thon
Martin Ottosen
Robert Hildebrandt
Andy Kobre
Poker Chen
Kees
Darko Sperac
Paul Moffat
Robert Valdimarsson
Marco Tirabo
Uploads from Robert Miles AI Safety (video 30 of 47)