Quantilizers: AI That Doesn't Try Too Hard

Robert Miles AI Safety · Advanced ·📄 Research Papers Explained ·5y ago

Key Takeaways

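The video explains quantilizers, a proposed kind of AI agent that combines human imitation with expected utility maximization. A quantilizer samples from the best-scoring fraction of the actions a human might plausibly take, so it can outperform pure human imitation while keeping extreme utility-maximizing strategies unlikely. The video explores whether this makes it a safer alternative to a pure expected utility maximizer.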

Full Transcript

Hi. So way back in the before time, I made a video about maximizers and satisficers. The plan was that was going to be the first half of a two-parter. Now, I did script out that second video, and shoot it, and even start to edit it, and then certain events transpired and I never finished that video. So that's what this is: part two of a video that I started ages ago, which I think most people have forgotten about. So I do recommend going back and watching that video if you haven't already, or even re-watching it to remind yourself. I'll put a link to that in the description. And with that, here's part two. Take it away, past me.

Hi. In the previous video we looked at utility maximizers, expected utility maximizers, and satisficers, using unbounded and bounded utility functions. A powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that, because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. The situation doesn't look great.

So let's try looking at something completely different. Let's try to get away from this utility function stuff that seems so dangerous. What if we just tried to directly imitate humans? If we can get enough data about human behavior, maybe we can train a model that, for any given situation, predicts what a human being would do in that scenario. If the model's good enough, you've basically got a human-level AGI, right? It's able to do a wide range of cognitive tasks, just like a human can, because it's just exactly copying humans.

That kind of system won't do a lot of the dangerous, counterproductive things that a maximizer would do, simply because a human wouldn't do them. But I wouldn't exactly call it safe, because a perfect imitation of a human isn't safer than the human it's perfectly imitating, and humans aren't really safe. In principle, a truly safe AGI could be given just about any level of power and responsibility and it would tend to produce good outcomes, but the same can't really be said for humans. And an imperfect human imitation would almost certainly be even worse. I mean, what are the chances that introducing random errors and inaccuracies to the imitation would just happen to make it more safe rather than less? Still, it does seem like it would be safer than a utility maximizer. At least we're out of guaranteed apocalypse territory.

But the other thing that makes this kind of approach unsatisfactory is that a human imitation can't exceed human capabilities by much, because it's just copying them. A big part of why we want AGI in the first place is to get it to solve problems that we can't. You might be able to run the thing faster, to allow it more thinking time or something like that, but that's a pretty limited form of superintelligence. And you have to be very careful with anything along those lines, because it means putting the system in a situation that's very different from anything any human being has ever experienced. Your model might not generalize well to a situation so different from anything in its training data, which could lead to unpredictable and potentially dangerous behavior.

Relatively recently, a new approach was proposed, called quantilizing. The idea is that this lets you combine human imitation and expected utility maximization, to hopefully get some of the advantages of both without all of the downsides. It works like this. You have your human imitation model. Given a situation, it can give you a probability distribution over actions. That's like: for each of the possible actions you could take in this situation, how likely is it that a human would take that action? So in our stamp collecting example, that would be: if a human were trying to collect a lot of stamps, how likely would they be to do this action?

Then you have whatever system you'd use for a utility maximizer, that's able to figure out the expected utility of different actions according to some utility function. For any given action, it can tell you how much utility you'd expect to get if you did that. So in our example, that's: how many stamps would you expect this action to result in?

So for every action you have these two numbers: the human probability and the expected utility. Quantilizing sort of mixes these together, and you get to choose how they're mixed with a variable that we'll call q. If q is zero, the system acts like an expected utility maximizer. If it's one, the system acts like a human imitation. By setting it somewhere in between, we can hopefully get a quantilizer that's more effective than the human imitation but not as dangerous as the utility maximizer.

So what exactly is a quantilizer? Let's look at the definition in the paper: "A q-quantilizer is an agent that, when faced with a decision problem, returns a random action in the top q proportion of some base distribution over actions, sorted by the expected utility achieved if that action is executed." So let's break this down and go through how it works step by step.

First we pick a value for q, the variable that determines how we're going to mix imitation and utility maximization. Let's set it to 0.1 for this example: 10%. Now we take all of the available actions and sort them by expected utility. So on one end you've got the actions that kick off all of the crazy, extreme utility-maximizing strategies, you know, killing everyone and turning the whole world into stamps, all the way down through the moderate strategies like buying some stamps, and down to all of the strategies that do nothing and collect no stamps at all.

Then we look at our base distribution over actions. What is that? In the version I'm talking about, we're using the human imitation system's probability distribution over actions for this. So our base distribution is how likely a human is to do each action. That might look something like this. No human is ever going to try the wacky extreme maximizing strategies, so our human imitator gives them a probability of basically zero. Then there are some really good strategies that humans probably won't think of, but they might if they're really smart or lucky. Then a big bump of normal strategies that humans are quite likely to use, that tend to do okay. Then tailing off into less and less good strategies, and eventually stupider and stupider mistakes that humans are less and less likely to make.

Then what we do is we find the point in our action list such that 10% of the probability mass is on the high expected utility side. So that's what q is really changing: it's where we make this cutoff. Note that it's not 10% of the actions that would be over here, it's 10% of the probability mass. Then we throw away everything on the right, all the stupid and useless choices. We set them to zero, and we keep the top 10%. Now, this is no longer a valid probability distribution, because it only sums up to 0.1, so we multiply all of these by 10 so that the whole thing sums to 1 again. And that's our final probability distribution, which we sample from to get our chosen action.

So let's look at some different actions here and see how they do. Consider something like: misremember your credit card details and keep trying to order stamps with the wrong number, and you can't figure out why it's not working. A human is reasonably likely to do that. Not very likely, but we've all met people who... Point is, a pure human imitation might do that. But the expected utility maximizer can see that this results in very few stamps, so it ends up low on the list and doesn't make the 10% cutoff. So there are lots of mistakes that a human imitation might make that a quantilizer won't. And note that for our stamp collecting utility function the worst case is zero stamps, but you could imagine that with other utility functions a human imitator could make arbitrarily bad mistakes that a quantilizer would be able to avoid.

Now, the most common, boring human strategies that the human imitator is very likely to use also don't make the cutoff. A 50% quantilizer would have a decent chance of going with one of them, but a 10% quantilizer aims higher than that. The bulk of the probability mass for the 10% quantilizer is in strategies that a human might try that work significantly better than average. So the quantilizer is kind of like a human on a really good day. It uses the power of the expected utility calculation to be more effective than a pure imitation of a human.

Is it safe, though? After all, many of the insane maximizing strategies are still in our distribution, with hopefully small but still non-zero probabilities, and in fact we multiplied them all by 10 when we renormalized. If there's some chance that a human would go for an extreme utility-maximizing strategy, the 10% quantilizer is 10 times more likely than that. But the probability will still be small. Unless you've chosen a very small value for q, your quantilizer is much more likely to go for one of the reasonably high-performing, human-plausible strategies.

And what about stability? Satisficers tend to want to turn themselves into maximizers. Does a quantilizer have that problem? Well, the human model should give that kind of strategy a very low probability. A human is extremely unlikely to try to modify themselves into an expected utility maximizer to better pursue their goals. Humans can't really self-modify like that anyway. But a human might try to build an expected utility maximizer, rather than trying to become one. That's kind of worrying, since it's a plan that a human definitely might try, and that would result in extremely high expected utility. So although a quantilizer might seem like a relatively safe system, it still might end up building an unsafe one. So how's our safety meter looking? Well, it's progress. Let's keep working on it.

Some of you may have noticed your questions in the YouTube comments being answered by a mysterious bot named Stampy. The way that works is Stampy cross-posts YouTube questions to the Rob Miles AI Discord, where me and a bunch of patrons discuss them and write replies. Oh yeah, there's a Discord now, for patrons. Thank you to everyone on the Discord who helps reply to comments, and thank you to all of my patrons, all of these amazing people. In this video I'm especially thanking Timothy Lillicrap. Thank you so much for your support, and thank you all for watching. I'll see you next time.
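The step-by-step procedure described in the transcript can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the video or the paper: the function names, the toy action list, and all of the probabilities and utilities below are made up for the example. One further assumption is how to treat an action that straddles the cutoff; here it keeps only the part of its probability mass that falls inside the top q.

```python
import random

# Hypothetical toy data: (action, human base probability, expected stamps).
# The numbers are invented for illustration; the base probabilities sum to 1.
ACTIONS = [
    ("turn the world into stamps", 0.001, 1e9),
    ("clever bulk-trading scheme", 0.049, 500),
    ("trade with other collectors", 0.15, 200),
    ("buy stamps at the post office", 0.6, 50),
    ("order with the wrong card number", 0.1, 1),
    ("do nothing", 0.1, 0),
]


def quantilizer_distribution(actions, q):
    """Return {action: probability} for a q-quantilizer over `actions`."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    # Sort from highest to lowest expected utility.
    ranked = sorted(actions, key=lambda a: a[2], reverse=True)
    kept = {}
    remaining = q  # base probability mass still to be kept
    for name, base_prob, _utility in ranked:
        if remaining <= 1e-12:  # tolerance guards against float round-off
            break
        # Assumption of this sketch: an action straddling the cutoff keeps
        # only the part of its mass that lies inside the top q.
        mass = min(base_prob, remaining)
        if mass > 0:
            kept[name] = mass / q  # renormalize so the kept mass sums to 1
        remaining -= mass
    return kept


def sample_action(actions, q, rng=random):
    """Sample one action from the q-quantilizer's distribution."""
    dist = quantilizer_distribution(actions, q)
    names = list(dist)
    return rng.choices(names, weights=[dist[n] for n in names], k=1)[0]


if __name__ == "__main__":
    for name, prob in quantilizer_distribution(ACTIONS, q=0.1).items():
        print(f"{prob:.3f}  {name}")
    print("chosen:", sample_action(ACTIONS, q=0.1))
```

With these made-up numbers and q = 0.1, the extreme strategy goes from a base probability of 0.001 to roughly 0.01 (ten times more likely, as the transcript notes), while the wrong-card-number mistake and the common post-office strategy fall below the cutoff and get probability zero. Setting q = 1 gives back the human base distribution unchanged.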

Original Description

How do you get an AI system that does better than a human could, without doing anything a human wouldn't?
A follow-up to "Maximizers and Satisficers": https://youtu.be/Ao4jwLwT36M

Links:
The Paper: https://intelligence.org/files/QuantilizersSaferAlternative.pdf
More about this area of research: https://www.alignmentforum.org/tag/mild-optimization
Quantilizers: https://aisafety.info/questions/6380/
Softer quantilization: https://aisafety.info/questions/6449/
Satisficers: https://aisafety.info/questions/NK44/
What’s an optimizer: https://aisafety.info/questions/8C7W/

With thanks to my excellent Patreon supporters: https://www.patreon.com/robertskmiles
Timothy Lillicrap Gladamas James Scott Worley Chad Jones Shevis Johnson JJ Hepboin Pedro A Ortega Said Polat Chris Canal Jake Ehrlich Kellen lask Francisco Tolmasky Michael Andregg David Reid Peter Rolf Teague Lasser Andrew Blackledge Frank Marsman Brad Brookshire Cam MacFarlane Vivek Nayak Jason Hise Phil Moyer Erik de Bruijn Alec Johnson Clemens Arbesser Ludwig Schubert Allen Faure Eric James Matheson Bayley Qeith Wreid jugettje dutchking Owen Campbell-Moore Atzin Espino-Murnane Johnny Vaughan Jacob Van Buren Jonatan R Ingvi Gautsson Michael Greve Tom O'Connor Laura Olds Jon Halliday Paul Hobbs Jeroen De Dauw Lupuleasa Ionuț Cooper Lawton Tim Neilson Eric Scammell Igor Keller Ben Glanton anul kumar sinha Duncan Orr Will Glynn Tyler Herrmann Tomas Sayder Ian Munro Jérôme Beaulieu Nathan Fish Taras Bobrovytsky Jeremy Vaskó Richárd Benjamin Watkin Sebastian Birjoveanu Andrew Harcourt Luc Ritchie Nicholas Guyett James Hinchcliffe 12tone Chris Beacham Zachary Gidwitz Nikita Kiriy Parker Andrew Schreiber Steve Trambert Mario Lois Abigail Novick heino hulsey-vincent Fionn Dmitri Afanasjev Marcel Ward Richárd Nagyfi Andrew Weir Kabs Miłosz Wierzbicki Tendayi Mawushe Jannik Olbrich Jake Fish Wr4thon Martin Ottosen Robert Hildebrandt Andy Kobre Poker Chen Kees Darko Sperac Paul Moffat Robert Valdimarsson Marco Tirabo



Key Takeaways
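  1. Pick a value for q between 0 and 1 (the video uses q = 0.1)
  2. Sort actions by expected utility
  3. Find the cutoff that puts the top q proportion of the base distribution's probability mass on the high-utility side
  4. Set the probability of everything below the cutoff to zero
  5. Multiply the remaining probabilities by 1/q (10 when q = 0.1) to get a valid probability distribution, then sample the action from it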
💡 Quantilizers can be used to develop AI systems that are more effective than human imitation alone, while avoiding the extreme utility maximizing strategies that can lead to unsafe outcomes.
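
The steps above can also be written as a single formula. The notation here is introduced only for this summary and is not taken from the video: p(a) is the base (human imitation) probability of action a, and A_q is the set of highest expected utility actions that together hold a q proportion of the base probability mass. Assuming the cutoff falls exactly between two actions:

```latex
p_q(a) =
\begin{cases}
  \dfrac{p(a)}{q} & \text{if } a \in A_q, \\[6pt]
  0               & \text{otherwise,}
\end{cases}
\qquad \text{where } \sum_{a \in A_q} p(a) = q .
```

Two things follow directly. The kept probabilities sum to q/q = 1, which is why the final renormalization step gives a valid distribution. And for every action, p_q(a) is at most p(a)/q, so with q = 0.1 no action can become more than 10 times as likely as it is under the base distribution.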
