Quantilizers: AI That Doesn't Try Too Hard

Robert Miles AI Safety · Advanced · 📄 Research Papers Explained · 5y ago


Key Takeaways

The video discusses Quantilizers, an AI system that combines human imitation and expected utility maximization to achieve better outcomes without extreme utility maximizing strategies, and explores its potential as a safer alternative to traditional AI approaches.

Full Transcript

Hi. So way back in the before time, I made a video about maximizers and satisficers. The plan was that was going to be the first half of a two-parter. Now, I did script out that second video, and shoot it, and even start to edit it, and then certain events transpired and I never finished that video. So that's what this is: part two of a video that I started ages ago, which I think most people have forgotten about. So I do recommend going back and watching that video if you haven't already, or even re-watching it to remind yourself. I'll put a link to that in the description. And with that, here's part two. Take it away, past me.

Hi. In the previous video we looked at utility maximizers, expected utility maximizers, and satisficers, using unbounded and bounded utility functions. A powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. The situation doesn't look great.

So let's try looking at something completely different. Let's try to get away from this utility function stuff that seems so dangerous. What if we just tried to directly imitate humans? If we can get enough data about human behavior, maybe we can train a model that, for any given situation, predicts what a human being would do in that scenario. If the model's good enough, you've basically got a human-level AGI, right? It's able to do a wide range of cognitive tasks just like a human can, because it's just exactly copying humans. That kind of system won't do a lot of the dangerous, counterproductive things that a maximizer would do, simply because a human wouldn't do them. But I wouldn't exactly call it safe, because a perfect imitation of a human isn't safer than the human it's perfectly imitating, and humans aren't really safe. In principle, a truly safe AGI could be given just about any level of power and responsibility and it would tend to produce good outcomes, but the same can't really be said for humans. And an imperfect human imitation would almost certainly be even worse. I mean, what are the chances that introducing random errors and inaccuracies to the imitation would just happen to make it more safe rather than less? Still, it does seem like it would be safer than a utility maximizer. At least we're out of guaranteed apocalypse territory.

But the other thing that makes this kind of approach unsatisfactory is a human imitation can't exceed human capabilities by much, because it's just copying them. A big part of why we want AGI in the first place is to get it to solve problems that we can't. You might be able to run the thing faster to allow it more thinking time, or something like that, but that's a pretty limited form of superintelligence, and you have to be very careful with anything along those lines, because it means putting the system in a situation that's very different from anything any human being has ever experienced. Your model might not generalize well to a situation so different from anything in its training data, which could lead to unpredictable and potentially dangerous behavior.

Relatively recently, a new approach was proposed called quantilizing. The idea is that this lets you combine human imitation and expected utility maximization, to hopefully get some of the advantages of both without all of the downsides. It works like this. You have your human imitation model. Given a situation, it can give you a probability distribution over actions. That's like: for each of the possible actions you could take in this situation, how likely is it that a human would take that action? So in our stamp collecting example, that would be: if a human were trying to collect a lot of stamps, how likely would they be to do this action? Then you have whatever system you'd use for a utility maximizer, that's able to figure out the expected utility of different actions according to some utility function. For any given action, it can tell you how much utility you'd expect to get if you did that. So in our example, that's: how many stamps would you expect this action to result in? So for every action you have these two numbers: the human probability and the expected utility. Quantilizing sort of mixes these together, and you get to choose how they're mixed with a variable that we'll call q. If q is zero, the system acts like an expected utility maximizer. If it's one, the system acts like a human imitation. By setting it somewhere in between, we can hopefully get a quantilizer that's more effective than the human imitation but not as dangerous as the utility maximizer.

So what exactly is a quantilizer? Let's look at the definition in the paper: a q-quantilizer is an agent that, when faced with a decision problem, returns a random action in the top q proportion of some base distribution over actions, sorted by the expected utility achieved if that action is executed. So let's break this down and go through how it works step by step.

First, we pick a value for q, the variable that determines how we're going to mix imitation and utility maximization. Let's set it to 0.1 for this example, 10%. Now we take all of the available actions and sort them by expected utility. So on one end you've got the actions that kick off all of the crazy, extreme utility maximizing strategies, you know, killing everyone and turning the whole world into stamps, all the way down through the moderate strategies like buying some stamps, and down to all of the strategies that do nothing and collect no stamps at all.

Then we look at our base distribution over actions. What is that? In the version I'm talking about, we're using the human imitation system's probability distribution over actions for this. So our base distribution is how likely a human is to do each action. That might look something like this. No human is ever going to try the wacky extreme maximizing strategies, so our human imitator gives them a probability of basically zero. Then there are some really good strategies that humans probably won't think of, but they might if they're really smart or lucky. Then a big bump of normal strategies that humans are quite likely to use, that tend to do okay. Then tailing off into less and less good strategies, and eventually stupider and stupider mistakes that humans are less and less likely to make.

Then what we do is we find the point in our action list such that 10% of the probability mass is on the high expected utility side. So that's what q is really changing: it's where we make this cutoff. Note that it's not 10% of the actions that would be over here, it's 10% of the probability mass. Then we throw away everything on the right, all the stupid and useless choices. We set them to zero and we keep the top 10%. Now, this is no longer a valid probability distribution, because it only sums up to 0.1, so we multiply all of these by 10 so that the whole thing sums to 1 again. And that's our final probability distribution, which we sample from to get our chosen action.

So let's look at some different actions here and see how they do. Consider something like: misremember your credit card details, and keep trying to order stamps with the wrong number, and you can't figure out why it's not working. A human is reasonably likely to do that. Not very likely, but we've all met people who... Point is, a pure human imitation might do that. But the expected utility maximizer can see that this results in very few stamps, so it ends up low on the list and doesn't make the 10% cutoff. So there are lots of mistakes that a human imitation might make that a quantilizer won't. And note that for our stamp collecting utility function the worst case is zero stamps, but you could imagine with other utility functions a human imitator could make arbitrarily bad mistakes that a quantilizer would be able to avoid.

Now, the most common, boring human strategies that the human imitator is very likely to use also don't make the cutoff. A 50% quantilizer would have a decent chance of going with one of them, but a 10% quantilizer aims higher than that. The bulk of the probability mass for the 10% quantilizer is in strategies that a human might try that work significantly better than average. So the quantilizer is kind of like a human on a really good day. It uses the power of the expected utility calculation to be more effective than a pure imitation of a human.

Is it safe, though? After all, many of the insane maximizing strategies are still in our distribution, with hopefully small but still non-zero probabilities, and in fact we multiplied them all by 10 when we renormalized. If there's some chance that a human would go for an extreme utility maximizing strategy, the 10% quantilizer is 10 times more likely than that. But the probability will still be small. Unless you've chosen a very small value for q, your quantilizer is much more likely to go for one of the reasonably high-performing, human-plausible strategies.

And what about stability? Satisficers tend to want to turn themselves into maximizers. Does a quantilizer have that problem? Well, the human model should give that kind of strategy a very low probability. A human is extremely unlikely to try to modify themselves into an expected utility maximizer to better pursue their goals. Humans can't really self-modify like that anyway. But a human might try to build an expected utility maximizer rather than trying to become one. That's kind of worrying, since it's a plan that a human definitely might try that would result in extremely high expected utility. So although a quantilizer might seem like a relatively safe system, it still might end up building an unsafe one. So how's our safety meter looking? Well, it's progress. Let's keep working on it.

Some of you may have noticed your questions in the YouTube comments being answered by a mysterious bot named Stampy. The way that works is Stampy cross-posts YouTube questions to the Rob Miles AI Discord, where me and a bunch of patrons discuss them and write replies. Oh yeah, there's a Discord now for patrons. Thank you to everyone on the Discord who helps reply to comments, and thank you to all of my patrons, all of these amazing people. In this video I'm especially thanking Timothy Lillicrap. Thank you so much for your support, and thank you all for watching. I'll see you next time.
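The cutoff and renormalization steps described in the transcript can be written compactly. This is only a sketch: the symbols \gamma, A_q and P_q are illustrative notation that does not appear in the transcript, and it assumes the cutoff falls exactly between two actions rather than splitting one action's probability mass.

```latex
% \gamma(a): probability the human-imitation model assigns to action a
% A_q: the highest-expected-utility actions whose total mass under \gamma is q
P_q(a) =
\begin{cases}
  \gamma(a) / q & \text{if } a \in A_q, \\
  0             & \text{otherwise,}
\end{cases}
\qquad
\sum_{a \in A_q} \gamma(a) = q
\;\Longrightarrow\;
\sum_{a} P_q(a) = 1 .

% Consequence: no action is amplified by more than a factor of 1/q
P_q(a) \le \frac{\gamma(a)}{q}
\quad \text{(a factor of 10 for } q = 0.1\text{)} .
```

Read this way, the "10 times more likely" remark is the 1/q factor with q = 0.1, and the warning about very small values of q is the same factor growing without bound as q shrinks.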

Original Description

How do you get an AI system that does better than a human could, without doing anything a human wouldn't? A follow-up to "Maximizers and Satisficers": https://youtu.be/Ao4jwLwT36M

Links:
The Paper: https://intelligence.org/files/QuantilizersSaferAlternative.pdf
More about this area of research: https://www.alignmentforum.org/tag/mild-optimization
Quantilizers: https://aisafety.info/questions/6380/
Softer quantilization: https://aisafety.info/questions/6449/
Satisficers: https://aisafety.info/questions/NK44/
What’s an optimizer: https://aisafety.info/questions/8C7W/

With thanks to my excellent Patreon supporters: https://www.patreon.com/robertskmiles
Timothy Lillicrap Gladamas James Scott Worley Chad Jones Shevis Johnson JJ Hepboin Pedro A Ortega Said Polat Chris Canal Jake Ehrlich Kellen lask Francisco Tolmasky Michael Andregg David Reid Peter Rolf Teague Lasser Andrew Blackledge Frank Marsman Brad Brookshire Cam MacFarlane Vivek Nayak Jason Hise Phil Moyer Erik de Bruijn Alec Johnson Clemens Arbesser Ludwig Schubert Allen Faure Eric James Matheson Bayley Qeith Wreid jugettje dutchking Owen Campbell-Moore Atzin Espino-Murnane Johnny Vaughan Jacob Van Buren Jonatan R Ingvi Gautsson Michael Greve Tom O'Connor Laura Olds Jon Halliday Paul Hobbs Jeroen De Dauw Lupuleasa Ionuț Cooper Lawton Tim Neilson Eric Scammell Igor Keller Ben Glanton anul kumar sinha Duncan Orr Will Glynn Tyler Herrmann Tomas Sayder Ian Munro Jérôme Beaulieu Nathan Fish Taras Bobrovytsky Jeremy Vaskó Richárd Benjamin Watkin Sebastian Birjoveanu Andrew Harcourt Luc Ritchie Nicholas Guyett James Hinchcliffe 12tone Chris Beacham Zachary Gidwitz Nikita Kiriy Parker Andrew Schreiber Steve Trambert Mario Lois Abigail Novick heino hulsey-vincent Fionn Dmitri Afanasjev Marcel Ward Richárd Nagyfi Andrew Weir Kabs Miłosz Wierzbicki Tendayi Mawushe Jannik Olbrich Jake Fish Wr4thon Martin Ottosen Robert Hildebrandt Andy Kobre Poker Chen Kees Darko Sperac Paul Moffat Robert Valdimarsson Marco Tirabo


The video discusses Quantilizers, an AI system that combines human imitation and expected utility maximization to achieve better outcomes without extreme utility maximizing strategies. Quantilizers have the potential to be a safer alternative to traditional AI approaches. By understanding how Quantilizers work, researchers and developers can design and implement safer AI systems that align with human values.

Key Takeaways

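Pick a value for q between 0 and 1 (0.1 in the video's example)
Sort the available actions by expected utility
Find the cutoff where a fraction q of the base distribution's probability mass lies on the high-utility side
Set the probability of every action below the cutoff to zero
Divide the remaining probabilities by q (multiply by 10 when q = 0.1) so they sum to 1, then sample an action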
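
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the action labels, and every probability and utility figure below are made-up assumptions for a toy stamp-collecting example. It also handles a case the video glosses over, where the cutoff lands partway through one action's probability mass.

```python
import random

def quantilize(actions, q):
    """Return the q-quantilized distribution as {name: probability}.

    actions: list of (name, human_probability, expected_utility) tuples
             whose probabilities sum to 1 (the base distribution).
    q:       fraction of base probability mass to keep, 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    # Sort by expected utility, best first.
    ranked = sorted(actions, key=lambda a: a[2], reverse=True)
    kept = {}
    remaining = q  # probability mass still to be collected
    for name, prob, _utility in ranked:
        if remaining <= 1e-12:       # cutoff reached (tolerance for float error)
            break
        take = min(prob, remaining)  # the cutoff may split one action's mass
        if take > 0:
            kept[name] = take / q    # renormalize so the total is 1
        remaining -= take
    return kept

def sample(dist, rng=random):
    """Draw one action name from a {name: probability} distribution."""
    names = list(dist)
    return rng.choices(names, weights=[dist[n] for n in names], k=1)[0]

# Hypothetical toy numbers: (action, human probability, expected stamps).
ACTIONS = [
    ("turn the world into stamps", 0.001, 1e9),
    ("clever bulk deal", 0.039, 500),
    ("trade with other collectors", 0.060, 300),
    ("buy stamps online", 0.550, 100),
    ("buy stamps at the post office", 0.250, 20),
    ("mistype the credit card number", 0.100, 0),
]

dist = quantilize(ACTIONS, q=0.1)
for name, prob in dist.items():
    print(f"{prob:.3f}  {name}")
print("chosen:", sample(dist))
```

In this toy setup the credit-card mistake and the ordinary strategies get probability zero, while the extreme strategy's probability rises from 0.001 to about 0.01, matching the factor-of-10 amplification described in the video.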

💡 Quantilizers can be used to develop AI systems that are more effective than human imitation alone, while avoiding the extreme utility maximizing strategies that can lead to unsafe outcomes.
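
To make the trade-off concrete, here is a small sketch that sweeps q and prints how the probability mass moves between four coarse groups of strategies. The group labels, probabilities and utilities are illustrative assumptions, not figures from the video or the paper, and `quantilized_probs` is a hypothetical helper written for this example.

```python
def quantilized_probs(base, q):
    """base: list of (label, human_prob, expected_utility); returns {label: prob}."""
    out, remaining = {}, q
    # Walk from highest to lowest expected utility, keeping mass until q is used up.
    for label, prob, _utility in sorted(base, key=lambda a: a[2], reverse=True):
        take = min(prob, max(remaining, 0.0))
        out[label] = take / q  # renormalize by 1/q
        remaining -= take
    return out

# Hypothetical base distribution over four groups of strategies.
BASE = [
    ("extreme", 0.001, 1e9),
    ("better than average", 0.099, 300),
    ("ordinary", 0.800, 100),
    ("mistake", 0.100, 0),
]

for q in (1.0, 0.5, 0.1, 0.01, 0.001):
    p = quantilized_probs(BASE, q)
    print(f"q={q:<6} extreme={p['extreme']:.3f} "
          f"better={p['better than average']:.3f} "
          f"ordinary={p['ordinary']:.3f} mistake={p['mistake']:.3f}")
```

With these made-up numbers, q = 1 reproduces the human imitation, q = 0.1 concentrates almost entirely on the better-than-average group, and by q = 0.001 all of the mass sits on the extreme strategy, which is the sense in which a very small q behaves like a maximizer.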
