Paul Christiano — Preventing an AI takeover

Dwarkesh Patel · Advanced ·🛡️ AI Safety & Ethics ·2y ago


Talked with Paul Christiano (world’s leading AI safety researcher) about:

* Does he regret inventing RLHF?
* What do we want the post-AGI world to look like (do we want to keep gods enslaved forever)?
* Why he has relatively modest timelines (40% by 2040, 15% by 2030)
* Why he’s leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon
* His current research into a new proof system, and how this could solve alignment by explaining a model's behavior
* and much more.

𝐎𝐏𝐄𝐍 𝐏𝐇𝐈𝐋𝐀𝐍𝐓𝐇𝐑𝐎𝐏𝐘

Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see this application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/

The deadline to apply is November 9th; make sure to check out those roles before they close.

𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒

* Transcript: https://www.dwarkeshpatel.com/p/paul-christiano
* Apple Podcasts: https://podcasts.apple.com/us/podcast/paul-christiano-preventing-an-ai-takeover/id1516093381?i=1000633226398
* Spotify: https://open.spotify.com/episode/5vOuxDP246IG4t4K3EuEKj?si=VW7qTs8ZRHuQX9emnboGcA
* Follow me on Twitter: https://twitter.com/dwarkesh_sp

𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒

00:00:00 - What do we want post-AGI world to look like?
00:24:25 - Timelines
00:45:28 - Evolution vs gradient descent
00:54:53 - Misalignment and takeover
01:17:23 - Is alignment dual-use?
01:31:38 - Responsible scaling policies
01:58:25 - Paul’s alignment research
02:35:01 - Will this revolutionize theoretical CS and math?
02:46:11 - How Paul invented RLHF
02:55:10 - Disagreements with Carl Shulman
03:01:53 - Long TSMC but not NVIDIA
