Paul Christiano — Preventing an AI takeover
Talked with Paul Christiano (world’s leading AI safety researcher) about:
* Does he regret inventing RLHF?
* What do we want the post-AGI world to look like (do we want to keep gods enslaved forever)?
* Why he has relatively modest timelines (40% by 2040, 15% by 2030),
* Why he’s leading the push to get labs to develop responsible scaling policies, & what it would take to prevent an AI coup or bioweapon,
* His current research into a new proof system, and how this could solve alignment by explaining a model's behavior,
* and much more.
𝐎𝐏𝐄𝐍 𝐏𝐇𝐈𝐋𝐀𝐍𝐓𝐇𝐑𝐎𝐏𝐘
Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see this application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/
The deadline to apply is November 9th; make sure to check out those roles before they close.
𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
* Transcript: https://www.dwarkeshpatel.com/p/paul-christiano
* Apple Podcasts: https://podcasts.apple.com/us/podcast/paul-christiano-preventing-an-ai-takeover/id1516093381?i=1000633226398
* Spotify: https://open.spotify.com/episode/5vOuxDP246IG4t4K3EuEKj?si=VW7qTs8ZRHuQX9emnboGcA
* Follow me on Twitter: https://twitter.com/dwarkesh_sp
𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
00:00:00 - What do we want post-AGI world to look like?
00:24:25 - Timelines
00:45:28 - Evolution vs gradient descent
00:54:53 - Misalignment and takeover
01:17:23 - Is alignment dual-use?
01:31:38 - Responsible scaling policies
01:58:25 - Paul’s alignment research
02:35:01 - Will this revolutionize theoretical CS and math?
02:46:11 - How Paul invented RLHF
02:55:10 - Disagreements with Carl Shulman
03:01:53 - Long TSMC but not NVIDIA