Scaling interpretability

Anthropic · Advanced · 🛡️ AI Safety & Ethics · 2y ago

Key Takeaways

This video covers the close relationship between scientific and engineering progress, and the technical challenges of scaling interpretability research to much larger AI models.

Original Description

Science and engineering are inseparable. Our researchers reflect on the close relationship between scientific and engineering progress, and discuss the technical challenges they encountered in scaling our interpretability research to much larger AI models. Read more: https://anthropic.com/research/engineering-challenges-interpretability

Playlist

Playlist UUrDwWp7EBBv4NwvScIpBDOA · Anthropic · 18 of 60

Quick tips for Claude: Long context file uploads

Inside our first Anthropic Hackathon, San Francisco

Long inputs, multi-step output with Claude

Coding with Claude

Behind the prompt: Prompting tips for Claude.ai

Robin AI, powered by Claude

Claude 3 Opus as an economic analyst

Claude 3 Sonnet as a language learning partner

Claude 3 Haiku turns thousands of physical documents into structured data

Claude 3 Haiku for instant customer service

Claude 3 Haiku for fast document analysis

Tool use with the Claude 3 model family

Coming soon to the Team plan on Claude.ai

Introducing the Claude iOS app

Claude is now available in Europe

What is interpretability?

What should an AI's personality be?

Scaling interpretability

Claude 3.5 Sonnet for sparking creativity

Claude 3.5 Sonnet for vision

Claude 3.5 Sonnet as a writing partner

Claude 3.5 Sonnet for agentic coding

Shareable Projects in Claude

Evaluate prompts in the Anthropic Console

Shareable Artifacts in Claude

How we built Artifacts with Claude

Wedia advances digital asset management with Claude

AI prompt engineering: A deep dive

AI Prompt Engineering 101: Explained

Ancient Wisdom, Modern AI?

AI's Greatest Challenge: You?

AI Prompts That Drive Growth

Tips For Better Results With AI

AI, policy, and the weird sci-fi future with Anthropic’s Jack Clark

European Parliament expands access to their archives with Claude in Amazon Bedrock

Claude | Computer use for automating operations

Claude | Computer use for orchestrating tasks

Claude | Computer use for coding

Asana supercharges work management with Claude

What do people use AI models for?

Alignment faking in large language models

Building Anthropic | A conversation with our co-founders

How difficult is AI alignment? | Anthropic Research Salon

Tips for building AI agents

Claude 3.7 Sonnet with extended thinking

Introducing Claude Code

Advice For Building AI Agents

The Two Most Useful Applications of AI Agents

Defending against AI jailbreaks

The Most Common Mistake People Make When Building AI Agents

Controlling powerful AI

How Intercom is redefining customer support with Claude

Tracing the thoughts of a large language model

Introducing Claude for Education

Could AI models be conscious?

Lessons on AI agents from Claude Plays Pokemon

The Societal Impacts of AI

What Does AI Mean for the Future of Work?

Understanding AI Agents...Through Pokémon

What Pokémon Teaches Us About Building With AI
