Build Hour: GPT-Realtime-2

OpenAI · Beginner · 🧠 Large Language Models · 6h ago
Build with the next wave of realtime voice AI. In this Build Hour, you’ll learn how to use GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to build low-latency voice agents that can translate live speech, reason across tools, operate apps, and support more natural voice-to-voice and voice-to-action experiences.

In this session, Teri Yu (Product) and Erika Kettleson (Solutions Engineering) will cover:
• Building with new realtime audio models for translation, streaming speech-to-text, and intelligent voice agents
• Using GPT-Realtime-2 capabilities like preambles, 128K context, parallel tool calling, domain understanding, context over turns, and controllable expressiveness
• Creating voice-powered workflows for shopping and product analytics dashboards
• Customer Spotlight on how Sierra (https://sierra.ai/) is designing production customer experience agents with guardrails, VAD tuning, tracing, redaction, evals, and customer-specific harnesses

👉 Realtime Voice Blog: https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/
👉 Voice Agents Docs: https://developers.openai.com/api/docs/guides/voice-agents
👉 Playground: https://platform.openai.com/audio/realtime
👉 Follow along with the code repo: http://github.com/openai/build-hours
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours

00:00 Welcome and intro
02:06 Realtime voice models overview
02:26 GPT-Realtime-Translate and GPT-Realtime-Whisper demo
04:36 GPT-Realtime-2: three ways to build with voice AI
05:14 What’s new in GPT-Realtime-2
06:58 Demo: Voice-powered search agent
12:32 Demo: Product analytics dashboard
17:24 What can you build with voice AI?
18:36 Customer spotlight: Sierra
29:56 Q&A
42:05 Resources & Upcoming Build Hours
