Build Hour: GPT-Realtime-2

OpenAI · Beginner · 🧠 Large Language Models · 6h ago

Build with the next wave of realtime voice AI. In this Build Hour, you’ll learn how to use GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to build low-latency voice agents that can translate live speech, reason across tools, operate apps, and support more natural voice-to-voice and voice-to-action experiences.

In this session, Teri Yu (Product) and Erika Kettleson (Solutions Engineering) will cover:
• Building with new realtime audio models for translation, streaming speech-to-text, and intelligent voice agents
• Using GPT-Realtime-2 capabilities like preambles, 128K context, parallel tool calling, domain understanding, context over turns, and controllable expressiveness
• Creating voice-powered workflows for shopping and product analytics dashboards
• Customer Spotlight on how Sierra (https://sierra.ai/) is designing production customer experience agents with guardrails, VAD tuning, tracing, redaction, evals, and customer-specific harnesses

👉 Realtime Voice Blog: https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/
👉 Voice Agents Docs: https://developers.openai.com/api/docs/guides/voice-agents
👉 Playground: https://platform.openai.com/audio/realtime
👉 Follow along with the code repo: http://github.com/openai/build-hours
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours

00:00 Welcome and intro
02:06 Realtime voice models overview
02:26 GPT-Realtime-Translate and GPT-Realtime-Whisper demo
04:36 GPT-Realtime-2: three ways to build with voice AI
05:14 What’s new in GPT-Realtime-2
06:58 Demo: Voice-powered search agent
12:32 Demo: Product analytics dashboard
17:24 What can you build with voice AI?
18:36 Customer spotlight: Sierra
29:56 Q&A
42:05 Resources & Upcoming Build Hours
