The Future of Voice AI is Here: Real-Time Cloning, On-Device & Live Translation (Gradium CEO)

The MAD Podcast with Matt Turck · Intermediate · 🧠 Large Language Models · 2w ago
Current voice AI is too slow and expensive for interactive applications like gaming and robotics. Enter Gradium, a commercial spin-off from the Kyutai AI lab. In this demo, Neil Zeghidour showcases their real-time voice infrastructure. Watch their killer features in action: a high-fidelity text-to-speech model running entirely on a CPU, interactive voice agents that maintain natural conversation flow, and real-time speech translation with voice cloning. They even demonstrate restoring the voice of ALS patient Olivier Goy.

00:31 - The backstory: A commercial spin-off from Kyutai Labs.
01:16 - The shift from offline to interactive voice in gaming and live streams.
03:12 - Live demo: AI-generated personalized esports commentary.
04:33 - Restoring the voice of ALS patient Olivier Goy.
05:16 - Creating real-time personalized videos.
06:41 - Running a 100M parameter text-to-speech model locally on a CPU.
08:41 - Building interactive voice agents that use function calling.
11:47 - Hibiki: Real-time, on-device speech-to-speech translation.

Gradium
Website - @
X/Twitter - @AI

HOSTED BY:

FirstMark Capital
Website - @
X/Twitter - @rkCap

Matt Turck (Managing Director)
Blog - @
LinkedIn - @k/
X/Twitter - @ck

This session was recorded live at a recent Data Driven NYC, our in-person, monthly event series. If you are ever in New York, you can join the upcoming events by following FirstMark on Luma: @rkcap

Check out the MAD Podcast:
Spotify - @LATDSaFvgJG80ACcRJtq
Apple - https://podcasts.apple.com/us/podcast/the-mad-podcast-with-matt-turck/id1686238724
