Let's go Bananas with GenMedia — Guillaume Vernade, Google DeepMind

Uploaded: 2026-05-18T15:00:06Z
Channel: AI Engineer

AI Engineer · Intermediate · Large Language Models

Skills: Multimodal LLMs (90%)

Guillaume Vernade from Google DeepMind takes a public domain book and runs it through the full gen media stack live. Gemini reads the whole text and writes image prompts for each character and chapter. Imagen generates the portraits. Veo animates them into video clips using those images as first frames. Lyria composes a different piece of music per chapter, with or without lyrics. The TTS model reads dialogue from the book using a trick that makes two voices sound like four distinct characters.

The interesting layer underneath all of it is that Gemini acts as the prompt engineer for every other model, and it works well partly because the gen media models were trained on prompts written by Gemini. The workshop also covers the Lyria Realtime model, which generates music continuously and responds to new prompts mid-stream like a DJ, and a new interactions API that makes chained multi-turn calls cheaper by caching context server-side instead of resending the full book on every turn.

Speaker info:
- https://x.com/Giom_V
- https://www.linkedin.com/in/guillaumevernade
- https://github.com/Giom-V
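The chaining described in the summary (a language model writes the prompt, an image model renders it, a video model animates it from that image as the first frame, and a music model scores the chapter) can be sketched as plain data flow. This is a minimal illustration, not the workshop's code: `run_pipeline`, `ChapterAssets`, and the `fake_*` backends are all hypothetical names, and the backends are stand-ins that call no real model or SDK. In practice each callable would wrap the corresponding model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChapterAssets:
    """Everything produced for one chapter (hypothetical container)."""
    chapter: int
    image_prompt: str
    image: bytes
    video: bytes
    music: bytes


def run_pipeline(
    chapters: List[str],
    write_prompt: Callable[[str], str],      # LLM acting as prompt engineer
    make_image: Callable[[str], bytes],      # image model
    animate: Callable[[str, bytes], bytes],  # video model, image = first frame
    compose: Callable[[str], bytes],         # music model
) -> List[ChapterAssets]:
    """Chain the four stages for each chapter, in order."""
    assets = []
    for number, text in enumerate(chapters, start=1):
        prompt = write_prompt(text)          # text -> image prompt
        image = make_image(prompt)           # prompt -> still image
        video = animate(prompt, image)       # still image seeds the clip
        music = compose(text)                # one piece of music per chapter
        assets.append(ChapterAssets(number, prompt, image, video, music))
    return assets


# Stand-in backends (assumptions for illustration only, no model is called).
def fake_write_prompt(text: str) -> str:
    return "Illustration of: " + text[:40]


def fake_make_image(prompt: str) -> bytes:
    return b"IMG:" + prompt.encode()


def fake_animate(prompt: str, first_frame: bytes) -> bytes:
    return b"VID:" + first_frame


def fake_compose(text: str) -> bytes:
    return b"MUS:" + text[:10].encode()


if __name__ == "__main__":
    book = ["Chapter one text.", "Chapter two text."]
    for item in run_pipeline(book, fake_write_prompt, fake_make_image,
                             fake_animate, fake_compose):
        print(item.chapter, item.image_prompt)
```

Passing the stages in as callables keeps the orchestration independent of any particular SDK, so each stand-in can be swapped for a real model call without touching the loop.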
