Build Visual AI Agents

DeepLearningAI · Intermediate ·🎨 Image & Video AI ·1h ago
Learn more: https://bit.ly/43ctPTW

Join our new short course, AI Agents for Image and Video Generation, built in partnership with Google and taught by Katie Nguyen, Developer Relations Engineer at Google Cloud AI, and Wafae Bakkali, Staff Generative AI Specialist at Google.

Most agents you've worked with probably produce text. But whether you're building a product demo, a website asset, or an explainer video, you're working with visual media. With models like Google's Nano Banana for images and Veo for video, generating a single output from a prompt is straightforward. The harder problem is producing high-quality results consistently at scale, and the bottleneck there is evaluation: there is no single correct answer to compare against, so quality depends on context and use case.

In this course, you'll learn three complementary evaluation techniques, then combine them with image and video generation to build autonomous media agents. You'll build an image agent that turns brand guidelines into UI mockups, and a video agent that plans multi-scene explainers, animates reference frames with synchronized audio, and checks consistency across scenes. In the final lesson, you'll use Gemini CLI to build a generative media agent in natural language, packaging what you've learned into reusable agent skills.

In detail, you'll:

- Get a clear mental model of the generative media landscape and the architectures behind image, video, and audio generation.
- Engineer prompts for high-quality images and video, using techniques like LLM-enhanced prompting, reference images, and starting frames.
- Build evaluation pipelines that combine SigLIP image-text similarity scores, LLM-based judges, and structured rubrics to assess output at scale.
- Build an image agent that turns brand guidelines into UI mockups, generating, evaluating, and iterating until designs pass your bar.
- Build a video agent that plans multi-scene explainers, generates and animates reference frames with audio, and
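The generate, evaluate, and iterate loop described above can be sketched roughly as follows. This is an illustrative outline only, not the course's implementation: the `Evaluation` class, the `iterate_until_pass` function, the weights, and the 0.8 threshold are all hypothetical, and the three scores are assumed to be already normalized to the range 0 to 1 by whatever similarity model, LLM judge, and rubric checker you plug in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Evaluation:
    """Scores for one generated asset (all names here are hypothetical)."""
    similarity: float        # image-text similarity in [0, 1], e.g. from a SigLIP-style model
    judge: float             # LLM-judge score in [0, 1]
    rubric: Dict[str, bool]  # structured rubric: criterion -> passed?

    def rubric_score(self) -> float:
        # Fraction of rubric criteria that passed; an empty rubric scores 0.
        if not self.rubric:
            return 0.0
        return sum(self.rubric.values()) / len(self.rubric)

    def combined(self, weights: Tuple[float, float, float] = (0.3, 0.3, 0.4)) -> float:
        # Weighted sum of the three signals; the weights are an assumption.
        w_sim, w_judge, w_rubric = weights
        return (w_sim * self.similarity
                + w_judge * self.judge
                + w_rubric * self.rubric_score())


def iterate_until_pass(
    prompt: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Evaluation],
    revise: Callable[[str, Evaluation], str],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[str, float, int]:
    """Generate, score, and revise the prompt until the score passes the bar.

    Returns the best output seen, its combined score, and the rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    best_output, best_score = "", -1.0
    rounds_used = 0
    for _ in range(max_rounds):
        rounds_used += 1
        output = generate(prompt)
        evaluation = evaluate(prompt, output)
        score = evaluation.combined()
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            break
        # Feed the evaluation back into the next prompt.
        prompt = revise(prompt, evaluation)
    return best_output, best_score, rounds_used
```

In practice, `generate` would wrap an image or video model call, `evaluate` would call the similarity model and the judge, and `revise` would turn failed rubric criteria into prompt changes. Keeping them as plain callables makes the loop easy to test with stubs before any model is wired in.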
