Multi-modal AI
Learn to build production applications by combining visual and textual inputs with AI coding tools. You will explore multi-modal programming where screenshots, images, and text serve as inputs for AI-assisted code generation, and set up development environments configured for visual AI workflows. The course covers prompt engineering with visual context to improve code generation accuracy, and hands-on development with GitHub Copilot in VS Code for inline suggestions and chat-based interactions.

You will build a complete project using live reload and browser developer tools for rapid feedback between AI generation and visual output. The iterative development module teaches documentation-driven design where documentation guides AI toward desired outcomes, image-based iteration for refining generated code through visual comparison, and automated checks and validations that maintain quality through development cycles. You will learn to identify and overcome common iteration challenges, including regression and context drift.

The advanced module covers Model Context Protocol for connecting AI tools with external capabilities, Playwright for browser automation and visual testing, and Playwright MCP for AI-driven browser interactions that validate web applications directly. By completing this course, you will be able to convert screenshots into production code through iterative, automated, multi-modal AI workflows.
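The image-based iteration and automated checks described above can be sketched as a simple visual-regression gate: render the current output, compare it pixel by pixel against an approved baseline, and fail the check when too much has changed. The sketch below models images as flat lists of RGB tuples for self-containment; the function names, thresholds, and data shapes are illustrative assumptions, not part of any specific tool covered in the course (in practice the pixel data would come from Playwright screenshots).

```python
# Hedged sketch of an image-based regression check: compare a freshly
# rendered screenshot against an approved baseline, pixel by pixel.
# Images are modeled here as flat lists of (R, G, B) tuples; a real
# workflow would load pixel data from Playwright screenshot files.

def pixel_diff_ratio(baseline, candidate, tolerance=10):
    """Return the fraction of pixels whose value differs from the
    baseline by more than `tolerance` in any color channel."""
    if len(baseline) != len(candidate):
        raise ValueError("images must have the same dimensions")
    differing = sum(
        1
        for a, b in zip(baseline, candidate)
        if any(abs(x - y) > tolerance for x, y in zip(a, b))
    )
    return differing / len(baseline)

def check_regression(baseline, candidate, max_ratio=0.01):
    """Pass when at most `max_ratio` of pixels changed beyond tolerance."""
    return pixel_diff_ratio(baseline, candidate) <= max_ratio

# Usage: identical buffers pass; a fully repainted buffer fails.
base = [(255, 255, 255)] * 100
same = list(base)
changed = [(0, 0, 0)] * 100
print(check_regression(base, same))     # True
print(check_regression(base, changed))  # False
```

The tolerance parameter absorbs harmless rendering noise (anti-aliasing, font hinting), while the ratio threshold catches the larger layout regressions that tend to appear during repeated AI-driven edits.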
Watch on Coursera ↗