Build Multimodal Generative AI Applications

Free to audit on Coursera

Coursera · Intermediate · 🎨 Image & Video AI · 3mo ago

Skills: Multimodal LLMs (90%), Image Generation Basics (80%)

Key Takeaways

Builds multimodal generative AI applications using language, images, and speech

Original Description

Ready to level up your GenAI skills? Step into the exciting world of multimodal AI, where language, images, and speech come together to build smarter, more interactive applications. In this hands-on course, you’ll learn how to build systems that work across multiple modalities, from creating AI-powered storytellers and meeting assistants to developing image captioning tools and video generation apps. You’ll gain experience with real-world tools like IBM’s Granite, OpenAI’s Whisper, Sora and DALL·E, Meta’s Llama, Mistral’s Mixtral, and Gradio. Plus, you'll explore multimodal search, question answering, and retrieval systems that combine text, speech, and visual data. By the end of the course, you’ll be able to design and build full-stack multimodal AI solutions using Python and frameworks like Flask and Gradio. If you’re looking to gain in-demand skills for building the next generation of AI applications, enroll today and power up your AI career!

More on: Multimodal LLMs

Google Veo 3 Tutorial: How to create AI Videos in Flow, Gemini or Google Vids?

AI Tool Journey

NVIDIA Clara Guardian Virtual Patient Assistant

NVIDIA Developer

Building Multimodal Search and RAG

Midjourney Trick: Consistent Character in Different Images

Ollama Multimodal: EASILY setup Llava locally & Integrate API

The ONLY Real Time Speech AI that can run locally!!!

Related Reads

How I Built an AI Pet Portrait Generator That Turns Photos Into Art

Learn how to build an AI pet portrait generator that turns photos into art using deep learning techniques and Python libraries

Dev.to · William Li

I Put Google’s Squoosh Codecs in the Browser — and Cut My Image Bill Before the Upload Even…

Learn how to use Google's Squoosh codecs in the browser to compress images before upload, reducing costs and improving performance

Medium · Programming, JavaScript

Which AI tools can generate ready-to-use 3D character models for games, animation, or 3D printing?

Learn about AI tools that generate ready-to-use 3D character models for games, animation, or 3D printing and how to evaluate them for different use cases

Reddit r/artificial

Short.ai Review 2025: Turn Long Videos into Viral Shorts in One Click! 🔥