End-to-End Multimodal AI: Fine-Tuning, Fusion, and MLOps
Build production-ready multimodal AI systems that combine vision, language, and audio into unified intelligent applications. This course takes you through the full lifecycle of multimodal model development — from constructing and fine-tuning transformer-based architectures using PyTorch and TensorFlow, to diagnosing training failures, designing cross-modal retrieval systems, and deploying secure, monitored inference APIs.
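To give a concrete flavor of the fine-tuning work this lifecycle involves, here is a minimal PyTorch sketch of adapting a pretrained Vision Transformer to a new task by freezing the backbone and training a fresh classification head. The class count, learning rate, and dummy batch are illustrative placeholders, not course materials.

```python
# Minimal sketch: fine-tune a pretrained ViT by freezing the backbone and
# training only a new classification head. All numbers are placeholders.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_CLASSES = 10  # hypothetical downstream label count
model = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)

# Freeze the pretrained backbone; only the new head will be updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the downstream task.
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

optimizer = torch.optim.AdamW(model.heads.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for a real DataLoader (224x224 RGB images).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))

model.train()
logits = model(images)          # (8, NUM_CLASSES)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
print(f"fine-tuning step loss: {loss.item():.4f}")
```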
You will work with real-world tools including CLIP, ViT, FAISS, FastAPI, MLflow, and Ray Tune to build systems that process and integrate multiple data types simultaneously. You will analyze computational complexity to optimize fusion algorithms, evaluate model errors to identify failure patterns, and translate model outputs into stakeholder-ready business insights.
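As an illustration of the cross-modal retrieval theme, the sketch below pairs two of the named tools: CLIP (via the Hugging Face transformers wrappers) embeds images and a text query into a shared space, and FAISS indexes the image embeddings for nearest-neighbor search. The solid-color stand-in images and the query string are placeholders, not part of the course.

```python
# Minimal sketch: text-to-image retrieval with CLIP embeddings and a FAISS index.
import faiss
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Stand-in gallery: solid-color images in place of a real image collection.
gallery_names = ["red_square", "green_square", "blue_square"]
images = [Image.new("RGB", (224, 224), color=c) for c in ("red", "green", "blue")]

with torch.no_grad():
    image_inputs = processor(images=images, return_tensors="pt")
    image_emb = model.get_image_features(**image_inputs)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

# Inner product over unit-length vectors is cosine similarity.
index = faiss.IndexFlatIP(image_emb.shape[1])
index.add(image_emb.numpy().astype("float32"))

# Embed a text query into the same space and retrieve the closest images.
with torch.no_grad():
    text_inputs = processor(text=["a plain red square"], return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

scores, ids = index.search(text_emb.numpy().astype("float32"), 2)
for rank, (score, idx) in enumerate(zip(scores[0], ids[0]), start=1):
    print(f"{rank}. {gallery_names[idx]} (cosine similarity = {score:.3f})")
```

Normalizing the embeddings before adding them to an inner-product index makes the returned scores cosine similarities, which is the usual choice for CLIP-style retrieval.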
This course is built for intermediate practitioners in machine learning and AI who want to move beyond single-modality models and into the cutting edge of AI systems design. By the end, you will have a portfolio of deployable, optimized multimodal systems that demonstrate advanced engineering capability to employers.