End-to-End Multimodal AI: Fine-Tuning, Fusion, and MLOps

Free to audit on Coursera

Coursera · Advanced · 🏗️ Systems Design & Architecture · 1mo ago
Build production-ready multimodal AI systems that combine vision, language, and audio in a single application. This course covers the full lifecycle of multimodal model development: constructing and fine-tuning transformer-based architectures in PyTorch and TensorFlow, diagnosing training failures, designing cross-modal retrieval systems, and deploying secure, monitored inference APIs. You will work with tools including CLIP, ViT, FAISS, FastAPI, MLflow, and Ray Tune to build systems that process and integrate multiple data types. You will analyze computational complexity to optimize fusion algorithms, evaluate model errors to identify failure patterns, and translate model outputs into business insights for stakeholders.

This course is built for intermediate machine learning and AI practitioners who want to move beyond single-modality models into multimodal systems design. By the end, you will have a portfolio of deployable, optimized multimodal systems to show employers.