Modern AI Models for Vision and Multimodal Understanding


Free to audit on Coursera


Coursera · Advanced · Computer Vision · 1mo ago
This advanced course explores the latest models powering visual and multimodal intelligence. From foundational mathematical tools to state-of-the-art architectures, you'll gain the skills to understand and build systems that interpret images, text, and more, as today’s leading AI models do.

You'll begin by discovering how Nonlinear Support Vector Machines (NSVMs) and Fourier transforms lay the groundwork for signal processing and pattern recognition in visual data. You'll then build a strong foundation in probabilistic reasoning and in temporal modeling with RNNs, which let AI systems understand sequences and context. Next, you'll learn how transformer architectures have revolutionized both language and vision tasks. Finally, you'll dive into multimodal learning with CLIP, which connects images and text, and explore diffusion models, which generate high-fidelity images through iterative refinement.

This course is ideal for learners who want to go beyond traditional deep learning and explore the models shaping the future of AI. With a blend of theory, code, and real-world applications, it equips you to tackle cutting-edge challenges in computer vision and multimodal AI.

This course can be taken for academic credit as part of CU Boulder’s MS in Data Science or MS in Computer Science degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals.

Learn more:
MS in Data Science: https://www.coursera.org/degrees/master-of-science-data-science-boulder
MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder
