Free to audit on Coursera

Unify Modalities: Cross-Modal Retrieval

Coursera · Intermediate · 🧠 Large Language Models · 3mo ago

Skills: Multimodal LLMs (90%)

Key Takeaways

Build cross-modal retrieval systems that connect text and images, using approximate nearest-neighbor search for lookup and attention mechanisms to fuse visual and textual information.
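To make the retrieval idea concrete, here is a minimal sketch of approximate nearest-neighbor search over a shared text-image embedding space. It is not taken from the course: the `TinyIVFIndex` class is a hypothetical toy inverted-file index written in plain NumPy, and the random vectors are stand-ins for what real image and text encoders would produce. A library such as FAISS would normally replace this class in practice.

```python
import numpy as np

def normalize(x):
    """L2-normalize rows so the inner product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

class TinyIVFIndex:
    """Toy inverted-file index (hypothetical): cluster the vectors,
    then search only the clusters closest to the query."""

    def __init__(self, n_cells=4, n_iter=10, seed=0):
        self.n_cells = n_cells
        self.n_iter = n_iter
        self.rng = np.random.default_rng(seed)

    def fit(self, vectors):
        self.vectors = normalize(vectors.astype(np.float32))
        # Start centroids at random data points, then run a few k-means steps.
        picks = self.rng.choice(len(self.vectors), self.n_cells, replace=False)
        self.centroids = self.vectors[picks].copy()
        for _ in range(self.n_iter):
            assign = np.argmax(self.vectors @ self.centroids.T, axis=1)
            for c in range(self.n_cells):
                members = self.vectors[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
            self.centroids = normalize(self.centroids)
        # Final cell assignment for every stored vector.
        self.assign = np.argmax(self.vectors @ self.centroids.T, axis=1)
        return self

    def search(self, query, k=3, n_probe=1):
        q = normalize(query.reshape(1, -1).astype(np.float32))[0]
        # Probe only the n_probe cells whose centroids best match the query.
        cells = np.argsort(-(self.centroids @ q))[:n_probe]
        candidates = np.flatnonzero(np.isin(self.assign, cells))
        scores = self.vectors[candidates] @ q
        order = np.argsort(-scores)[:k]
        return candidates[order], scores[order]

rng = np.random.default_rng(42)
# Stand-in for image encoder outputs (assumption: 200 images, 32 dimensions).
image_embeddings = rng.normal(size=(200, 32))
index = TinyIVFIndex(n_cells=4).fit(image_embeddings)

# Pretend a text encoder mapped a caption close to image 17 in the shared space.
text_query = image_embeddings[17] + 0.01 * rng.normal(size=32)
ids, scores = index.search(text_query, k=3, n_probe=4)
print(ids, scores)
```

Setting `n_probe` equal to `n_cells` makes the search exhaustive; lowering it trades recall for speed, which is the basic tuning knob in inverted-file indexes.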

Original Description

Transform how AI systems understand and connect different data modalities. This course empowers machine learning professionals to build cutting-edge cross-modal retrieval systems that bridge the gap between text and images. You'll master the technical implementation of approximate nearest-neighbor search algorithms and design sophisticated attention mechanisms that fuse visual and textual information. Through hands-on work with production-scale tools like FAISS and real datasets like Flickr30K, you'll develop the expertise to create intelligent systems that understand content across modalities—enabling breakthrough applications in search, recommendation, and content understanding that mirror how humans naturally process diverse information types.
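As an illustration of the attention-based fusion the description mentions, the sketch below shows single-head cross-attention in which text tokens attend over image patch features. It is an assumption-laden example rather than the course's own design: the dimensions, the random projection matrices `w_q`, `w_k`, `w_v`, and the residual connection are all hypothetical choices, and a real model would learn these weights.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_tokens, image_patches, w_q, w_k, w_v):
    """Text tokens act as queries; image patches supply keys and values."""
    q = text_tokens @ w_q                      # (n_text, d_attn)
    k = image_patches @ w_k                    # (n_patches, d_attn)
    v = image_patches @ w_v                    # (n_patches, d_text)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product similarity
    weights = softmax(scores, axis=-1)         # each text token's weights sum to 1
    fused = text_tokens + weights @ v          # residual: text enriched with visual context
    return fused, weights

rng = np.random.default_rng(0)
d_text, d_img, d_attn = 16, 24, 8              # hypothetical feature sizes
text_tokens = rng.normal(size=(5, d_text))     # stand-in for 5 token embeddings
image_patches = rng.normal(size=(49, d_img))   # stand-in for a 7x7 grid of patch features

# Random projections stand in for learned parameters.
w_q = rng.normal(size=(d_text, d_attn)) / np.sqrt(d_text)
w_k = rng.normal(size=(d_img, d_attn)) / np.sqrt(d_img)
w_v = rng.normal(size=(d_img, d_text)) / np.sqrt(d_img)

fused, weights = cross_attention(text_tokens, image_patches, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of `weights` shows how strongly one text token draws on each image patch, which is often useful for inspecting what the fusion layer has learned to align.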

More on: Multimodal LLMs

Google Veo 3 Tutorial: How to create AI Videos in Flow, Gemini or Google Vids?

AI Tool Journey

NVIDIA Clara Guardian Virtual Patient Assistant

NVIDIA Developer

Building Multimodal Search and RAG

Midjourney Trick: Consistent Character in Different Images

Ollama Multimodal: EASILY setup Llava locally & Integrate API

The ONLY Real Time Speech AI that can run locally!!!

Related Reads

Building Production-Grade LLM Evaluation Pipelines: From Vibes to Metrics

Learn to build production-grade LLM evaluation pipelines to catch hallucinations before deployment

Demystifying LLM Tokenizers: Building a Client-Side Token and API Cost Calculator

Learn to build a client-side token and API cost calculator to optimize LLM usage and reduce costs

When the Google Recap and the Juejin Picking Roundup Disagree on What Counts as AI

Learn how Google and Juejin disagree on popular AI applications, and why it matters for AI development

Build, Observe, Fix: A LangChain Agent Walkthrough

Learn to build and fix a LangChain agent to improve its accuracy and reliability

Medium · Python

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)