Building Multimodal Data Pipelines

Free to audit

Coursera · Beginner · RAG & Vector Search · 1mo ago

Key Takeaways

Teaches how to build multimodal data pipelines that use ASR, OCR, and Vision Language Models to turn audio, images, and video into LLM-ready text for RAG

Original Description

Images, audio, and video make up a growing share of the data companies generate today, but most pipelines are still built for structured data alone. This course teaches you to build AI-powered pipelines that process multimodal data and turn it into LLM-ready text.

You’ll start with the foundations: using ASR to extract transcripts from audio and turning images into LLM-ready text descriptions. From there, you’ll see how Vision Language Models generate descriptions from video segments, capturing not just what’s visible in a single frame, but what unfolds across a scene over time.

You’ll then apply these skills to implement a multimodal RAG pipeline that searches across slides, audio, and video from meetings to answer questions about their content. By combining all three modalities, you give LLMs the rich context they need to deliver detailed answers across complex, real-world content.

In detail, you’ll:
Survey the multimodal data landscape, the unique challenges each data type presents, and the techniques that transform unstructured content into searchable text.
Apply OCR and ASR to convert images and audio into structured text, then embed them into a unified vector space for cross-modal semantic search.
Prompt Vision Language Models effectively, and choose the right frame sampling and embedding strategy for video.
Run a Vision Language Model on meeting videos to generate timestamped segment descriptions, then embed them alongside audio and slides for unified semantic and time-based search.
Build a multimodal RAG system that retrieves across audio, slides, and video to generate grounded, cited answers from meeting recordings.

Every technique you’ll learn serves the same goal data engineers have always had: take messy, unstructured data and turn it into something you can query, analyze, and build on.
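
To make the retrieval step in the description more concrete, here is a minimal Python sketch of a unified index over timestamped text segments from audio, slides, and video. It is an illustration under stated assumptions, not the course's own code: the names `Segment`, `embed`, `search`, `cite`, and `build_context` are hypothetical, the sample meeting text is invented, and the bag-of-words `embed` function is only a toy stand-in for a real embedding model. The text fields are assumed to have already been produced by ASR, OCR, or a Vision Language Model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    modality: str  # "audio", "slide", or "video"
    start: float   # segment start, in seconds into the meeting
    end: float     # segment end, in seconds into the meeting
    text: str      # transcript, OCR text, or VLM description (assumed precomputed)


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, query, top_k=3, window=None):
    # Semantic search across all modalities at once. If window=(start, end)
    # is given, only segments overlapping that time range are considered.
    query_vec = embed(query)
    candidates = [
        s for s in index
        if window is None or (s.start < window[1] and s.end > window[0])
    ]
    scored = [(cosine(query_vec, embed(s.text)), s) for s in candidates]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def cite(segment: Segment) -> str:
    # Citation tag that lets an answer point back to its source segment.
    return f"[{segment.modality} {segment.start:.0f}s-{segment.end:.0f}s]"


def build_context(hits) -> str:
    # Retrieved segments, each prefixed with its citation, ready to be
    # placed in an LLM prompt next to the user's question.
    return "\n".join(f"{cite(s)} {s.text}" for _, s in hits)


# Invented sample data standing in for one meeting recording.
index = [
    Segment("audio", 0, 30, "Welcome everyone, today we review the quarterly budget."),
    Segment("slide", 0, 45, "Q3 Budget Overview: revenue, costs, headcount"),
    Segment("video", 30, 60, "The presenter points at a bar chart of revenue by region."),
    Segment("audio", 60, 90, "Next, the hiring plan for the engineering team."),
]

hits = search(index, "budget review", top_k=2)
print(build_context(hits))
```

In a real pipeline, `embed` would presumably be replaced by a learned text embedding model and the list by a vector database, but the shape stays the same: every modality is reduced to text with timestamps, stored in one index, and retrieved together so the generated answer can cite where each piece of evidence came from. Passing `window=(30, 60)` to `search` restricts results to segments overlapping that part of the meeting, which is one simple way to combine semantic and time-based search.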

Related AI Lessons

What Is RAG? The AI Technology That Makes ChatGPT Smarter Without Retraining
Learn about RAG, the AI technology that enhances ChatGPT's capabilities without requiring retraining, and why it matters for advancing language models
Medium · RAG
Understanding the Limits of Linear RAG — and Why Agentic Workflows Are Catching On
Learn the limitations of linear RAG pipelines and how agentic workflows are becoming a popular alternative for more efficient and effective AI workflows
Medium · AI
Understanding the Limits of Linear RAG — and Why Agentic Workflows Are Catching On
Learn why linear RAG pipelines have limitations and how agentic workflows are becoming a preferred alternative in the industry
Medium · Machine Learning