Stream & Unify Data Schemas with CDC

Free to audit on Coursera

Coursera · Intermediate · Data Engineering · 3mo ago

Skills: Data Warehousing 70% · Workflow Orchestration 60%

Key Takeaways

Builds a CDC pipeline to stream and unify data schemas

Original Description

Imagine deploying schema changes with confidence, knowing your pipeline will handle them gracefully, consumers will stay healthy, and your data will stay consistent. That's the difference between hoping your CDC pipeline works and knowing it will.

In this course you will learn how to build a working, vendor-neutral CDC pipeline and a single, unified table from evolving source schemas. Starting with Debezium streaming changes from Postgres/MySQL into Kafka, you will use Schema Registry to enforce compatibility, then apply streaming SQL in Flink (or ksqlDB) to map, cast, and merge divergent fields into a canonical model. Finally, you will persist results to an Apache Iceberg table and query it instantly with Trino. Along the way, you'll learn practical strategies to manage schema drift, choose compatibility modes (backward/full), and avoid breaking downstream consumers. Everything runs locally with Docker so you can reproduce it anywhere and take the same patterns to your cloud stack later.

This course is designed for engineers working with Kafka, Debezium, and streaming SQL who need reliable schema evolution and canonical modeling skills. Learners should be familiar with basic SQL and Docker, and have some familiarity with Kafka or streaming concepts. By the end of the course, you will be able to implement a small end-to-end CDC pipeline that streams from a source DB and unifies evolving schemas into a single queryable table.
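To make the "map, cast, and merge" step concrete, here is a minimal sketch in plain Python. It is not taken from the course, which does this work with streaming SQL in Flink or ksqlDB. The event shape is a simplified Debezium-style envelope with `op` and `after` fields, and every column name, alias, and the `to_canonical` helper are hypothetical, chosen only for illustration.

```python
from decimal import Decimal
from typing import Any, Optional

# Hypothetical aliases: each canonical column lists the source field names
# it may appear under across schema versions (names are illustrative only).
FIELD_ALIASES = {
    "customer_id": ("customer_id", "cust_id"),
    "email": ("email", "email_address"),
    "amount": ("amount", "total"),
}

# Casts applied so every schema version ends up with the same types.
CASTS = {
    "customer_id": int,
    "email": str,
    "amount": lambda v: Decimal(str(v)),
}


def to_canonical(event: dict) -> Optional[dict]:
    """Map one simplified Debezium-style change event to a canonical row.

    Returns None for events without an "after" image (for example deletes).
    """
    after = event.get("after")
    if after is None:
        return None

    row: dict[str, Any] = {}
    for column, aliases in FIELD_ALIASES.items():
        # Map: take the first alias that is present and not null.
        value = next((after[a] for a in aliases if after.get(a) is not None), None)
        # Cast: normalize the type; missing fields stay None.
        row[column] = CASTS[column](value) if value is not None else None
    return row


# Two versions of the same source row: the old schema uses different names
# and string-typed values, the new schema uses the canonical names.
old_event = {"op": "c", "after": {"cust_id": "42", "email_address": "a@example.com", "total": "19.90"}}
new_event = {"op": "u", "after": {"customer_id": 42, "email": "a@example.com", "amount": 19.9}}
delete_event = {"op": "d", "before": {"customer_id": 42}, "after": None}

canonical_old = to_canonical(old_event)
canonical_new = to_canonical(new_event)
canonical_delete = to_canonical(delete_event)
```

Both schema versions collapse to the same canonical row, which is the property a unified table relies on. A streaming SQL version would typically express the same idea with `COALESCE` over the alias columns and explicit `CAST` expressions.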

More on: Data Warehousing

Build A Data Warehouse in Azure

Building Data Lakes and Lakehouses with Microsoft Fabric

Microsoft Azure - Data Lake

Star Schemas & Track Changes

Build a Data Warehouse in AWS

Data Management with Databricks: Big Data with Delta Lakes

Related Reads

I Built My Second ETL Pipeline. This Time, I Started Thinking Like a Data Engineer

Learn how to build a production-ready ETL pipeline with Python, Docker, PostgreSQL, and Kestra by thinking like a data engineer.

Towards Data Science

JuiceFS Sync for PB-Scale Data Transfers: Resumable Sync, Encryption, and Bandwidth Control

Learn how to efficiently transfer large volumes of data using JuiceFS Sync, which offers resumable sync, encryption, and bandwidth control, ideal for PB-scale data transfers.

How Airflow is using AI to make data engineering more resilient, not more complex

Airflow uses AI to make data engineering more resilient by detecting data drift, resuming failed pipelines, and fixing issues automatically, reducing complexity and improving reliability.

What Can We Do When Memory Becomes the New Bottleneck in Data Engineering?

Learn how to overcome memory bottlenecks in data engineering using Pandas chunking, Dask, and Polars, and why it matters for processing large datasets.

Towards Data Science
