Data Engineering

ETL pipelines, data warehousing, streaming, orchestration and lakehouse architecture

55 lessons
Skills in this topic
ETL Basics (beginner): Write a Python ETL pipeline with pandas
Workflow Orchestration (intermediate): Build a DAG in Airflow with sensors and operators
Streaming Data (intermediate): Produce and consume Kafka topics with Python
Data Warehousing (intermediate): Model a star schema with dbt
Lakehouse Architecture (advanced): Manage ACID transactions on a data lake with Delta Lake
All Reads (23) · Articles (11) · Blog Posts (7) · Tutorials (5)
How I Broke Down My ETL Pipeline Project Into Smaller Engineering Exercises
Dev.to · Tanmay 🔄 Data Engineering ⚡ AI Lesson 4d ago
Recently, I started building an ETL pipeline project to better understand how modern data systems...
Reddit r/learnprogramming 🔄 Data Engineering ⚡ AI Lesson 1w ago
I’m looking for advice from people who have handled very large Excel/CSV imports in production systems.
Current requirement from my client: upload 3 Excel sheets. One sheet contains 150k+ rows; another contains 40k+ rows. Data needs to be inserted into multiple relat
Data Engineering Terminology Made Easy for Data Scientists, ML Engineers, and Anyone Who’s Ever…
Medium · Machine Learning 🔄 Data Engineering ⚡ AI Lesson 2w ago
Nobody Told You What Any of This Means in Simple Language, Until Now
From Experimental Notebooks to Production: A Data Engineer’s perspective of Scaling Data Science…
Medium · Machine Learning 🔄 Data Engineering ⚡ AI Lesson 3w ago
Originally published at harishkesavarao.github.io
Why We Let AI Design Our ETL Pipelines — but Never Run Them
Medium · AI 🔄 Data Engineering ⚡ AI Lesson 3w ago
ETL systems are uncompromisingly literal, and that is precisely why they age poorly.
What I Learnt Implementing a Medallion Architecture from Scratch on Databricks Using Washington…
Medium · Machine Learning 🔄 Data Engineering ⚡ AI Lesson 3w ago
9 hard-won lessons from a real end-to-end lakehouse build, EVLytics Continue reading on Towards Data Engineering »
What I Learnt Implementing a Medallion Architecture from Scratch on Databricks Using Washington…
Medium · Data Science 🔄 Data Engineering ⚡ AI Lesson 3w ago
9 hard-won lessons from a real end-to-end lakehouse build, EVLytics Continue reading on Towards Data Engineering »
LinkedIn Data Engineering Interview Questions: Full Prep Guide
Dev.to · Gowtham Potureddi 🔄 Data Engineering ⚡ AI Lesson 3w ago
LinkedIn data engineering interview questions lean toward trust-heavy modeling: member-centric...
ETL vs. ELT: Which Approach Should You Use and Why?
Dev.to · Gathuru_M 🔄 Data Engineering ⚡ AI Lesson 3w ago
1. Introduction Understanding a company's data architecture can feel overwhelming, but...
Containerizing Apache Airflow: Building Portable Data Pipelines with Docker
Dev.to · peter muriya 🔄 Data Engineering ⚡ AI Lesson 1mo ago
Apache Airflow is one of the most widely used orchestration tools in data engineering. It enables...
The Complete Framework to Design ETL Pipelines in Interviews
Medium · Data Science 🔄 Data Engineering ⚡ AI Lesson 1mo ago
A Decision-Tree Approach to Cracking Senior Data Engineering System Design Rounds Continue reading on Towards Data Engineering »
The Complete Framework to Design ETL Pipelines in Interviews
Medium · Programming 🔄 Data Engineering ⚡ AI Lesson 1mo ago
A Decision-Tree Approach to Cracking Senior Data Engineering System Design Rounds Continue reading on Towards Data Engineering »
The Hidden Complexity of Data Engineering in Regulated Industries (And What It Taught Me About…
Medium · Python 🔄 Data Engineering ⚡ AI Lesson 1mo ago
Strict data formats and compliance requirements teach you more about clean software design than any course. Here is what working in a… Continue reading on Level
Building a High-Throughput ETL System in Python
Medium · Programming 🔄 Data Engineering ⚡ AI Lesson 1mo ago
How I Combined Pandas, Dask, and SQLAlchemy for Speed and Reliability Continue reading on Top Python Libraries »
Automating ETL Workflows with Apache Airflow: From Python Script to Scheduled Pipeline
Dev.to · peter muriya 🔄 Data Engineering ⚡ AI Lesson 1mo ago
Modern data engineering revolves around automation, reliability, and scalability. Writing an ETL...
Columnar Databases (ClickHouse/Snowflake)
Dev.to · Aviral Srivastava 🔄 Data Engineering ⚡ AI Lesson 1mo ago
The Data Titans: Diving Deep into the World of Columnar Databases (ClickHouse &...
What 166 Modules Taught Us About Building an ETL Pipeline for Website Content
Dev.to · Smuves 🔄 Data Engineering ⚡ AI Lesson 1mo ago
ETL is a solved problem in most of the software world. Data teams have been extracting, transforming,...
I Stopped Fixing Broken Parsers at 3 AM, Here’s How We Outsourced Our DOM Extraction
Medium · Python 🔄 Data Engineering ⚡ AI Lesson 1mo ago
It’s 3:00 AM on a Tuesday. Your PagerDuty alert is ringing.
Modernizing Data Ingestion: An Async PostgreSQL Pipeline with Psycopg 3
Medium · DevOps 🔄 Data Engineering ⚡ AI Lesson 1mo ago
Orchestrating high-performance migrations using asynchronous architectures and memory-safe processing.
Medium · AI 🔄 Data Engineering ⚡ AI Lesson 1mo ago
The Data Engineering Part 2: Building Your First Production Data Pipeline
From raw data to real-time dashboards: a hands-on walkthrough of modern pipeline architecture using Kafka, Spark, dbt, and Airflow, plus…
Medium · Machine Learning 🔄 Data Engineering ⚡ AI Lesson 1mo ago
The Data Engineering Part 2: Building Your First Production Data Pipeline
From raw data to real-time dashboards: a hands-on walkthrough of modern pipeline architecture using Kafka, Spark, dbt, and Airflow, plus…
Medium · Data Science 🔄 Data Engineering ⚡ AI Lesson 1mo ago
The Data Engineering Part 2: Building Your First Production Data Pipeline
From raw data to real-time dashboards: a hands-on walkthrough of modern pipeline architecture using Kafka, Spark, dbt, and Airflow, plus…
Building a Resilient Lakehouse on AWS with Terraform and Medallion Architecture
Medium · DevOps 🔄 Data Engineering ⚡ AI Lesson 1mo ago
In modern data engineering, a project's success is not measured only by the effectiveness of a Python script. The real differentiator…