Data Engineering with Scala and Spark

Free to audit on Coursera

Coursera · Intermediate · Data Engineering · 3mo ago

Skills: ETL Basics (90%) · Workflow Orchestration (70%) · Data Warehousing (60%)

Key Takeaways

Equips data engineers with the skills to build scalable data pipelines using Scala and Spark.

Original Description

This course is designed to equip data engineers with the skills to build scalable and efficient data pipelines using Scala and Spark. Data engineers will learn best practices for development, testing, and deployment in cloud environments, with a focus on optimizing performance and ensuring data quality. The course provides the necessary tools to transform raw data into actionable insights, making it highly relevant in today’s data-driven world.

Throughout the course, learners will improve their data engineering skills by mastering techniques for building both streaming and batch data pipelines. The content emphasizes practical outcomes such as performance tuning and data profiling. With hands-on examples and step-by-step guidance, learners will gain a solid understanding of real-time and batch processing pipelines. What makes this course unique is its combination of foundational theory and real-world applications. By the end, you will be able to use Scala and Spark to process large datasets and optimize pipelines in cloud environments effectively.

This course is ideal for data engineers with some experience in data processing. While it assumes familiarity with data engineering concepts and cloud technologies, anyone eager to improve their skills in Scala and Spark will benefit from the practical, step-by-step approach.

More on: ETL Basics

Automate ETL Pipelines

Data Engineering with Delta Lake on Databricks

Data Integration and ETL with Talend

Building Batch Pipelines in Cloud Data Fusion

Analytics in 15: Save Time! Try No-Code Data Movement and Transformation

Talend Data Integration: Build & Automate Workflows

Related Reads

I Built My Second ETL Pipeline. This Time, I Started Thinking Like a Data Engineer

Learn how to build a production-ready ETL pipeline with Python, Docker, PostgreSQL, and Kestra by thinking like a data engineer.

Towards Data Science

JuiceFS Sync for PB-Scale Data Transfers: Resumable Sync, Encryption, and Bandwidth Control

Learn how to efficiently transfer large volumes of data using JuiceFS Sync, which offers resumable sync, encryption, and bandwidth control, ideal for PB-scale data transfers.

How Airflow is using AI to make data engineering more resilient, not more complex

Airflow uses AI to make data engineering more resilient by detecting data drift, resuming failed pipelines, and fixing issues automatically, reducing complexity and improving reliability.

What Can We Do When Memory Becomes the New Bottleneck in Data Engineering?

Learn how to overcome memory bottlenecks in data engineering using Pandas chunking, Dask, and Polars, and why it matters for processing large datasets.

Towards Data Science