Databricks Workflows

Databricks · Beginner · 🔄 Data Engineering · 1y ago

Key Takeaways

Databricks Workflows is a managed orchestration service that simplifies creating, managing, monitoring, and repairing workflows for ETL, analytics, and machine learning pipelines, leveraging Delta Live Tables, SQL with AI functions, and serverless compute.

Full Transcript

Hi everyone, I'm Frank from the tech marketing team at Databricks. Today I'm excited to showcase Databricks Workflows, a managed orchestration service fully integrated with our Data Intelligence Platform. It simplifies creating, managing, monitoring, and repairing workflows for all your ETL, analytics, and machine learning pipelines.

Let's explore a practical workflow example using our Data + AI Summit cookie dataset. Our workflow has two key objectives. It adopts a data-driven approach to identify prime locations for new Bakehouse flagship stores, leveraging fresh data. It also generates AI-powered descriptions of these locations, factoring in local ingredients.

Let's peek under the hood to see how it's constructed. The workflow consists of several components. First, there is a Delta Live Tables (DLT) task for data ingestion and transformation. Switching to DLT, you can see it seamlessly integrates continuously updated, real-time sales data from SQL Server with franchise information from Salesforce. Both data streams were ingested using fully managed Lakeflow connectors, showcasing data ingestion in a real-world scenario. Back in Workflows, we have an if/else task for conditional branching and a SQL task that leverages an AI function to interact with large language models like DBRX, GPT, or Llama. Then there are two notebooks: one for the false case of the if/else branch and one for triggering a downstream app. This combination allows for efficient data processing, flexible workflow orchestration, and advanced AI-driven analytics.

The intuitive point-and-click authoring experience lets you add new tasks effortlessly. For instance, to iterate over all flagship locations and perform an API callout for each, I could employ a for each loop. Additionally, you can incorporate other task types such as Python scripts, JAR files, dashboard refreshes, triggering a dbt project, or another workflow. Workflows itself is a service run by Databricks with more than 99.95% availability. The tasks executed by Workflows run on serverless compute.

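The task graph described above can be written down as a job definition. The block below is a minimal sketch, assuming the JSON shape of the Databricks Jobs API 2.1 as best understood here (`task_key`, `depends_on` with an `outcome` for if/else branches, `pipeline_task`, `condition_task`, `sql_task`, `notebook_task`, `for_each_task`); verify field names against the official reference before use. The job name, the `generate_descriptions` parameter, all IDs, paths, and the two locations are hypothetical placeholders, and the helper `execution_order` is hypothetical too: it checks that every dependency points at a real task and that `outcome` edges only follow a condition task, then returns a topological order.

```python
from collections import deque

# Hypothetical job definition shaped after the Databricks Jobs API 2.1 payload.
# All IDs, paths, names, and locations are placeholders.
JOB_SPEC = {
    "name": "bakehouse_flagship_locations",
    "parameters": [{"name": "generate_descriptions", "default": "true"}],
    "tasks": [
        # Delta Live Tables pipeline for ingestion and transformation.
        {"task_key": "ingest", "pipeline_task": {"pipeline_id": "<pipeline-id>"}},
        # If/else condition evaluated against a job parameter.
        {
            "task_key": "should_describe",
            "depends_on": [{"task_key": "ingest"}],
            "condition_task": {
                "op": "EQUAL_TO",
                "left": "{{job.parameters.generate_descriptions}}",
                "right": "true",
            },
        },
        # True branch: SQL task whose query calls an AI function.
        {
            "task_key": "describe_locations",
            "depends_on": [{"task_key": "should_describe", "outcome": "true"}],
            "sql_task": {
                "warehouse_id": "<warehouse-id>",
                "file": {"path": "<repo-path>/describe_locations.sql"},
            },
        },
        # False branch: a notebook.
        {
            "task_key": "skip_descriptions",
            "depends_on": [{"task_key": "should_describe", "outcome": "false"}],
            "notebook_task": {"notebook_path": "<repo-path>/skip_descriptions"},
        },
        # For each loop: one notebook run per flagship location.
        {
            "task_key": "call_api_per_location",
            "depends_on": [{"task_key": "describe_locations"}],
            "for_each_task": {
                "inputs": '["Berlin", "Tokyo"]',
                "task": {
                    "task_key": "call_api_per_location_iteration",
                    "notebook_task": {
                        "notebook_path": "<repo-path>/call_api",
                        "base_parameters": {"location": "{{input}}"},
                    },
                },
            },
        },
    ],
}


def execution_order(spec: dict) -> list:
    """Validate task dependencies and return one topological order of task keys."""
    tasks = {task["task_key"]: task for task in spec["tasks"]}
    indegree = {key: 0 for key in tasks}
    children = {key: [] for key in tasks}
    for key, task in tasks.items():
        for dep in task.get("depends_on", []):
            parent = dep["task_key"]
            if parent not in tasks:
                raise ValueError(f"{key} depends on unknown task {parent}")
            # An outcome ("true"/"false") is only meaningful after a condition task.
            if "outcome" in dep and "condition_task" not in tasks[parent]:
                raise ValueError(f"{key} uses an outcome on non-condition task {parent}")
            indegree[key] += 1
            children[parent].append(key)
    # Kahn's algorithm: repeatedly emit tasks whose upstream tasks are all emitted.
    ready = deque(key for key, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        key = ready.popleft()
        order.append(key)
        for child in children[key]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


print(execution_order(JOB_SPEC))
```

The order lists both if/else branches because it describes the definition; at run time, only the branch matching the condition's outcome would execute.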
Serverless compute provides fast, efficient, and reliable processing without infrastructure management, offering pay-per-use pricing and elastic scalability. Workflows is fully integrated with Unity Catalog, the unified governance solution that provides centralized data discovery, access control, auditing, and lineage. For example, here is the full end-to-end lineage diagram of our cookies example, including volumes, tables, streaming tables, and materialized views.

Let's talk about Git integration for a moment. Git settings in Databricks Workflows enable jobs to run version-controlled code from Git repositories, streamlining production deployments.

Now let's explore one of Workflows' most powerful features: automated triggers. In this example, I've set the workflow to run every 10 minutes, ensuring regular data processing. Another option would be a file trigger: if a file arrives in the Bakehouse volume, it can kick off our workflow. This feature supports various cloud object stores and offers flexible configuration based on file patterns, prefixes, and directories. Another powerful option is the table trigger. For example, updates to the Bakehouse sales customers table could automatically trigger our analysis pipeline. For a workflow, I could also define job parameters to dynamically customize task behavior; notifications for workflow start, success, failure, or a run exceeding the duration threshold; and workflow permissions to control access and execution rights.

Workflows includes centralized monitoring to simplify life for end users. This view shows me the currently active runs and the completed runs from the last 60 days. It also includes tasks that were triggered by external orchestrators. When it comes to observability, I appreciate the matrix view. This view represents jobs over time, complete with execution details like run ID, duration, and task completion status. The graphical display of total execution time makes it quick to identify longer-running jobs and tasks that failed.

Databricks Workflows now also offers a timeline view for visualization and debugging. The timeline view displays job runs as horizontal bars on a timeline, showing task dependencies, durations, and status. Users can easily understand the flow of execution, identify bottlenecks, and troubleshoot issues. The timeline view includes critical path highlighting and integrated query profiles. This intuitive tool helps you manage and optimize complex workflows efficiently.

So if you detect an issue in one of the workflows, you just click Repair and rerun. You then define whether you want to rerun an individual task or all downstream tasks from the issue. You can highlight the error easily and then click Diagnose error, which gives you a precise error message and a recommendation on how to fix the issue. If you then rerun that workflow, it reruns only the affected tasks, so it finishes much more quickly, and this time successfully.

While we've focused on the UI so far in this demo, Workflows also provides an API for integration with external orchestrators like Apache Airflow. We offer a Terraform provider if your organization already has a robust DevOps framework or if your engineering team is well versed in Terraform. However, for most users looking into CI/CD, we recommend Databricks Asset Bundles, sometimes also called DABs. DABs provide a more accessible and streamlined way to manage infrastructure. They simplify packaging and deployment through a YAML file that defines the structure and components of your project, including workflows, notebooks, SQL, and ML models. You can use any IDE, such as VS Code, to create this file, but the YAML config file can also be generated from an existing workflow. This allows data professionals to focus on their core tasks without being bogged down by infrastructure complexities. Customers can use DABs to promote their project definition from a dev to a prod environment using any CI/CD tool of their choice.

Here's the key takeaway from this demonstration: Databricks Workflows is the fully managed lakehouse orchestration service for building reliable workflows on any cloud. We at Databricks believe it's the best orchestrator for the lakehouse. Adopting Workflows now paves the way for a seamless transition into the forthcoming unified Lakeflow platform, eliminating the need for future migrations.
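The repair and rerun behavior described in the demo reruns a failed task and, optionally, everything downstream of it, while keeping the results of upstream tasks that already succeeded. Below is a minimal sketch of that selection logic, assuming the job graph is given as a hypothetical mapping from each task to its upstream tasks; the status strings are illustrative, and the function `tasks_to_rerun` is hypothetical and does not call any Databricks API.

```python
from collections import defaultdict, deque


def tasks_to_rerun(depends_on, statuses, include_downstream=True):
    """Pick the tasks a repair run would re-execute.

    depends_on: hypothetical mapping of task -> list of upstream tasks.
    statuses: mapping of task -> illustrative status string from the failed run.
    """
    failed = {task for task, status in statuses.items() if status == "FAILED"}
    if not include_downstream:
        # "Rerun an individual task": only the failed tasks themselves.
        return failed
    # Invert the graph so we can walk from a task to its downstream tasks.
    children = defaultdict(list)
    for task, parents in depends_on.items():
        for parent in parents:
            children[parent].append(task)
    # Breadth-first walk collecting every task reachable from a failure.
    selected = set(failed)
    queue = deque(failed)
    while queue:
        task = queue.popleft()
        for child in children[task]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


DEPENDS_ON = {
    "ingest": [],
    "should_describe": ["ingest"],
    "describe_locations": ["should_describe"],
    "skip_descriptions": ["should_describe"],
    "trigger_app": ["describe_locations"],
}
STATUSES = {
    "ingest": "SUCCESS",
    "should_describe": "SUCCESS",
    "describe_locations": "FAILED",
    "skip_descriptions": "EXCLUDED",
    "trigger_app": "UPSTREAM_FAILED",
}

print(sorted(tasks_to_rerun(DEPENDS_ON, STATUSES)))
```

The successful upstream tasks (`ingest`, `should_describe`) are left alone, which is why a repair run can finish faster than rerunning the whole job.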

Original Description

This product demo showcases Databricks Workflows and its integration with the Data Intelligence Platform. It highlights Delta Live Tables tasks for real-time data ingestion, SQL with AI functions for generative AI, conditional branching, and notebook tasks. Other capabilities include serverless compute, Unity Catalog integration, new observability tools like matrix and timeline views, repair and rerun capabilities, and automated triggers. The demo also covers Git integration and infrastructure-as-code. Learn more about Data Engineering on Databricks: https://www.databricks.com/solutions/data-engineering Note: Databricks Lakeflow unifies Data Engineering with Lakeflow Connect, Lakeflow Spark Declarative Pipelines (previously known as DLT), and Lakeflow Jobs (previously known as Workflows).
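The demo does not say exactly how the timeline view computes its highlighted critical path. Assuming the conventional definition, the critical path is the chain of dependent tasks with the largest total duration; with unlimited parallelism it bounds the run's wall-clock time, so speeding up any other task does not shorten the run. A minimal sketch computing it from hypothetical task durations (in seconds) and dependencies:

```python
from typing import Dict, List, Optional, Tuple


def critical_path(
    durations: Dict[str, float], depends_on: Dict[str, List[str]]
) -> Tuple[float, List[str]]:
    """Return the longest total duration through the DAG and the tasks on that path.

    Assumes the dependency graph is acyclic, every task appears in `durations`,
    and tasks start as soon as all of their upstream tasks finish.
    """
    finish: Dict[str, float] = {}  # earliest finish time of each task
    previous: Dict[str, Optional[str]] = {}  # upstream task that sets that finish time

    def visit(task: str) -> float:
        if task in finish:
            return finish[task]
        start, via = 0.0, None
        for parent in depends_on.get(task, []):
            parent_end = visit(parent)
            # A task can start only after its slowest upstream task finishes.
            if via is None or parent_end > start:
                start, via = parent_end, parent
        finish[task] = start + durations[task]
        previous[task] = via
        return finish[task]

    for task in durations:
        visit(task)
    # The task that finishes last ends the critical path; walk back to its start.
    node: Optional[str] = max(finish, key=finish.__getitem__)
    total = finish[node]
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = previous[node]
    return total, path[::-1]


DURATIONS = {
    "ingest_sales": 300, "ingest_franchises": 180, "join": 60,
    "describe": 120, "refresh_dashboard": 20, "trigger_app": 30,
}
DEPENDS_ON = {
    "join": ["ingest_sales", "ingest_franchises"],
    "describe": ["join"],
    "refresh_dashboard": ["join"],
    "trigger_app": ["describe"],
}

total, path = critical_path(DURATIONS, DEPENDS_ON)
print(total, path)
```

Here `ingest_franchises` and `refresh_dashboard` have slack, so a timeline would highlight the `ingest_sales`, `join`, `describe`, `trigger_app` chain instead.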

Playlist

Uploads from Databricks · Databricks · 2 of 60

1 Building AI Agent Systems with Databricks
2 Databricks Workflows
3 Automate Unity Catalog Upgrade with UCX Part 1: Overview
4 Automate Unity Catalog Upgrade with UCX Part 2: Installation
5 Automate Unity Catalog Upgrade with UCX Part 3 - Assessment
6 Automate Unity Catalog Upgrade with UCX Part 4 - Group Migration
7 Table Migration and Catalog Design with UCX | Part 5
8 Setting Up Azure Access for UCX Table Migration | Part 6
9 UCX Table Migration: Creating Catalogs and Schemas | Part 7
10 Automate Unity Catalog Upgrade with UCX Part 8: Code Migration
11 Streaming to Kafka Just Got Easier with DLT Pipelines
12 Data Engineering From Data to Dashboards with DABs: Crunching the Cookies Dataset
13 Epsilon helps businesses connect with their consumers using Databricks Data Intelligence Platform
14 Unilever transforms operations with GenAI using the Databricks Data Intelligence Platform
15 ActionIQ enables businesses to unlock customer data with the Databricks Data Intelligence Platform
16 Mixed Attention & LLM Context | Data Brew | Episode 35
17 Inside Databricks SQL: Engineering innovation with Hans
18 Inside Databricks: Engineering innovation with Michael Armbrust
19 The Money Team at Databricks: driving revenue and customer growth
20 Unity Catalog unveiled: engineering data governance at scale
21 Create a view in Databricks and share it with Power BI using Delta Sharing
22 NDUS leverages Databricks Data Intelligence Platform to revolutionize higher education management
23 Démo Databricks de AI/BI
24 EMEA Data + AI World Tour 2024
25 GenAI: The Shift to Data Intelligence - Customer Panel on Industry Use Cases
26 GenAI: The Shift to Data Intelligence - Ft. Ash Jhaveri, VP of Reality Labs Partnerships at Meta
27 Virtue Foundation leverages the Databricks Data Intelligence Platform to advance global health
28 Announcing Synthetic Data Generation in Mosaic AI Agent Evaluation
29 AI/BI Dashboards Embedding - A tutorial
30 Bayer transforms global data management with the Databricks Data Intelligence Platform
31 Databricks at AWS re:Invent 2024
32 Hive Metastore and AWS Glue Federation in Unity Catalog
33 Data + AI World Tour Paris 2024
34 Retail reimagined: Currys data-first strategy to driving growth and improving operations
35 Mixture of Memory Experts (MoME) | Data Brew | Episode 36
36 Verana Health Data Curation and Innovation with Databricks and AWS
37 Securing SaaS Applications: Obsidian Security on Their Journey with Databricks and AWS
38 Twilio Eng VP on Data Intelligence & AI at AWS re:Invent 2024
39 Chegg Eng SVP on Data-Driven Approach to Student Success with Databricks and AWS
40 Ibotta Personalized Rewards Innovation with Databricks and AWS
41 Simplify AI governance with #databricks AI Gateway
42 Databricks SQL and Power BI Integration
43 Databricks Serverless SQL Warehouses
44 7 West powers audience growth with the Databricks Data Intelligence Platform
45 Secret to Production AI: Tools & Infrastructure | Data Brew | Episode 37
46 Skyflow CEO on Data Privacy with Databricks at AWS re:Invent
47 Databricks Clean Rooms Product Demo
48 Dun & Bradstreet Enrichment & Monitoring, powered by Delta Sharing & Databricks Marketplace
49 Unpacking Libraries in Databricks
50 Providence uses an AI agent system from Databricks to help doctors improve their communication
51 How State Street Uses AI to Transform Millions of Trades Daily
52 Vevo Therapeutics CEO on Curing Disease with Data at AWS re:Invent
53 Over Architected with Nick & Holly: Databricks updates for Feb 2025
54 The Power of Synthetic Data | Data Brew | Episode 38
55 Use Databricks Lakehouse Federation to break down data silos
56 AI's rugby score: National Rugby League rallies fans with analytics and unified data
57 Open Variant Data Type in Delta Lake and Apache Spark
58 How would you sort Ætheldred in the alphabet using Databricks?
59 A guide on how to operationalize the Databricks AI Security Framework (DASF)
60 Future-Proof Your Asset Performance Management with Generative AI - Field Assistant Live Demo

Databricks Workflows is a fully managed lakehouse orchestration service that simplifies creating, managing, and monitoring workflows for ETL, analytics, and machine learning pipelines. It leverages Delta Live Tables, SQL with AI functions, and serverless compute to provide a scalable and reliable workflow execution environment.
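In the demo, a SQL task uses an AI function to generate location descriptions. Below is a minimal sketch of such a statement using the Databricks SQL function `ai_query(endpoint, request)`, built as a Python string so it can be checked locally. The endpoint name, the table name, and the `city` column are hypothetical placeholders, and the helper `describe_locations_sql` is hypothetical; its name checks only reject obviously malformed input and are not a general SQL-escaping scheme.

```python
import re

# Hypothetical placeholders: substitute a real model serving endpoint and table.
ENDPOINT = "my-llm-endpoint"
TABLE = "main.bakehouse.flagship_candidates"

# Up to three dot-separated identifiers: catalog.schema.table.
_TABLE_NAME = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*){0,2}")
# Conservative character set for endpoint names.
_ENDPOINT_NAME = re.compile(r"[A-Za-z0-9_-]+")


def describe_locations_sql(endpoint: str, table: str) -> str:
    """Build a SELECT that asks an LLM, via ai_query, to describe each city."""
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not _ENDPOINT_NAME.fullmatch(endpoint):
        raise ValueError(f"unexpected endpoint name: {endpoint!r}")
    return (
        "SELECT\n"
        "  city,\n"
        f"  ai_query('{endpoint}',\n"
        "    CONCAT('Describe a flagship bakery location in ', city,\n"
        "           ', mentioning local ingredients.')) AS description\n"
        f"FROM {table}"
    )


print(describe_locations_sql(ENDPOINT, TABLE))
```

The prompt is assembled per row with `CONCAT`, so each city gets its own request to the model endpoint.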

Key Takeaways
  1. Create a new workflow
  2. Add tasks such as Delta Live Tables, SQL, and notebook tasks
  3. Configure conditional branching and workflow triggers
  4. Deploy and monitor workflows
  5. Use the API for integration with external orchestrators
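Steps 3 and 4 can be sketched as job settings. This is a minimal sketch, assuming the Jobs API 2.1 field names `schedule.quartz_cron_expression`, `trigger.file_arrival`, `trigger.table_update`, `email_notifications`, and `health.rules` as best understood here; verify them against the official reference before use. The recipients, volume path, table name, `region` parameter, and 30-minute threshold are hypothetical. Each helper returns a new dict instead of mutating its input, so one base job can be combined with different triggers.

```python
import json

# Hypothetical values throughout: recipients, volume path, table, threshold.
ALERT_RECIPIENTS = ["data-team@example.com"]

BASE_JOB = {
    "name": "bakehouse_flagship_locations",
    # Job parameter that tasks can reference to customize their behavior.
    "parameters": [{"name": "region", "default": "EMEA"}],
    "tasks": [],  # task definitions omitted in this sketch
}


def with_schedule(job: dict) -> dict:
    # Quartz cron fields: second minute hour day-of-month month day-of-week.
    # "0 0/10 * * * ?" fires at second 0 of every tenth minute.
    return {**job, "schedule": {"quartz_cron_expression": "0 0/10 * * * ?", "timezone_id": "UTC"}}


def with_file_arrival(job: dict, url: str) -> dict:
    # Start a run when new files land under `url` (e.g. a Unity Catalog volume path).
    return {**job, "trigger": {"file_arrival": {"url": url}}}


def with_table_update(job: dict, table_names: list) -> dict:
    # Start a run when any of the listed tables is updated.
    return {**job, "trigger": {"table_update": {"table_names": table_names}}}


def with_alerts(job: dict, max_run_seconds: int) -> dict:
    recipients = list(ALERT_RECIPIENTS)
    return {
        **job,
        "email_notifications": {
            "on_start": recipients,
            "on_success": recipients,
            "on_failure": recipients,
            "on_duration_warning_threshold_exceeded": recipients,
        },
        # The duration threshold itself is declared as a health rule.
        "health": {
            "rules": [{"metric": "RUN_DURATION_SECONDS", "op": "GREATER_THAN", "value": max_run_seconds}]
        },
    }


scheduled = with_alerts(with_schedule(BASE_JOB), max_run_seconds=30 * 60)
on_new_file = with_file_arrival(BASE_JOB, "/Volumes/<catalog>/<schema>/<volume>/incoming/")
on_table_change = with_table_update(BASE_JOB, ["<catalog>.<schema>.sales_customers"])

print(json.dumps(scheduled, indent=2))
```

Keeping triggers in separate helpers mirrors the demo's choice between a fixed schedule, a file trigger, and a table trigger for the same workflow.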
💡 Databricks Workflows provides a fully managed lakehouse orchestration service that simplifies creating, managing, and monitoring workflows for ETL, analytics, and machine learning pipelines, eliminating the need for infrastructure management and providing a scalable and reliable workflow execution environment.

Related AI Lessons

How I built the OSS alternatives directory: GitHub ETL, Turso, and the UPSERT trap I hit
Learn how to build a data pipeline for an open-source alternatives directory using GitHub ETL, Turso, and Claude Haiku summaries
Dev.to · MORINAGA
Apache Iceberg in Production: Compaction, Catalogs, and the Pitfalls Nobody Warns You About
Learn how to use Apache Iceberg in production, including compaction, catalogs, and common pitfalls to avoid, to improve data engineering workflows
Dev.to · Gabriel Henrique
Your First Task as a Data Engineer in a New Company? Make the ETL Pipeline Testable
As a new data engineer, make the ETL pipeline testable to ensure data quality and reliability
Towards Data Science
From DataStage and Informatica to Databricks Medallion Architecture: Why Migration Is More Than Code Conversion
Learn how to migrate legacy ETL systems like DataStage to modern architectures like Databricks Medallion, and why it's more than just code conversion
Dev.to · Amit Kumar Singh