Databricks Workflows
Key Takeaways
Databricks Workflows is a managed orchestration service that simplifies creating, managing, monitoring, and repairing workflows for ETL, analytics, and machine learning pipelines, leveraging Delta Live Tables, SQL with AI functions, and serverless compute.
Full Transcript
Hi everyone, I'm Frank from the tech marketing team at Databricks. Today I'm excited to showcase Databricks Workflows, a managed orchestration service fully integrated with our Data Intelligence Platform. It simplifies creating, managing, monitoring, and repairing workflows for all your ETL, analytics, and machine learning pipelines.

Let's explore a practical workflow example using our Data and AI Summit cookie dataset. Our workflow has two key objectives. It adopts a data-driven approach to identify prime locations for new Bakehouse flagship stores, leveraging fresh data. It also generates AI-powered descriptions of these locations, factoring in local ingredients.

Let's peek under the hood to see how it's constructed. The workflow consists of several components. First, a Delta Live Tables (DLT) task for data ingestion and transformation. Switching to DLT, you can see it seamlessly integrates continuously updated real-time sales data from SQL Server with franchise information from Salesforce. Both data streams were ingested using fully managed Lakeflow connectors, showcasing data ingestion in a real-world scenario.

Back to Workflows. We have an if-then task for conditional branches, and a SQL task that leverages an AI function to interact with large language models like DBRX, GPT, or Llama, and then two notebooks: one for the false case of the if-then branch and one for triggering a downstream app. This combination allows for efficient data processing, flexible workflow orchestration, and advanced AI-driven analytics.

The intuitive point-and-click authoring experience allows you to add new tasks effortlessly. For instance, to iterate over all flagship store locations and perform an API callout for each, I could employ a for-each loop. Additionally, you can incorporate other task types such as Python, JAR files, dashboard refreshes, triggering a dbt project, or another workflow.

Workflows itself is a service run by Databricks with more than 99.95% availability. The tasks executed by Workflows run on serverless compute.
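The task graph described above (a DLT pipeline task, an if-then branch, a SQL task, and two notebooks) can be sketched as a job definition in plain Python. This is a minimal sketch, not the job used in the demo: the field names (`pipeline_task`, `condition_task`, `sql_task`, `notebook_task`, `depends_on` with `outcome`) follow my understanding of the Databricks Jobs API and should be checked against the current API reference, and every task key, ID, path, and parameter name below is a hypothetical placeholder.

```python
import json

# Hypothetical placeholders: none of these values come from the demo.
PIPELINE_ID = "<dlt-pipeline-id>"
WAREHOUSE_ID = "<sql-warehouse-id>"

job_spec = {
    "name": "bakehouse_flagship_locations",  # hypothetical job name
    "tasks": [
        {
            # DLT pipeline that ingests and transforms the source data.
            "task_key": "ingest_and_transform",
            "pipeline_task": {"pipeline_id": PIPELINE_ID},
        },
        {
            # Conditional branch; the compared parameter is an assumption.
            "task_key": "has_candidates",
            "depends_on": [{"task_key": "ingest_and_transform"}],
            "condition_task": {
                "op": "GREATER_THAN",
                "left": "{{job.parameters.candidate_count}}",
                "right": "0",
            },
        },
        {
            # SQL task that would call an AI function (true branch).
            "task_key": "describe_locations",
            "depends_on": [{"task_key": "has_candidates", "outcome": "true"}],
            "sql_task": {
                "warehouse_id": WAREHOUSE_ID,
                "file": {"path": "/Workspace/demo/describe_locations.sql"},
            },
        },
        {
            # Notebook for the false case of the branch.
            "task_key": "no_candidates",
            "depends_on": [{"task_key": "has_candidates", "outcome": "false"}],
            "notebook_task": {"notebook_path": "/Workspace/demo/no_candidates"},
        },
        {
            # Notebook that triggers the downstream app.
            "task_key": "trigger_app",
            "depends_on": [{"task_key": "describe_locations"}],
            "notebook_task": {"notebook_path": "/Workspace/demo/trigger_app"},
        },
    ],
}


def undefined_dependencies(spec):
    """Return dependency names that do not match any task_key in the spec."""
    known = {task["task_key"] for task in spec["tasks"]}
    missing = set()
    for task in spec["tasks"]:
        for dep in task.get("depends_on", []):
            if dep["task_key"] not in known:
                missing.add(dep["task_key"])
    return sorted(missing)


if __name__ == "__main__":
    # Fail early on typos in task keys before sending the spec anywhere.
    assert undefined_dependencies(job_spec) == []
    print(json.dumps(job_spec, indent=2))
```

A dictionary like this could be serialized and sent to the jobs endpoint, or translated into a bundle definition. The small dependency check is only a local sanity test and is not part of any Databricks tooling.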
Serverless compute provides fast, efficient, and reliable processing without infrastructure management, offering pay-per-use pricing and elastic scalability.

Workflows is fully integrated with Unity Catalog, the unified governance solution that provides centralized data discovery, access control, auditing, and lineage. For example, here is the full end-to-end lineage diagram of our cookies example, including volumes, tables, streaming tables, and materialized views.

Let's talk about Git integration for a moment. Git settings in Databricks Workflows enable jobs to run version-controlled code from Git repositories, streamlining production deployments.

Now let's explore one of Workflows' most powerful features: automated triggers. In this example, I've set the workflow to run every 10 minutes, ensuring regular data processing. Another option would be using a file trigger. If a file arrives in the Bakehouse volume, it can kick off our workflow. This feature supports various cloud object stores and offers flexible configurations based on file patterns, prefixes, and directories. Another powerful option is the table trigger. For example, updates to the Bakehouse sales customers table could automatically trigger our analysis pipeline.

For a workflow, I could also define job parameters to dynamically customize task behavior; notifications for workflow start, success, failure, or a workflow exceeding the duration threshold; and also workflow permissions to control access and execution rights.

Workflows includes centralized monitoring to simplify life for end users. This view shows me the current active runs and completed runs for the last 60 days. It also includes tasks that were triggered by external orchestrators. When it comes to observability, I appreciate the matrix view. This view represents jobs over time, complete with execution details like run ID, duration, and task completion status. The graphical display of total execution time facilitates quick identification of longer-running jobs and tasks that failed.

Databricks Workflows now also offers a timeline view for visualization and debugging. The timeline view displays job runs as horizontal bars on a timeline, showing task dependencies, durations, and status. Users can easily understand the flow of execution, identify bottlenecks, and troubleshoot issues. The timeline view includes critical path highlighting and the integration of query profiles. This intuitive tool helps to manage and optimize complex workflows efficiently.

So if you detected an issue in one of the workflows, you just click on Repair and rerun. You then define if you want to rerun an individual task or all downstream tasks from the issue. You can highlight the error easily, and then you can click on Diagnose error, which will give you a precise error message and a recommendation on how to fix that issue. And then, if you rerun that particular workflow, it's just rerunning the affected tasks, which will finish much quicker, and successfully finish.

While we focused on the UI so far in this demo, Workflows also provides an API for integration with external orchestrators like Apache Airflow. We offer Terraform providers if your organization already has a robust DevOps framework or if your engineering team is well versed in Terraform. However, for most users looking into CI/CD, we recommend Databricks Asset Bundles, sometimes also called DABs. DABs provide a more accessible and streamlined way to manage infrastructure. DABs simplify the packaging and deployment through a YAML file that defines the structure and components of your project, including workflows, notebooks, SQL, and ML models. You can use any IDE, such as VS Code, to create this file, but this YAML config file can also be generated from an existing workflow. This allows data professionals to focus on their core tasks without being bogged down by infrastructure complexities. Customers can use DABs to promote their project definition from a dev to a prod environment using any CI/CD tool of their choice.

Here's the key takeaway from this demonstration: Databricks Workflows is the fully managed lakehouse orchestration service for building reliable workflows on any cloud. We at Databricks believe it's the best orchestrator for the lakehouse. Adopting Workflows now paves the way for a seamless transition into the forthcoming unified Lakeflow platform, eliminating the need for future migrations.
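The repair-and-rerun step in the transcript reruns only the affected tasks: the failed task plus everything downstream of it. The idea can be illustrated with a small graph traversal. This is a conceptual sketch, not how Databricks implements repair runs; the function name and the task names are hypothetical.

```python
from collections import deque


def tasks_to_rerun(depends_on, failed):
    """Return the failed tasks plus every task downstream of them.

    depends_on maps each task name to the list of tasks it depends on.
    """
    # Invert "task -> upstream tasks" into "task -> downstream tasks".
    downstream = {}
    for task, upstreams in depends_on.items():
        for upstream in upstreams:
            downstream.setdefault(upstream, []).append(task)

    # Breadth-first walk from the failed tasks through their dependents.
    to_rerun = set(failed)
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in to_rerun:
                to_rerun.add(child)
                queue.append(child)
    return to_rerun


# Hypothetical task graph loosely modeled on the demo workflow.
depends_on = {
    "ingest_and_transform": [],
    "has_candidates": ["ingest_and_transform"],
    "describe_locations": ["has_candidates"],
    "no_candidates": ["has_candidates"],
    "trigger_app": ["describe_locations"],
}

if __name__ == "__main__":
    # Only the failed SQL task and the notebook after it need to run again.
    print(sorted(tasks_to_rerun(depends_on, ["describe_locations"])))
```

Tasks that already succeeded upstream of the failure are left out of the result, which is why a repair run can finish much faster than a full rerun.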
Original Description
This product demo showcases Databricks Workflows and its integration with the Data Intelligence Platform. It highlights Delta Live Tables tasks for real-time data ingestion, SQL with AI functions for generative AI, conditional branching, and notebook tasks. Other capabilities include serverless compute, Unity Catalog integration, new observability tools like matrix and timeline views, repair and rerun capabilities, and automated triggers. The demo also covers Git integration and infrastructure-as-code.
Learn more about Data Engineering on Databricks:
https://www.databricks.com/solutions/data-engineering
Note: Databricks Lakeflow unifies Data Engineering with Lakeflow Connect, Lakeflow Spark Declarative Pipelines (previously known as DLT), and Lakeflow Jobs (previously known as Workflows).