Get Data Into Databricks - Feature Store

Databricks · Beginner ·🛠️ AI Tools & Apps ·1y ago

Key Takeaways

Databricks Feature Store is utilized for managing and transforming raw data into features for machine learning workflows, leveraging processes such as joins, aggregates, and transformations.

Full Transcript

Hi everyone. In this video we'll be talking about the feature store in Databricks. Generally, a feature store is used to create, discover, and reuse features. In Databricks, a feature table is any Delta table with a primary key. That means that when you featurize your raw data by applying any number of joins, aggregates, and transforms, all you have to do to that table to make it a feature table is specify a primary key. At this point there are a couple of different paths. You can serve that feature table online for real-time feature serving; this is known as an online table or an online store. You can also use it in an offline or batch setting for model training and inference. When you use a feature table throughout the model life cycle, you ensure consistent usage of features during model training and inference, which addresses a key component of online/offline skew.

In this notebook I'll go over how we can do that in Databricks. In this example, we are a travel agency that wants to provide recommendations to our users. To do that, we need a model that predicts whether or not a given user is likely to make a booking. We have some data on our users, and we'll apply some transformation logic to convert it into the features that we'll then use to train our model. To do that, we'll just select the data that we have, specify and apply the transformation logic (don't worry about the details), and store the results in a DataFrame.

At this point, to turn that table into a feature table, we can use the Feature Engineering client, which is a comprehensive API to create, manage, and use features. To create the table itself, I just specify the name of the table, the primary key, and the data itself, along with the metadata that I want to have associated with it. This feature table is now ready, and directly integrated with and governed by Unity Catalog. If we look at a high-level overview of the data, we can see that the ID is a primary key, and walking over to the lineage tab, we can see from the lineage graph that this feature table is downstream of the original data and upstream of the model that we'll train with our feature table.

We can create the training data set by specifying the ID, which we'll use in the feature lookup to look up the features associated with that ID, combine it with our purchase label, and then use it to train our model. So to create the training data set, grabbing the ID and the label, I'll use feature lookup to use those IDs to look up this list of features from the feature table that we just created. Then, to create the actual training data set itself, I'll combine the original data with the features that I just grabbed, minus some, specify that the label is the purchase column, and I have my training data set.

Next we'll train the model. The details of what model we train aren't important for the intents and purposes of highlighting the benefits of using a feature table throughout the model life cycle. What is important is that when you log a model that's been trained with a feature table using the Feature Engineering client, one of the things that's stored along with the model metadata is the feature spec. The feature spec specifies the inputs to the model along with how to get them. For example, the destination ID is an input to the model, and it's retrieved with the lookup key ID.

So when we have our model and have moved it to production, we can run inference on it. To do that, all we need to specify is the ID. On the back end, there's automatic feature lookup that takes this ID, grabs the features that are associated with that ID, uses those same features as the inference features (this is what ensures consistent usage of features during model training and inference), and then forecasts whether a user is likely to make a booking or not. Here in this example we'll just select the IDs that we had in our table. This is contrived; in a production setting you might have a Databricks job that updates the feature tables with new user ID information. To run inference, we just feed the IDs to forecast and we get the prediction.

So in this demo I showed how we can create and use a feature table throughout the model life cycle. You can find this notebook, along with others on deeper-dive topics such as point-in-time lookup, which involves on-the-fly feature calculations, deploying online tables, as well as streaming feature tables, at dbdemos. There you'll also find other helpful tutorials that span the entire Databricks platform, from data engineering, data science and AI, and data warehousing and BI, along with data governance. Thanks for watching.
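
The life cycle described in the transcript (create a feature table with a primary key, look up features by ID to build a training set, then run inference by passing only IDs) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Databricks Feature Engineering client: `FeatureTable`, `LoggedModel`, `create_training_set`, `train`, the feature names, and all values are hypothetical stand-ins, and the "model" is a toy threshold rule.

```python
from dataclasses import dataclass


@dataclass
class FeatureTable:
    """Hypothetical stand-in for a feature table: rows addressed by a primary key."""
    name: str
    primary_key: str
    rows: dict  # primary-key value -> {feature name: value}

    @classmethod
    def create(cls, name, primary_key, records):
        rows = {}
        for record in records:
            key = record[primary_key]
            if key in rows:  # a primary key must identify exactly one row
                raise ValueError(f"duplicate primary key: {key!r}")
            rows[key] = {k: v for k, v in record.items() if k != primary_key}
        return cls(name, primary_key, rows)

    def lookup(self, key, feature_names):
        row = self.rows[key]
        return [row[name] for name in feature_names]


def create_training_set(labels, table, feature_names, label):
    """Join (id, label) rows with the features looked up by id."""
    X = [table.lookup(row[table.primary_key], feature_names) for row in labels]
    y = [row[label] for row in labels]
    return X, y


def train(X, y):
    """Toy model: a threshold halfway between the mean feature sums of each class."""
    pos = [sum(x) for x, target in zip(X, y) if target == 1]
    neg = [sum(x) for x, target in zip(X, y) if target == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


@dataclass
class LoggedModel:
    """Model packaged with a feature spec: which features to fetch, and from where."""
    threshold: float
    table: FeatureTable
    feature_names: list

    def predict(self, ids):
        # Callers pass only IDs; features come from the same table used in training.
        scores = [sum(self.table.lookup(i, self.feature_names)) for i in ids]
        return [int(score >= self.threshold) for score in scores]


# Hypothetical user features and purchase labels (illustrative values only).
features = FeatureTable.create(
    name="user_features",
    primary_key="id",
    records=[
        {"id": 1, "past_bookings": 5, "searches_last_week": 8},
        {"id": 2, "past_bookings": 0, "searches_last_week": 1},
        {"id": 3, "past_bookings": 3, "searches_last_week": 6},
        {"id": 4, "past_bookings": 1, "searches_last_week": 0},
    ],
)
labels = [
    {"id": 1, "purchased": 1},
    {"id": 2, "purchased": 0},
    {"id": 3, "purchased": 1},
    {"id": 4, "purchased": 0},
]
feature_names = ["past_bookings", "searches_last_week"]

X, y = create_training_set(labels, features, feature_names, label="purchased")
model = LoggedModel(train(X, y), features, feature_names)
print(model.predict([1, 2, 3, 4]))  # prints [1, 0, 1, 0]
```

Because `LoggedModel` carries its own feature spec, training and inference read features through the same `lookup` path, which is the consistency property the transcript attributes to using a feature table throughout the model life cycle.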

Original Description

Try Databricks today: https://dbricks.co/3EAWLK6. Link to the code: https://www.databricks.com/resources/demos/tutorials/data-science-and-ai/feature-store-and-online-inference Discover Databricks Feature Stores in machine learning workflows. We will walk through how raw data is transformed through feature engineering processes like joins, aggregates, and transformations. These features are then stored and made discoverable for reuse in both model training and serving stages. The system ensures consistency between offline batch processing and real-time online serving for inference. A searchable interface allows users to create, discover, and manage features efficiently, enabling seamless integration into client applications.

This video introduces Databricks Feature Store and its role in machine learning workflows, covering how raw data is transformed into features and made discoverable for reuse. It highlights the importance of consistency between offline and online processing for real-time inference.

Key Takeaways
  1. Transform raw data into features through joins, aggregates, and transformations
  2. Store and manage features in Databricks Feature Store
  3. Discover and reuse features for model training and serving
  4. Ensure consistency between offline batch processing and real-time online serving
  5. Integrate features into client applications
💡 Databricks Feature Store enables efficient feature management and reuse, ensuring consistency between offline and online processing for seamless model deployment and real-time inference.
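
The first takeaway (turning raw data into features through aggregates) can be illustrated with a small plain-Python sketch. Everything here is an assumption for illustration: the `raw_events` rows, the `featurize` and `has_unique_key` helpers, and the feature names are hypothetical and do not come from the video's notebook. The sketch shows why raw event rows do not qualify as a feature table, while the aggregated rows, with one row per `user_id`, do.

```python
from collections import defaultdict

# Hypothetical raw events: several rows per user, so user_id is not unique.
raw_events = [
    {"user_id": 1, "event": "search", "price": 120.0},
    {"user_id": 1, "event": "booking", "price": 300.0},
    {"user_id": 2, "event": "search", "price": 80.0},
    {"user_id": 1, "event": "search", "price": 150.0},
]


def featurize(events):
    """Aggregate raw events into exactly one feature row per user_id."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["user_id"]].append(event)

    rows = []
    for user_id, user_events in sorted(grouped.items()):
        prices = [e["price"] for e in user_events]
        rows.append({
            "user_id": user_id,
            "n_events": len(user_events),
            "n_bookings": sum(e["event"] == "booking" for e in user_events),
            "avg_price": sum(prices) / len(prices),
        })
    return rows


def has_unique_key(rows, key):
    """True if `key` identifies each row uniquely, i.e. it can act as a primary key."""
    keys = [row[key] for row in rows]
    return len(keys) == len(set(keys))


feature_rows = featurize(raw_events)
print(has_unique_key(raw_events, "user_id"))    # prints False
print(has_unique_key(feature_rows, "user_id"))  # prints True
```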
