Get Data Into Databricks - Feature Store
Key Takeaways
Databricks Feature Store is utilized for managing and transforming raw data into features for machine learning workflows, leveraging processes such as joins, aggregates, and transformations.
Full Transcript
Hi everyone. In this video we'll be talking about the Feature Store in Databricks. Generally, a feature store is used to create, discover, and reuse features. In Databricks, a feature table is any Delta table with a primary key. That means that when you featurize your raw data by applying any number of joins, aggregates, and transforms, all you have to do to make that table a feature table is specify a primary key.

At this point there are a couple of different paths. You can serve that feature table online for real-time feature serving; this is known as an online table, or an online store. You can also use it in an offline or batch setting for model training and inference. When you use a feature table throughout the model lifecycle, you ensure consistent usage of features during model training and inference, which addresses a key component of online/offline skew.

In this notebook I'll go over how we can do that in Databricks. In this example, we are a travel agency that wants to provide recommendations to our users. To do that, we need a model that predicts whether a given user is likely to make a booking. We have some data on our users, and we'll apply some transformation logic to convert it into the features that we'll then use to train our model. To do that, we just select the data that we have, specify and apply the transformation logic (don't worry about the details), and store the results in a DataFrame.

At this point, to turn that table into a feature table, we can use the feature engineering client, which is a comprehensive API to create, manage, and use features. To create the table itself, I just specify the name of the table, the primary key, and the data itself, along with the metadata that I want to have associated with it. This feature table is now ready, and it is directly integrated with and governed by Unity Catalog. If we look at a high-level overview of the data, we can see that the ID is a primary key, and walking over to the Lineage tab, we can see from the lineage graph that this feature table is downstream of the original data and upstream of the model that we'll train.

With our feature table, we can create the training dataset by specifying the ID, which we'll use in the feature lookup to look up the features associated with that ID, combine it with our purchase label, and then use it to train our model. So to create the training dataset, grabbing the ID and the label, I'll use feature lookup to use those IDs to look up this list of features from the feature table that we just created. Then, to create the actual training dataset itself, I'll combine the original data with the features that I just grabbed, minus some, specify that the label is the purchase column, and I have my training dataset.

Next we'll train the model. The details of what model we train aren't important for the purposes of highlighting the benefits of using a feature table throughout the model lifecycle. What is important is that when you log a model that's been trained with a feature table using the feature engineering client, one of the things stored along with the model metadata is the feature spec. The feature spec specifies the inputs to the model along with how to get them. For example, the destination ID is an input to the model, and it's retrieved with the lookup key ID.

So when we have our model and have moved it to production, we can run inference on it. To do that, all we need to specify is the ID. On the back end, there's automatic feature lookup that takes this ID, grabs the features associated with that ID, uses those same features as the inference features (this is what ensures consistent usage of features during model training and inference), and then forecasts whether a user is likely to make a booking or not. Here in this example, we'll just select the IDs that we had in our table. This is contrived; in a production setting you might have a Databricks job that updates the feature tables with new user ID information. To run inference, we just feed the IDs to forecast and we get the prediction.

So in this demo I showed how we can create and use a feature table throughout the model lifecycle. You can find this notebook, along with others on deeper-dive topics such as point-in-time lookup, which involves on-the-fly feature calculations, deploying online tables, and streaming feature tables, at dbdemos. There you'll also find other helpful tutorials that span the entire Databricks platform, from data engineering, data science and AI, and data warehousing and BI to data governance. Thanks for watching.
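The workflow described in the transcript can be sketched roughly as follows. This is not the notebook from the video. It is a minimal outline that assumes the `databricks-feature-engineering` package on a Databricks ML runtime, a scikit-learn model, and hypothetical table, column, and model names (`main.travel.user_features`, `id`, `purchased`, `main.travel.booking_model`). The Databricks and MLflow imports are placed inside the functions so the outline can be read and loaded outside a workspace.

```python
# Hypothetical Unity Catalog names, used only for illustration.
FEATURE_TABLE = "main.travel.user_features"
MODEL_NAME = "main.travel.booking_model"

# On Databricks, the client would be created with:
#   from databricks.feature_engineering import FeatureEngineeringClient
#   fe = FeatureEngineeringClient()


def create_feature_table(fe, features_df):
    """Register an already-featurized Spark DataFrame as a feature table."""
    return fe.create_table(
        name=FEATURE_TABLE,
        primary_keys=["id"],  # the primary key is what makes it a feature table
        df=features_df,
        description="Per-user travel features (illustrative)",
    )


def build_training_set(fe, labels_df, feature_names):
    """Join features onto a DataFrame holding only the id and the label."""
    from databricks.feature_engineering import FeatureLookup

    lookups = [
        FeatureLookup(
            table_name=FEATURE_TABLE,
            feature_names=feature_names,
            lookup_key="id",
        )
    ]
    # "purchased" is an assumed label column name.
    return fe.create_training_set(
        df=labels_df, feature_lookups=lookups, label="purchased"
    )


def log_model_with_feature_spec(fe, model, training_set):
    """Log the model through the client so the feature spec is stored with it."""
    import mlflow.sklearn  # assumes a scikit-learn model

    return fe.log_model(
        model=model,
        artifact_path="model",
        flavor=mlflow.sklearn,
        training_set=training_set,
        registered_model_name=MODEL_NAME,
    )


def score_ids(fe, model_uri, ids_df):
    """Batch inference: pass only the ids; features are looked up automatically."""
    return fe.score_batch(model_uri=model_uri, df=ids_df)
```

In a notebook, `training_set.load_df()` would return the joined DataFrame used to fit the model before calling `log_model_with_feature_spec`.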
Original Description
Try Databricks today: https://dbricks.co/3EAWLK6.
Link to the code: https://www.databricks.com/resources/demos/tutorials/data-science-and-ai/feature-store-and-online-inference
Discover Databricks Feature Stores in machine learning workflows. We will walk through how raw data is transformed through feature engineering processes like joins, aggregates, and transformations. These features are then stored and made discoverable for reuse in both model training and serving stages. The system ensures consistency between offline batch processing and real-time online serving for inference. A searchable interface allows users to create, discover, and manage features efficiently, enabling seamless integration into client applications.
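The consistency claim above can be illustrated without Databricks at all. The toy below is plain Python, not the Databricks API, and every class and function name in it is made up for illustration. It shows the idea behind a feature spec: training and inference both resolve features through the same lookup key and the same ordered list of feature names, so the model sees identically constructed inputs in both settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """Records which key and which features a model was trained with."""
    lookup_key: str
    feature_names: tuple


class ToyFeatureTable:
    """An in-memory table whose rows are unique on a primary key."""

    def __init__(self, primary_key, rows):
        self.primary_key = primary_key
        self._rows = {}
        for row in rows:
            key = row[primary_key]
            if key in self._rows:
                raise ValueError(f"duplicate primary key: {key!r}")
            self._rows[key] = row

    def lookup(self, key, feature_names):
        """Return the requested features for one key, in the given order."""
        row = self._rows[key]
        return [row[name] for name in feature_names]


def training_rows(table, spec, labelled_rows):
    """Build (features, label) pairs from rows holding only a key and a label."""
    return [
        (table.lookup(row[spec.lookup_key], spec.feature_names), row["label"])
        for row in labelled_rows
    ]


def inference_inputs(table, spec, ids):
    """Build model inputs from ids alone, using the same spec as training."""
    return [table.lookup(i, spec.feature_names) for i in ids]


table = ToyFeatureTable(
    "id",
    [
        {"id": 1, "clicks_7d": 4, "avg_price": 120.0},
        {"id": 2, "clicks_7d": 0, "avg_price": 80.0},
    ],
)
spec = FeatureSpec(lookup_key="id", feature_names=("clicks_7d", "avg_price"))
train = training_rows(table, spec, [{"id": 1, "label": 1}, {"id": 2, "label": 0}])
serve = inference_inputs(table, spec, [1, 2])
```

Because both paths go through `spec`, the feature vectors in `train` and `serve` match for the same id, which is the property the feature table is meant to guarantee.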