Summit Live: How Databricks Uses Databricks

Databricks · Beginner ·🔄 Data Engineering ·11mo ago

Key Takeaways

Databricks runs its own enterprise lakehouse on its own platform, using data and AI to guide decisions. Bruce Wong, head of data platforms, explains how his team uses Databricks internally, with examples including AI/BI Genie, a RAG-based analyst relations application, and built-in audit logging.

Full Transcript

[Music] [Applause]
All right, we are back live again. I'm Ari, with Carly and Bruce, and we're here to talk about how Databricks uses Databricks. I love your outfit, the DBX data platform, which is your role. Why don't you tell everyone what you do here?
Yeah. So, my name is Bruce Wong. I lead our data platform organization, which is very interesting: we're leading a data platform at a data platform company.
It's very meta.
Yes, quite meta. And that's actually been the interesting intersection as we think about our internal function: how do we use data and AI across the entirety of our business? We work with everyone across engineering, support, finance, HR; you name the team, we've worked with them, and they all use Databricks. So how do we configure and build that function on top of Databricks? What's interesting is that we also work very closely with our product engineers and our product management to help define what that roadmap looks like. So as a company we're really leading and pioneering the way for what we think the future looks like for all companies to operate.
So you're pushing Databricks to its limit within Databricks.
Absolutely. And fun fact: our workspace is a top-three workspace by all measures, by compute, storage, and so forth. So we are effectively the largest customer of Databricks itself.
That's where the dog food slang comes from, right? I had to learn what that meant when I joined. I was like, why is everyone here talking about dog food? Can you explain that a little bit?
Yeah, there's a saying in the tech industry, and I think other places too: eat your own dog food, or drink your own champagne. And so we actually call it log food as a version of that.
And it's really that we want to use the product in as many ways as we possibly can, so that we can figure out what that actually means, what challenges we have with it, and how that feeds back into making the product significantly better, right? And we like to say, if it works at our scale, it will most likely work for all of our customers as well.
Yeah. And to me, those are the two best ways to develop your product: have your customers give you feedback, with all the product advisory councils, which is great, and then have your own employees do it. I sit in marketing (love the marketing), and we have both non-technical and very technical people. Product marketing management, PMM, is very technical, and we also have traditional marketing and events people, and they shouldn't need to write Python code or SQL code. So, for example, we have AI/BI Genie and AI/BI dashboards on our own data, which could be, for our salespeople, Salesforce, Marketo, Slack messages, and you can ask questions: how much revenue are we getting from this event versus that event? Which ebooks are getting the most views? And by doing that, we get to see that this doesn't quite make sense, or it'd be better to do it this way or that way, and that feeds back to the data platform and how we all synthesize it together. So that's brilliant.
Yeah, absolutely.
And I would say one of the things we talk about is that we really make our careers on joining data, to put it really simply, right? It's like, can I join the marketing data with the product data? Can I join our customer insights about how customers are using the product with that marketing campaign or that sales motion? Or even, how do we think about the evolution of our product to support those things? So a lot of what we've been looking at is: what portion of our customers is starting to use AI as part of their day-in, day-out workflow? How does that inform AI/BI's roadmap? I would actually say the entire product roadmap, across the entire surface, is really being transformed and enabled by AI, right? And then for us as a data platform organization, it's like, great, we have all these amazing capabilities from the product. What does it actually mean for us to enable that and use that inside of Databricks, to live that journey and pioneer that path?
So it's a really, really exciting time to be in the data platform space.
Yeah. For my team and myself, this is the job we wanted. A lot of us could have gone almost anywhere inside of engineering, and we're like, no, we're right at the center; this is the thing we wanted to do. And it's really great to work with so many amazing people, in marketing, in sales, in engineering, and to see all of that come to life, right? And so, I like to say, we get to see the early parts of the product. We get to see the good, the bad, and the ugly, right?
But we also get to see when we find something good and figure out the recipe that works internally. We actually have a number of stories where we worked ahead of the product. We're like, we think there's something here that we need to solve for ourselves internally. And we have multiple stories where it worked for us internally, and then we worked with product managers, and they're like, our customers said the exact same things as you guys did; what did you do to solve this? And we're like, oh, this is our strategy, here are our docs, here's our code. And that's actually made its way into the product roadmap and into the product.
Yeah. Another example is what we call Arya. It's our analyst relations application: analysts not as in data science analysts, but as in the Gartners, Forresters, and IDCs of the world. There's RAG, there's fine-tuning, there's traditional [unclear], but then there are situations where you want to finely curate specific answers. If an analyst firm asks a question, are you ISO compliant, et cetera, you want the answer to be very curated and specific. And that was a dog food, log food, capability. And now we see customers getting the benefit of how the product improved because of that. And speaking of that, you've been busy, and your team has been busy here at Summit, with like a dozen speeches. I think you've given two. One was something about swimming in a lakehouse, and stopping guessing costs. I loved that title. What was it? Swimming in your own.
Swimming in our own lakehouse. Yeah.
So that talk, which we gave on Tuesday, is sort of a 10,000-foot view of how we use Databricks at Databricks, and also how we built and architected our Databricks instance to work across the entire company. And I think that's actually a unique thing for us. Had we not existed as a data platform team, each group probably would have created their own lakehouse, and we would have had silos. We would not have had one amazing lakehouse to join all the data together and collaborate together as a company; we would have had 11 different lakehouses, and it would be, which lakehouse should I go to for which thing? And we're like, no, we think the right, opinionated way to do this is to have one team and one way to do it: one way that we practice data and AI at the company. Now, we have to understand all the different personas. We have everyone from less technical users, for whom we need to enable things like Genie and natural language processing and so forth, to extremely technical people at the company. Our support team is some of the most technical support people we've ever encountered, and our engineering org is super technical. So how do you build a platform that can work for everyone when they have very, very different use cases, right? And Databricks really allows us to do that. We've also done a lot of work with the product team around the personas of our customers, and the personas of our own company as well.
That's incredible. I know a lot of people watching either are on a data platform team, running one, or trying to start one. What do you see? What's the commonality between the best data platform teams?
Yeah.
Well, first off, I'll say I'm biased, but I believe the best data platform teams use Databricks. And I can say that having used the product; I've been here about three years. When I first started, I was like, there are certain things I love about Databricks, and certain things where, yeah, there are some rough edges. But through that collaboration and working with the different product teams, we really have a best-in-class data platform now. And getting to work with all the different surfaces, there are so many things that we just get out of the box for free with Databricks, and it means that my team gets to focus on much more interesting work. A good example: we were working on something with our security team about the auditability of people using certain data, and we're like, well, this is great: since we use Databricks, the audit logging comes for free. I didn't need to spend any engineering effort on building a whole audit logging framework, a logging standard, or any of that. We basically made the architectural decision to do as much as we possibly can on Databricks, and the auditing is taken care of for us. That's one very tangible example, but there are so many examples. I don't want to worry about storage. I don't care what hard drives our data is stored on, right? I don't actually care which cloud provider's storage we use, right? And so Databricks lets us zoom out and really look, from a first-principles standpoint, at what the opportunity at hand is and what problems we're addressing.
And we get to really focus on the value of those things instead of worrying about, oh, is the audit logging there, or is the storage correct or not. The product actually gives us the leverage to focus on that value.
Yeah, I come from a security background, and I can tell you audit logging is usually one of the most painful things that you can do as a security org. You get a lot of friction and pushback from people because you're building something that inherently is going to be challenging. It's going to require a lot of overhead. It's going to require cutting people out of things they might think they want access to. And having that out of the box is incredible.
Oh, it's table stakes. Yeah. And from an engineering perspective, from a data platform practitioner perspective, we are very security-minded. We partner with security and with legal all the time. We know the importance of audit logging, and if we needed to, we'd absolutely do what's necessary for audit. But the fact that we are on Databricks means that we just don't have to, right? It's not the most exciting work for a data platform engineer to build audit logging, and we don't have to now, because we're building on top of Databricks as a very solid foundation for us.
So exciting. Well, good. We're getting the wrap-up signal. First of all, thank you for coming on. Honestly, a lot of people ask, how does Databricks use Databricks? I think it's great that we keep improving the product with ourselves, and then with our customers and our partners. But yeah, I wanted to thank you for coming on.
Yeah, thank you very much for having me here.
All right. Awesome. [Applause]
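As a rough illustration of the audit-logging point in the conversation, here is a minimal sketch of how a team might ask "who read this table recently?" without building its own logging framework. It is not taken from the talk. It assumes Unity Catalog system tables are enabled and that audit events are available in `system.access.audit` with the columns used below; the `getTable` action, the `full_name_arg` request parameter, the helper name `build_table_access_query`, and the table `main.finance.revenue` are illustrative assumptions to check against your own workspace.

```python
from datetime import date, timedelta

# Assumption: audit events are exposed in this system table with the columns
# event_time, event_date, user_identity.email, service_name, action_name,
# and request_params. Verify against your workspace before relying on it.
AUDIT_TABLE = "system.access.audit"


def build_table_access_query(table_full_name, days=7, today=None):
    """Return (sql, params) listing recent reads of one table's metadata.

    Hypothetical helper: it only builds the query text and its named
    parameters, so it can be inspected or tested without a cluster.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    today = today or date.today()
    start = today - timedelta(days=days)
    sql = (
        "SELECT event_time, user_identity.email AS user_email, action_name\n"
        f"FROM {AUDIT_TABLE}\n"
        "WHERE service_name = 'unityCatalog'\n"
        "  AND action_name = 'getTable'\n"
        "  AND request_params['full_name_arg'] = :table_name\n"
        "  AND event_date >= :start_date\n"
        "ORDER BY event_time DESC"
    )
    # Values are passed as named parameters rather than formatted into the SQL.
    params = {"table_name": table_full_name, "start_date": start.isoformat()}
    return sql, params


if __name__ == "__main__":
    query, args = build_table_access_query("main.finance.revenue", days=7)
    print(query)
    print(args)
```

In a notebook, the pair could then be passed to something like `spark.sql(query, args=args)`, assuming a runtime whose `spark.sql` accepts named parameters. Keeping the query builder separate from execution is a design choice that makes the filter logic easy to review with a security team.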

Original Description

Ever wonder how Databricks operates its own enterprise lakehouse, where all employees and all teams inside use data and AI to solve problems and guide our decisions? Bruce Wong, head of data platforms, will talk about how his team leverages Databricks itself.

Learn how Databricks operates its own enterprise lakehouse, using data and AI to solve problems and guide decisions, with insights from Bruce Wong, head of data platforms. The conversation covers running one shared lakehouse for the whole company instead of per-team silos, feeding internal usage back into the product roadmap, a RAG-based analyst relations application, and relying on built-in audit logging.

Key Takeaways
  1. Design a lakehouse architecture
  2. Implement RAG search
  3. Utilize vector stores
  4. Deploy data platforms
  5. Leverage AI for data analytics
  6. Optimize data storage
  7. Monitor and maintain the lakehouse
💡 A well-designed lakehouse architecture is crucial for driving business decisions, and leveraging AI and data analytics can significantly improve problem-solving and decision-making.
