Fuelling Decisions: How DTN Powers Gas Pricing and Data Science Collaboration

Outerbounds · Intermediate · 🛠️ AI Tools & Apps · 2y ago

Skills: AI Workflow Automation (90%), Tool Use & Function Calling (80%)

Key Takeaways

DTN uses Metaflow with Kubernetes to build a collaborative JupyterHub data science platform, with automated pipelines that deploy flows from GitLab to Argo Workflows and with Kubecost tracking the cost of those flows.

Full Transcript

Tyler is a data science lead at DTN, a company that has a very interesting tagline: it provides operational intelligence for confident decisions. I love it. And of course he's been a very ardent follower and user of Metaflow for several years now, I believe. So welcome, Tyler, and over to you.

Thank you, Shri. As Shri mentioned, I'm a data science platform lead at DTN. It's actually been less than a year, Shri. I know it's felt like years, but I'm pretty new. Let me know if you can't see this, but I think it's working. The title of my talk is "Increasing Velocity with Metaflow." My goal today is to make this a demonstration of the workflow of our average data scientist at DTN, and how they would go from iterations to actually deploying a production flow with Metaflow. Here's a bit of the agenda for today: I'll start with some of the types of things that DTN does, and then we'll get into the different aspects of our architecture and how the data scientists use them.

DTN primarily focuses on three main areas: weather, fuel, and agriculture. On the data science side of things, these are some examples of projects that we've built, products that customers are actively using. This one's called Storm Impact Analytics. Essentially we're ingesting weather forecasts, and it will predict potential impacts to infrastructure, be that power, roads, etc. What our customers do with this is, when they get a prediction, they're better able to allocate resources, to send more people to fix power lines before the storm hits, that kind of thing.

On the agriculture side of things, one of the main things we do is crop yield modeling. We'll ingest satellite imagery data and previous harvest data in order to generate models for the current year on estimated yields for different crops. This is an example of a graph that comes out of one of those types of models.

And then finally, fuel demand modeling. This is on our fuel side of things. DTN is involved with a lot of the point-of-sale systems for the last leg of fuel transport, which is essentially the tankers to the gas stations. With that data we're able to predict some of the demand over time. This is an example of the demand year over year during COVID, and how it's a little bit lower. So those are some examples of what DTN does on the data science side of things.

On to how we do it. Our main interface with Metaflow is through a project called JupyterHub, and I'll show that in a second. Essentially, JupyterHub is a preconfigured computational environment. All we have to do to get someone set up on our environment is enable their email in our SSO, and when they log in they have access to conda environments, a shared file system, and a decently large amount of compute, and they're off to the races.

I'll show a little bit of a demo of what that looks like. This is an example of JupyterHub, and this is actually a production environment. I just created a few example files to show how it works, but I think most of us are familiar with Jupyter notebooks, so you can execute Python in real time. Each user has the ability to use about 130 gigs of RAM. We had to split that up because this is running on an overallocated single EC2 instance, so if they were unlimited they would crash the server, which was never fun.

The nice thing about this is that Metaflow is completely set up via environment variables. You can see here we have a shared environment which is immutable to the user, so when people iterate on JupyterHub itself they'll be using this DS environment. This is useful because occasionally we have to manually modify Metaflow for certain things that occur in our environment. One of those, for example, is the Kubernetes autoscaler: it was deciding to try and move pods from one node to another to eliminate nodes that were underutilized, and naturally that would break your job and made data scientists upset. So it's nice to be able to push out updates to Metaflow so that the users don't have to worry about doing it themselves, and we have consistency across all of our users.

The other thing is that Metaflow itself is preconfigured. Here are just a few of the Metaflow environment variables. This isn't all of them, because some of them are sensitive, with tokens to Argo and that kind of thing. But you can see here that we have the container image, and this is because pulling the public Docker container with large jobs was blowing out the limits, so we pull from Amazon ECR because we run on Amazon. We also have Kubernetes automatically configured, so as soon as the user logs in it authenticates to our Kubernetes cluster through IAM roles. It's really nice in that it's kind of a click-button solution to get your data scientists up and running.

Hey, quick question here, Tyler. Is Jupyter itself also running on Kubernetes, or is that a separate setup from the Kubernetes environment for Metaflow?

It could be; currently it's not. Right now it's just running on a single large EC2 instance. Part of that is that it was just the existing architecture, but the thing is that it enables us to use a single large instance and overallocate it really easily, whereas on Kubernetes the default is to have each user have their own pod.

Got it. So one large EC2 instance hosts JupyterHub. I guess there is a URL for logging in, so you have a UI, and it's authenticated. When users log in, all users are sharing the same EC2 instance, so all their Jupyter notebooks are running on the same EC2 instance, which has preconfigured Metaflow. And then when they run their flows, the flows actually run on a Kubernetes cluster that is also on AWS, but elsewhere.

Correct, it's on an EKS cluster on AWS. And that's obviously if they use the Kubernetes decorator. They can also run flows locally, of course.

Nice. And if anyone has any other questions, feel free to ask them as we go as well.

All right, so I wanted to show a sample pipeline, like I mentioned, of how a project starts and then actually gets scheduled on Argo. We have a cookiecutter repo that data scientists use to create all of the template files for a project. Within that you can schedule multiple flows in the GitLab CI files, and then whenever your flow is pushed to the main branch in GitLab it will automatically deploy itself to Argo.

To start showing how that works: most of our iteration with data scientists is done in Jupyter notebooks, and the next step after that, once you have a working concept, is to create a flow file. We've all seen Metaflow files. With our setup, flows are put in the source directory, and this is because by default Metaflow will include all of the current Python files and all subdirectories of Python files in the code package. Here I created an example of adding a really simple function to your source. You can test this locally, and then you can also test this on Kubernetes when you want to scale up. This is an example of adding the Kubernetes decorator, and here we can see an example of this running. It's worth noting that we also have this set up with the UI, which makes debugging really easy: you can just click the button with your link, go and look at the UI, and see all the standard error and standard out for your flows.

Once you have your flow set up the way you want and you want to deploy it to prod, you go to the GitLab CI file. In the base of the repo there's a file that is the CI, and this is what GitLab picks up in order to see which Python files are listed to push through the CI. Let me move this so I can actually access that. We made it so that this pulls from a template repository, so again, if we want to change how the template works for all data scientists, we just change the template in GitLab and then all of the projects inherit from that template.

This is what the template looks like. Hopefully this is big enough. Essentially it runs that argo-workflows create command. We've added a few additional things: it will tag the deployment with the Git SHA, and we also have a Slack webhook on Argo so that if things fail it will notify us in a Slack channel. This is all triggered to run whenever there's a change to either the Python file for your flow or any of the subdirectories within your source files. The goal of this is that the state of your flow is always represented in the main branch, so that when another person comes by to try and fix your flow, if it's broken and the main developer is on vacation, we know the state of the flow and where all the code is.

And I'm guessing that GitLab is configured to have the kubeconfig of the Kubernetes cluster, so when you do python flow.py argo-workflows create, it is already running in an environment that has access to the Kubernetes cluster and to the kubeconfig so that it can connect to that cluster, and so on, right?

Correct, yeah, and that's where this CI project user comes into play. We have it configured with an IAM role that has sufficient privileges to deploy things to Kubernetes.

Nice.

To show an example of this running: we have a few different linting steps here, with Black, Ruff, [unclear] lint, etc. We have a testing phase, and then we have this deployment phase where it goes and deploys your job to Argo.

We've also integrated with Kubecost, and it was kind of happenstance that Kubecost had a mechanism for this. For context, Kubecost is an open source project that enables you to filter based on different pods or tags, and then it pings the AWS cost API to estimate the cost of your running pods. So we installed Kubecost, and there's this concept called allocations, and you're able to map allocations to tags and Kubernetes objects. Because the project tag, when you're creating a flow, maps to a particular tag within Kubernetes, we just added that tag, and now Kubecost is able to track all of our flows based on our projects. As an example of that, this is someone who had a project called first 2023, and you can see here that it tracks the CPU cost, the RAM, the persistent volumes, etc. It's kind of cool in that it will also show you the increase and decrease of cost over time, as well as your efficiency.

And that actually was the sum of my demo. It went a little faster than I anticipated, but if there are any questions I can dive deeper into any part of this. Thanks for watching.

Questions, anyone? Feel free to either chime in or send them on Zoom chat. I should also keep an eye on ask-metaflow; if you have questions you can ask there on Slack. I have lots of questions, but if there are people who have more questions, feel free to ask them first.

I have a quick question about Kubecost. Does it also track S3 costs, or is it mostly the Kubernetes cost?

By default, no, because S3 isn't directly associated with a Kubernetes object. One of the reasons we went with Kubecost is that there's not really a native way in AWS Cost Explorer to pull out the granularity of pods within Kubernetes, but you should be able to do that for S3.

A quick question. Maybe I missed it, but in the deployment process, how are you managing it if you delete a workflow, for example, or you want to rename it? What's the cleanup process, or how do you manage the state of what's deployed to Argo Workflows?

Yeah, so there's a UI for Argo, and that's where we go. If you go to the Argo UI you can log in and see the state of all of your cron jobs, and that's where you can go to edit things.

Just a quick follow-up: if you want to delete the deployment, you would just go to the UI and delete it there?

You can do that. There's also an argo-workflows delete command in the Metaflow CLI. We don't have that integrated with our GitLab because it's not super common that people fully delete things, but if you wanted to, you could delete it with your flow file.

Okay. It's not related, but it might be of interest: we've also installed Kubecost, and AWS actually has an integration with it now, which means you can keep more than 15 days' worth of data. So that might be worth a look.

Is that the one that integrates with the S3 bucket that tracks your costs?

I'm not sure. I don't think so; I think you can add that on. It came out last year. Originally you couldn't do it, but it came out late last year and now you can. I'm happy to send you a link offline.

Yeah, the link to the docs would be awesome.

Hey Tyler, great presentation. I have a question on Kubecost and the Argo deployment piece. First, are you deploying your Metaflow jobs across multiple clusters and then tracking these costs across multiple clusters, or are these within a single cluster? And the second half of the question is, how do you manage deploying your jobs across multiple clusters and tracking them using Argo as well? Can you talk a bit about that?

Yeah, so all of our jobs are deployed on a single cluster, and I think that answers the second question as well, because we don't deploy to multiple clusters. I think Kubecost is kind of a freemium model, so there's a SaaS offering that integrates with multiple clusters, but by default Kubecost doesn't have an integration with multiple clusters.

Tyler, a question about the data scientists' experience of using this model. What is their feedback on this? I'm guessing JupyterHub is something that many people are familiar with, so they are happy to start there. But everything following that, using Metaflow, using Argo Workflows on Kubernetes: what is the data scientists' take on that setup, what do they like about it, and what may they not like as much?

It's kind of like that acceptance curve where you have the early adopter and late adopter side of things. Some of our data scientists who enjoy learning new tools and trying new techniques were really eager to try it out and really enjoyed it. I think one of our most positive pieces of feedback was from someone who realized that an entire module he had created to track models and version them was now obsolete because of the artifact management in Metaflow. He was like, "Oh, this is so cool, I don't have to manage all of this myself anymore." So that was really cool.

Some of the less positive feedback is that there's obviously a learning curve to Metaflow as well as Kubernetes, and lots of random things pop up, particularly as you scale up. Those things, I think, are a little harder for our data scientists, because they don't really have context on why these errors are happening. And then as far as the production flow and how we set that up in GitLab, we designed that hand in hand with our data scientists over a few different meetings: "Hey, are you okay with this? What do you want to have happen?" And that's what eventually led to our GitLab setup.

Got it. And then any user today can directly run python flow.py argo-workflows create, and it'll create the Argo workflow for that flow on Kubernetes, or the same could happen through GitLab, right? So whether you create it interactively or through GitLab, is it still the same, or is there a difference in the environments, namespaces, whatever?

Yeah, so we did namespace it. If you look here, we set the namespace to production. This both puts it in a different Metaflow namespace, which is the project concept within Metaflow, and it's also a different namespace within Kubernetes, because we wanted to have different limits so that we wouldn't have someone testing something out and blowing out the capacity of the cluster so that our production jobs couldn't schedule anything.

I see. Go ahead.

So by default within JupyterHub, if you do an argo-workflows create, it will create it within the development namespace. You could hardcode an override if you wanted to, but people don't really. And then by default it deploys to prod within GitLab.

Okay, so the production namespace has, I guess, better monitoring and better guardrails around it for limited resource utilization so that it can run a production workload, whereas dev is experimentation-oriented. Is that right?

I wouldn't necessarily say it has any different kind of monitoring. It's just that within Kubernetes, namespaces are pretty isolated from each other by design, so having a different namespace enables us to isolate the compute usage of the development side of the house from the production side of the house. But it still uses the same Metaflow UI and Slack integrations and that kind of thing.
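
The CI deployment step described in the talk can be sketched roughly as follows. This is a minimal illustration, not DTN's actual template: the flow path, the `git_sha:` tag format, and the helper name `build_deploy_command` are hypothetical. `--production` is the option Metaflow adds to flows that use the `@project` decorator, and `CI_COMMIT_SHORT_SHA` is a predefined GitLab CI variable. The `--tag` and notification flags are written as I understand the Metaflow CLI and should be checked against `python flow.py argo-workflows create --help` for the installed version; the talk does not show how the Slack webhook is actually wired up.

```python
import os
import shlex


def build_deploy_command(flow_file, git_sha, slack_webhook_url=None):
    """Return the argv list a CI job could run to deploy a flow to Argo Workflows."""
    cmd = [
        "python", flow_file,
        "--production",                  # available on flows decorated with @project
        "argo-workflows", "create",
        "--tag", f"git_sha:{git_sha}",   # hypothetical tag format
    ]
    if slack_webhook_url:
        # Assumed flag names; verify with `argo-workflows create --help`.
        cmd += ["--notify-on-error", "--notify-slack-webhook-url", slack_webhook_url]
    return cmd


# In GitLab CI the short commit hash is exposed as CI_COMMIT_SHORT_SHA;
# the fallback value only keeps this sketch runnable outside CI.
sha = os.environ.get("CI_COMMIT_SHORT_SHA", "abc1234")
command = build_deploy_command("src/example_flow.py", sha)
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps quoting explicit, which matters if a webhook URL or tag ever contains shell-special characters.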

Original Description

Tyler Potts is a Data Science Platform Lead at DTN. DTN uses Metaflow with Kubernetes to build a preconfigured, collaborative JupyterHub data science platform. The setup comprises automated pipelines that deploy flows from GitLab to Argo Workflows. These pipelines ensure that workflows are source-controlled, schedulable, and easily redeployed. Kubecost is used to track the costs of flows that use the @project decorator. Discover more such stories at slack.outerbounds.co
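
As a rough illustration of the per-project cost tracking mentioned here, the sketch below builds a query against Kubecost's Allocation API and sums `totalCost` per aggregate from a response. Several things are assumptions: the label key `metaflow_project` is hypothetical (the talk does not name the Kubernetes label that carries the project), the host name is a placeholder, and the sample payload is made-up data shaped like an allocation response. No request is sent.

```python
from urllib.parse import urlencode


def allocation_url(base_url, window="7d", label="metaflow_project"):
    """Build a Kubecost Allocation API URL aggregated by a Kubernetes label."""
    query = urlencode({"window": window, "aggregate": f"label:{label}"})
    return f"{base_url.rstrip('/')}/model/allocation?{query}"


def total_cost_by_project(response):
    """Sum totalCost per aggregate name across all allocation sets in a response."""
    totals = {}
    for allocation_set in response.get("data", []):
        for name, alloc in (allocation_set or {}).items():
            totals[name] = totals.get(name, 0.0) + alloc.get("totalCost", 0.0)
    return totals


# Made-up payload for illustration only; real values come from the cluster.
sample_response = {
    "code": 200,
    "data": [
        {
            "demo_project": {"cpuCost": 1.5, "ramCost": 0.5, "pvCost": 0.0, "totalCost": 2.0},
            "other_project": {"cpuCost": 0.5, "ramCost": 0.25, "pvCost": 0.0, "totalCost": 0.75},
        }
    ],
}

print(allocation_url("http://kubecost.example:9090"))
print(total_cost_by_project(sample_response))
```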


DTN's data science platform leverages Metaflow with Kubernetes to facilitate collaboration and automated pipelines, enabling seamless deployment and cost tracking. This setup allows data scientists to focus on building models and driving business decisions. By using Kubecost, DTN can track the costs of flows and optimize resource utilization.

Key Takeaways

Set up Metaflow with Kubernetes
Configure JupyterHub for collaboration
Automate pipelines using Argo Workflows
Deploy workflows from GitLab
Track costs with Kubecost
Optimize resource utilization

💡 Automating pipelines and tracking costs can significantly improve the efficiency and effectiveness of data science collaboration and workflow deployment.
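
For the first takeaway, the talk notes that Metaflow is configured entirely through environment variables and that some of them hold sensitive tokens. A small sketch of how a platform team might list that configuration while masking secrets is shown below. The variable values are invented, `METAFLOW_EXAMPLE_TOKEN` is a hypothetical name, and the masking rule is an assumption rather than anything described in the talk.

```python
import os

# Substrings that suggest a value should not be printed (an assumed heuristic).
SENSITIVE_MARKERS = ("TOKEN", "SECRET", "PASSWORD", "KEY")


def metaflow_settings(environ):
    """Return METAFLOW_* variables sorted by name, masking likely secrets."""
    settings = {}
    for name, value in sorted(environ.items()):
        if not name.startswith("METAFLOW_"):
            continue
        if any(marker in name for marker in SENSITIVE_MARKERS):
            value = "***"
        settings[name] = value
    return settings


# Illustrative values only; pass os.environ to inspect a real session.
example_env = {
    "METAFLOW_DEFAULT_DATASTORE": "s3",
    "METAFLOW_KUBERNETES_NAMESPACE": "development",
    "METAFLOW_KUBERNETES_CONTAINER_IMAGE": "python:3.11",
    "METAFLOW_EXAMPLE_TOKEN": "not-a-real-token",  # hypothetical variable
    "HOME": "/home/user",
}

for name, value in metaflow_settings(example_env).items():
    print(f"{name}={value}")
```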
