Running Large Pipelines to Analyze Cloud Costs in Data and AI

Outerbounds · Beginner ·🏗️ Systems Design & Architecture ·2y ago
This video delves into the complexities of running large Metaflow flows at scale. It introduces "Metaflow for Metaflows" since the use-case discussed is about running a flow that analysis the costs of other flows. The presentation covers the strategy of collecting metadata from various flows to generate detailed cost reports and insights. Through the deployment of Metaflow's advanced features, such as branching, retries, scheduling, and project triggers, along with AWS services like IAM roles, S3, Glue, and Athena, a sophisticated framework is established. This framework is designed to manage network failures, optimize data queries, and efficiently allocate resources across AWS accounts. The talk addresses challenges encountered, including disk pressure, rate limit breaches, and the high costs associated with scanning large volumes of data through Athena. Solutions such as enhancing disk resources, optimizing query execution, and implementing caching mechanisms are discussed. The video further explores methodologies for cost allocation, resource utilization, and long-term data storage strategies, concluding with an overview of achieving operational scalability through these innovative approaches.
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Playlist UU5h8Ji6Lm1RyAZopnCpDq7Q · Outerbounds · 0 of 60

← Previous Next →
1 Metaflow GUI for monitoring machine learning workflows
Metaflow GUI for monitoring machine learning workflows
Outerbounds
2 Metaflow Cards [no sound]
Metaflow Cards [no sound]
Outerbounds
3 Fireside chat #1: How to Produce Sustainable Business Value with Machine Learning
Fireside chat #1: How to Produce Sustainable Business Value with Machine Learning
Outerbounds
4 Fireside chat #2: MadeWithML.com -- Teaching Practical Machine Learning
Fireside chat #2: MadeWithML.com -- Teaching Practical Machine Learning
Outerbounds
5 Metaflow on Kubernetes and Argo Workflows [no sound]
Metaflow on Kubernetes and Argo Workflows [no sound]
Outerbounds
6 Fireside chat #3: Reasonable Scale Machine Learning -- You're not Google and it's totally OK
Fireside chat #3: Reasonable Scale Machine Learning -- You're not Google and it's totally OK
Outerbounds
7 Metaflow Tags: Programmatic Tagging
Metaflow Tags: Programmatic Tagging
Outerbounds
8 Metaflow Tags: Basic Tagging
Metaflow Tags: Basic Tagging
Outerbounds
9 Metaflow Tags: Tags in CI/CD
Metaflow Tags: Tags in CI/CD
Outerbounds
10 Metaflow Tags: Tags and Namespaces
Metaflow Tags: Tags and Namespaces
Outerbounds
11 Metaflow Tags: Tags and Continuous Training
Metaflow Tags: Tags and Continuous Training
Outerbounds
12 Fireside chat #4: Machine Learning and User Experience -- Building ML Products for People
Fireside chat #4: Machine Learning and User Experience -- Building ML Products for People
Outerbounds
13 Fireside Chat #5: Machine Learning + Infrastructure for Humans
Fireside Chat #5: Machine Learning + Infrastructure for Humans
Outerbounds
14 Metaflow Sandbox Demo: Free Data Science Infrastructure In the Browser
Metaflow Sandbox Demo: Free Data Science Infrastructure In the Browser
Outerbounds
15 Metaflow on Azure
Metaflow on Azure
Outerbounds
16 Fireside Chat #6: Operationalizing ML -- Patterns and Pain Points from MLOps Practitioners
Fireside Chat #6: Operationalizing ML -- Patterns and Pain Points from MLOps Practitioners
Outerbounds
17 ML engineering vs traditional software engineering: similarities and differences
ML engineering vs traditional software engineering: similarities and differences
Outerbounds
18 Why data scientists love and hate notebooks: velocity and validation
Why data scientists love and hate notebooks: velocity and validation
Outerbounds
19 What even is a 10x ML engineer?
What even is a 10x ML engineer?
Outerbounds
20 The 4 main tasks in the production ML lifecycle
The 4 main tasks in the production ML lifecycle
Outerbounds
21 Is the premise of data-centric AI flawed?
Is the premise of data-centric AI flawed?
Outerbounds
22 The 3 factors that Determine the success of ML projects
The 3 factors that Determine the success of ML projects
Outerbounds
23 Fireside Chat #7: How to Build an Enterprise Machine Learning Platform from Scratch
Fireside Chat #7: How to Build an Enterprise Machine Learning Platform from Scratch
Outerbounds
24 Run Metaflow on any cloud: Google Cloud, Azure, or AWS [no sound]
Run Metaflow on any cloud: Google Cloud, Azure, or AWS [no sound]
Outerbounds
25 Metaflow on GCP
Metaflow on GCP
Outerbounds
26 Fireside Chat #8: Navigating the Full Stack of Machine Learning
Fireside Chat #8: Navigating the Full Stack of Machine Learning
Outerbounds
27 How to Build a Full-Stack Recommender System
How to Build a Full-Stack Recommender System
Outerbounds
28 Modernize your Airflow deployments with Metaflow - zero-cost migration [no sound]
Modernize your Airflow deployments with Metaflow - zero-cost migration [no sound]
Outerbounds
29 Easy Airflow DAGs for ML and data science with Metaflow [no sound]
Easy Airflow DAGs for ML and data science with Metaflow [no sound]
Outerbounds
30 Fireside chat #9:  Language Processing: From Prototype to Production
Fireside chat #9: Language Processing: From Prototype to Production
Outerbounds
31 How to build end-to-end recommender systems at reasonable scale
How to build end-to-end recommender systems at reasonable scale
Outerbounds
32 Full-Stack Machine Learning with Metaflow on CoRise
Full-Stack Machine Learning with Metaflow on CoRise
Outerbounds
33 Natural Language Processing meets MLOps
Natural Language Processing meets MLOps
Outerbounds
34 Fireside Chat #10: Large Language Models: Beyond Proofs of Concept
Fireside Chat #10: Large Language Models: Beyond Proofs of Concept
Outerbounds
35 What even are Large Language Models?
What even are Large Language Models?
Outerbounds
36 How to get started with LLMs today
How to get started with LLMs today
Outerbounds
37 LLMs in production
LLMs in production
Outerbounds
38 Accessing secrets securely in Metaflow [no audio]
Accessing secrets securely in Metaflow [no audio]
Outerbounds
39 Fireside Chat #11: The Open-Source Modern Data Stack
Fireside Chat #11: The Open-Source Modern Data Stack
Outerbounds
40 Fireside chat #12: Kubernetes for Data Scientists
Fireside chat #12: Kubernetes for Data Scientists
Outerbounds
41 Behind the Screen: How Amazon Prime Video ships RecSys models 4x faster
Behind the Screen: How Amazon Prime Video ships RecSys models 4x faster
Outerbounds
42 Fireside chat #13: Supply Chain Security in Machine Learning
Fireside chat #13: Supply Chain Security in Machine Learning
Outerbounds
43 Quick Delivery, Quicker ML: DeliveryHero's Metaflow Story
Quick Delivery, Quicker ML: DeliveryHero's Metaflow Story
Outerbounds
44 Crafting General Intelligence: LLM Fine-tuning with Metaflow at Adept.ai
Crafting General Intelligence: LLM Fine-tuning with Metaflow at Adept.ai
Outerbounds
45 Fuelling Decisions: How DTN Powers Gas Pricing and Data Science Collaboration
Fuelling Decisions: How DTN Powers Gas Pricing and Data Science Collaboration
Outerbounds
46 From Kitchen to Doorstep: Optimizing Data Science Velocity at Deliveroo
From Kitchen to Doorstep: Optimizing Data Science Velocity at Deliveroo
Outerbounds
47 Building a GenAI Ready ML Platform with Metaflow at Autodesk
Building a GenAI Ready ML Platform with Metaflow at Autodesk
Outerbounds
48 Media Transcoding for 10 Million users and beyond with Metaflow at Epignosis
Media Transcoding for 10 Million users and beyond with Metaflow at Epignosis
Outerbounds
49 Telematics with Metaflow: How Nirvana Insurance built a large-scale Risk Estimation platform
Telematics with Metaflow: How Nirvana Insurance built a large-scale Risk Estimation platform
Outerbounds
50 Fireside chat #14: Generative AI and Machine Learning for Film, TV, and Gaming
Fireside chat #14: Generative AI and Machine Learning for Film, TV, and Gaming
Outerbounds
51 The Past, Present, and Future of Generative AI
The Past, Present, and Future of Generative AI
Outerbounds
52 Building Production Systems with Generative AI, Machine Learning, and Data
Building Production Systems with Generative AI, Machine Learning, and Data
Outerbounds
53 A Custom Fine-Tuned LLM in Action (LLMs, RAG, and Fine-Tuning: An Interactive Guided Tour Part 5)
A Custom Fine-Tuned LLM in Action (LLMs, RAG, and Fine-Tuning: An Interactive Guided Tour Part 5)
Outerbounds
54 Building Live Production Systems with RAG (LLMs & RAG: An Interactive Guided Tour Part 4)
Building Live Production Systems with RAG (LLMs & RAG: An Interactive Guided Tour Part 4)
Outerbounds
55 Better Relevancy with RAG (LLMs, RAG, and Fine-Tuning: An Interactive Guided Tour Part 3)
Better Relevancy with RAG (LLMs, RAG, and Fine-Tuning: An Interactive Guided Tour Part 3)
Outerbounds
56 Working with OSS LLMs (LLMs, RAG, and Fine-Tuning: An Interactive Guided Tour Part 2)
Working with OSS LLMs (LLMs, RAG, and Fine-Tuning: An Interactive Guided Tour Part 2)
Outerbounds
57 Hitting OpenAI and Other Vendor APIs (LLMs, RAG, and Fine-Tuning: An Interactive Guided Tour Part 1)
Hitting OpenAI and Other Vendor APIs (LLMs, RAG, and Fine-Tuning: An Interactive Guided Tour Part 1)
Outerbounds
58 Production Systems with Generative AI (LLMs, RAG, & Fine-Tuning: An Interactive Guided Tour Part 0)
Production Systems with Generative AI (LLMs, RAG, & Fine-Tuning: An Interactive Guided Tour Part 0)
Outerbounds
59 LLMs in Practice: A Guide to Recent Trends and Techniques
LLMs in Practice: A Guide to Recent Trends and Techniques
Outerbounds
60 Metaflow for distributed high-performance computing and large-scale AI training
Metaflow for distributed high-performance computing and large-scale AI training
Outerbounds

Related AI Lessons

7 TypeScript Patterns I Use in Every Project
Learn 7 essential TypeScript patterns to improve your coding skills and apply them to your projects for better maintainability and scalability
Dev.to · Alex Chen
Unbounded Processes: The Hidden Cost of Always Saying Yes
Learn to identify and manage unbounded processes that can lead to system failure, and why saying no to certain requests is crucial for scalability
Dev.to · Khali Sollis
Developing network-based multiplayer games made easy
Learn to develop network-based multiplayer games easily using a lightweight server and framework
Medium · Programming
Errors as Infrastructure: Why the first crate in NEXUS wasn't networking.
Learn how to design a metadata-centric failure contract for distributed Rust environments and why error handling is crucial infrastructure
Dev.to · Anatolii Shliakhto
Up next
Oracle on Google Cloud | Oracle on Google Cloud Compute
Google Cloud
Watch →