Before you scale: A guide to Cloud Run cost optimization

Google Cloud Tech · Intermediate · 🏗️ Systems Design & Architecture · 4mo ago

Key Takeaways

This video provides a comprehensive guide to Cloud Run cost optimization, covering techniques such as setting max instances, using authentication, and creating budget alerts to prevent billing surprises and reduce costs. It also explores the use of the Cloud Console's optimization page and Cost Explorer to identify cost drivers and optimize resource allocation.

Full Transcript

In our last video, Mitchell, you showed us how to pick the right Cloud Run billing model and how to estimate costs for a new cloud app that uses Cloud Run. >> Yeah, that was fun. >> I enjoyed it, too. Uh, what about optimizing a Cloud Run service that's already been deployed? >> Let me show you how to do it. >> Welcome to the show, Mitchell. What do you do here at Google? >> I'm an engineering manager on Cloud Run. My team focuses on the cost and performance of Cloud Run serving infrastructure. >> It sounds like you are just the right person to ask about cost optimization in Cloud Run. So, I've used Cloud Run a lot over the years, but to be honest, I haven't really dug into the billing. Let's say I'm running an application that uses Cloud Run. How can I save on my cloud bill? >> So, here's how I think about it. First, you want to prevent billing surprises. And second, you want to optimize the cost of your running service. >> Okay, I've heard from many developers who are concerned about the first item on your list, uh, billing surprises. >> That makes sense. Any system has expected traffic where everything will work as normal and unexpected traffic beyond that. >> Okay. >> So let's say your system runs on traditional servers. You might have a web server and a database server. If you get more traffic than either of those servers can handle, some users will be turned away. >> Got it. Uh, but serverless is a different story, right? >> Yes. With serverless computing like Cloud Run, you can choose what should happen when traffic increases. You can turn away users like a traditional server-based system or you can scale up and handle the extra traffic. >> In what situations would I want to scale up? >> Well, let's say you're running an online store. Then you don't want to turn customers away. >> Right. I won't say no to more revenue. But what if I don't want my system to scale up and pay for that scaling? >> Well, that's easy. You can set max instances to a number like two.
That means that Cloud Run will prevent your service from scaling up beyond two instances. If you get more traffic than those two instances can handle, users will be turned away. >> That sounds useful. Is there some way of allowing surges in traffic from real users, but not from attackers? >> Yes, there is, but it requires a little more work. You can require requests to your Cloud Run service to be authenticated. You can use Identity-Aware Proxy, Identity Platform, or Firebase Authentication for that. >> Uh, but what if my Cloud Run service supports anonymous users who don't log in? >> Then you can use Firebase App Check. Another option is to use Cloud Armor and a load balancer. Cloud Armor is a web application firewall, so you can set rules for what kind of traffic to allow. For example, you can set rate limits so one group of clients can't exhaust your system resources. You can also use Cloud Armor to stop common attacks like SQL injection or cross-site scripting, or to block bot traffic. >> Very good. Now, let's say there's a large increase in traffic and in my bill. I want to know so I can take action. >> Yes. And that's where budget alerts come in. You can trigger email or Pub/Sub alerts when Google predicts that your cloud bill will hit a certain dollar amount. Or you can set it to trigger if your cost will be larger than the previous month's bill by a certain percentage. >> Good stuff. Uh, that will help me sleep better at night and worry less about cost overruns. The second item on your list is how to optimize the cost of a running service. So to be honest, the Cloud Run cost in my web apps is less than 10% of the total bill, so I haven't really worried about it. >> Well, that's true of your application and many others, but everybody's application is different. We should still talk about how to optimize an existing Cloud Run service. >> Fair enough. How do I do that? >> So in the Cloud Console, go to Cloud Hub and Optimization.
>> And this is a new page, right? >> Yeah, it was launched recently. You can see a lot of this information in the billing section of the console, too. But this page is built to help you optimize your application. Here you can see the trend in total cost over time for all products in your project, not just Cloud Run. The total is broken down in the cost and utilization section down here. >> Got it. >> So I'll click the link "View details in Cost Explorer." And here's a more detailed breakdown. Cloud Run services cost this customer $370 in the last 30 days and Cloud Run jobs cost this customer $157. Down here we see that the Cloud Run cost has increased by 3% while the Cloud Logging cost has dropped by 1%. If there are big jumps in these numbers, you should investigate. And over here the cost trends are broken down further. It looks like the service called API drives most of the Cloud Run cost. >> Hm, I see. >> I can switch over to this vCPU utilization view. This table shows how much virtual CPU different Cloud Run services are using. The Discordbot service is only using 2% of its allocated CPU. This customer may be able to save some money by allocating fewer CPUs to it. >> Got it. And the customer is paying for memory too, right? >> Yes, they are. Let's check the memory utilization report. It looks like this animated WEBP service is only using half a percent of the memory allocated to it. The customer could probably save some money by allocating less memory to it. >> Nice. Uh, but you said that the service called API is driving most of the Cloud Run bill here. >> Yes, it is. Let me click it to get more details about that service in particular. >> That's a lot of additional data. >> Yes, there is. I'll scroll down to the charts for CPU and memory utilization. The instances don't use a lot of CPU or memory, so we should consider raising the concurrency limit. That way, each instance will take on more requests, and you'll need fewer of them, and you will pay less.
By default, this is set to 80, but some services may be able to handle even more than that. >> But if I raise the concurrency limit too high, won't my system grind to a halt? >> If your workload is CPU-bound, it's safe to have a fairly high concurrency because Cloud Run will start new instances if your CPU is working too hard. Once you've tuned concurrency, you should check that your Cloud Run instances have the right amount of memory. Let's check the memory utilization over on the right. If your instances use very little memory, consider lowering the memory allocated to each instance. That will save you money. >> Got it. I guess I want high CPU and high memory utilization. >> That's right. Finally, aside from tuning, there's one more way to save money: Compute flexible committed use discounts. If you have predictable traffic, you can enter into a contract with Google that you will use a certain amount of processing and you get it at a lower price. By the time you watch this video, there may be automated recommendations in the Cloud Console for this. >> So, you'd commit to using a certain dollar amount's worth of Cloud Run per hour. >> Actually, you commit to spend a certain amount per hour per region across Cloud Run, Compute Engine, or Kubernetes Engine. So if you use less of one, you can use more of another. >> All right, that was a lot of useful information. Uh, could you recap it for us, Mitchell? >> Sure. First, prevent billing surprises with max instances, authentication, Firebase App Check, or Cloud Armor. Also create budget alerts. Second, use the optimization report to find out what drives cost and which services have low utilization. Allocate fewer resources to them. Also, don't be afraid to set your concurrency limit high, and consider using committed use discounts. >> Very useful. I knew some of this, but far from all. Uh, thank you for sharing this with us, Mitchell. >> Thanks for having me. >> And thank you everyone for watching. 
If you have any questions for Mitchell or me, please let us know in the comments. Also, do let me know what you thought of today's episode. I read every single comment. Until next time.
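
The concurrency and max instances reasoning in the transcript can be made concrete with a little arithmetic. The following is a minimal sketch, not Cloud Run's actual autoscaling algorithm: it assumes steady traffic and estimates the number of requests in flight as request rate times average latency. The function names and the traffic figures are hypothetical, and the model ignores CPU-based scaling and request queuing.

```python
import math


def instances_needed(requests_per_second, avg_latency_seconds, concurrency):
    """Estimate how many instances steady traffic would keep busy.

    Simplified model (an assumption, not Cloud Run's real autoscaler):
    requests in flight = request rate * average latency, and each
    instance handles up to `concurrency` requests at once.
    """
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight / concurrency))


def served_and_rejected(requests_in_flight, max_instances, concurrency):
    """Split simultaneous requests into served and turned away under a cap.

    With max instances set, total capacity is max_instances * concurrency;
    in this simplified model anything beyond that is turned away.
    """
    capacity = max_instances * concurrency
    served = min(requests_in_flight, capacity)
    return served, requests_in_flight - served


if __name__ == "__main__":
    # Hypothetical workload: 400 requests/s at 0.5 s average latency.
    for concurrency in (20, 80, 200):
        n = instances_needed(400, 0.5, concurrency)
        print(f"concurrency={concurrency}: about {n} instance(s)")

    # Max instances of 2 with the default concurrency of 80.
    served, rejected = served_and_rejected(500, max_instances=2, concurrency=80)
    print(f"served={served}, turned away={rejected}")
```

With these hypothetical numbers, raising concurrency from 20 to 80 drops the estimate from 10 instances to 3, which is the effect Mitchell describes: each instance takes on more requests, so you need fewer of them. The second function shows the trade-off of a max instances cap: two instances at concurrency 80 can hold 160 simultaneous requests, and the rest are turned away.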

Original Description

Go to the Optimization Hub → https://goo.gle/4lf5KV3

Mitchell Slep (Engineering Manager, Cloud Run) joins Martin Omander to walk through the practical mechanics of Cloud Run cost optimization. This deep dive moves past basic billing models into technical configurations for tuning active services.

Key technical takeaways:
1️⃣ Preventing overruns: Implementing max instances, using Cloud Armor for rate limiting, and setting up budget alerts to handle unexpected traffic spikes or attacks.
2️⃣ Utilization tuning: Using the Cloud Hub Optimization report to identify services with low CPU and memory utilization, like a bot using only 2% of its allocated resources.
3️⃣ Concurrency settings: Why increasing the concurrency limit (default 80) can reduce your instance count and overall bill without stalling your system.
4️⃣ Committed Use Discounts (CUDs): How to leverage flexible spend commitments across Cloud Run, GKE, and Compute Engine.

Start saving money today!

Chapters:
0:00 - Intro
1:10 - Preventing billing surprises
3:46 - Optimizing the cost of a running service
8:07 - Recap

Watch more Serverless Expeditions → https://goo.gle/ServerlessExpeditions
🔔 Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#Serverless #GoogleCloud

Speakers: Martin Omander, Mitchell Slep
Products Mentioned: Cloud Armor, Cloud Run, Committed Use Discounts

This video teaches viewers how to optimize Cloud Run costs using various techniques such as setting max instances, using authentication, and creating budget alerts. It also covers the use of the Cloud Console's optimization page and Cost Explorer to identify cost drivers and optimize resource allocation. By following these techniques, viewers can prevent billing surprises and reduce their Cloud Run costs.

Key Takeaways
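  1. Set max instances (for example, 2) to cap how far a service can scale and how much that scaling can cost
  2. Decide what should happen when traffic exceeds capacity: turn users away, or scale up and pay for the extra traffic
  3. Require authentication (Identity-Aware Proxy, Identity Platform, or Firebase Authentication) to allow surges in traffic from real users but not from attackers
  4. Create budget alerts that trigger email or Pub/Sub notifications when Google predicts that your cloud bill will hit a certain dollar amount
  5. To optimize the cost of a running service, open Cloud Hub and then Optimization in the Cloud Console
  6. Click the link to view details in Cost Explorer for a more detailed cost breakdown
  7. Switch to the vCPU utilization view to find services that use little of their allocated CPU
  8. Check the memory utilization report to find services that use little of their allocated memory
  9. Click the service that drives most of the cost (in the demo, the one called API) to get more details
  10. Scroll down to the charts for CPU and memory utilization; if both are low, consider raising the concurrency limit or lowering the memory allocation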
💡 Using the Cloud Console's optimization page and Cost Explorer can help identify cost drivers and optimize resource allocation, leading to significant cost savings.
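
The right-sizing steps above (find a service with low CPU or memory utilization, then allocate less to it) can be sketched as a small helper. This is an illustrative sketch only: the function name, the 60% target utilization, the candidate sizes, and the allocated amounts are all assumptions, not values from the video or from Cloud Run's documented limits. Average utilization can also hide short peaks, so treat the result as a starting point for testing.

```python
def suggest_allocation(allocated, utilization, candidates, target_utilization=0.6):
    """Pick the smallest candidate size that keeps utilization at or below target.

    allocated:   current allocation (for example MiB of memory or vCPUs)
    utilization: observed fraction of the allocation in use (0.02 means 2%)
    candidates:  sizes you are willing to consider (hypothetical list)
    Returns the current allocation if no smaller candidate fits.
    """
    used = allocated * utilization
    for size in sorted(candidates):
        if size <= allocated and used / size <= target_utilization:
            return size
    return allocated


if __name__ == "__main__":
    # A service using half a percent of a hypothetical 2048 MiB allocation.
    print(suggest_allocation(2048, 0.005, [128, 256, 512, 1024, 2048]))

    # A service using 2% of a hypothetical 2 vCPU allocation.
    print(suggest_allocation(2, 0.02, [0.25, 0.5, 1, 2]))

    # A busy service: no smaller size fits, so the allocation is kept.
    print(suggest_allocation(1024, 0.9, [128, 256, 512, 1024]))
```

The helper only ever suggests a size at or below the current allocation, which matches the advice in the video to allocate fewer resources to services with low utilization.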

Chapters (4)

0:00 Intro
1:10 Preventing billing surprises
3:46 Optimizing the cost of a running service
8:07 Recap