Before you scale: A guide to Cloud Run cost optimization
Key Takeaways
This video provides a comprehensive guide to Cloud Run cost optimization, covering techniques such as setting max instances, using authentication, and creating budget alerts to prevent billing surprises and reduce costs. It also explores the use of the Cloud Console's optimization page and Cost Explorer to identify cost drivers and optimize resource allocation.
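The utilization check mentioned in the summary can be pictured with a small sketch. It is not the Cloud Hub report or any Google API, only plain Python over the two utilization figures quoted in the video. The service identifiers are approximations of the names heard in the transcript, the 10% threshold is an arbitrary assumption, and `None` marks values the video does not state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ServiceUtilization:
    name: str
    cpu_utilization_pct: Optional[float]     # None = not stated in the video
    memory_utilization_pct: Optional[float]  # None = not stated in the video


def rightsizing_candidates(
    services: List[ServiceUtilization],
    threshold_pct: float = 10.0,  # assumed cutoff, not a Google recommendation
) -> List[Tuple[str, str]]:
    """Return (service, resource) pairs whose known utilization is below the threshold."""
    candidates = []
    for svc in services:
        if svc.cpu_utilization_pct is not None and svc.cpu_utilization_pct < threshold_pct:
            candidates.append((svc.name, "cpu"))
        if svc.memory_utilization_pct is not None and svc.memory_utilization_pct < threshold_pct:
            candidates.append((svc.name, "memory"))
    return candidates


# Figures quoted in the video: 2% CPU for the Discord bot service,
# half a percent of memory for the animated WebP service.
observed = [
    ServiceUtilization("discordbot", cpu_utilization_pct=2.0, memory_utilization_pct=None),
    ServiceUtilization("animated-webp", cpu_utilization_pct=None, memory_utilization_pct=0.5),
]

for name, resource in rightsizing_candidates(observed):
    print(f"{name}: consider allocating less {resource}")
```

A flagged service is only a candidate for a smaller allocation; whether it can actually run with less CPU or memory still has to be checked against its peak load.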
Full Transcript
In our last video, Mitchell, you showed us how to pick the right Cloud Run billing model and how to estimate costs for a new cloud app that uses Cloud Run.
>> Yeah, that was fun.
>> I enjoyed it, too. What about optimizing a Cloud Run service that's already been deployed?
>> Let me show you how to do it.
>> Welcome to the show, Mitchell. What do you do here at Google?
>> I'm an engineering manager on Cloud Run. My team focuses on the cost and performance of Cloud Run serving infrastructure.
>> It sounds like you are just the right person to ask about cost optimization in Cloud Run. I've used Cloud Run a lot over the years, but to be honest, I haven't really dug into the billing. Let's say I'm running an application that uses Cloud Run. How can I save on my cloud bill?
>> So, here's how I think about it. First, you want to prevent billing surprises. And second, you want to optimize the cost of your running service.
>> Okay, I've heard from many developers who are concerned about the first item on your list, billing surprises.
>> That makes sense. Any system has expected traffic where everything will work as normal and unexpected traffic beyond that.
>> Okay.
>> So let's say your system runs on traditional servers. You might have a web server and a database server. If you get more traffic than either of those servers can handle, some users will be turned away.
>> Got it. But serverless is a different story, right?
>> Yes. With serverless computing like Cloud Run, you can choose what should happen when traffic increases. You can turn away users like a traditional server-based system or you can scale up and handle the extra traffic.
>> In what situations would I want to scale up?
>> Well, let's say you're running an online store. Then you don't want to turn customers away.
>> Right. I won't say no to more revenue. But what if I don't want my system to scale up and pay for that scaling?
>> Well, that's easy. You can set max instances to a number like two.
That means that Cloud Run will prevent your service from scaling up beyond two instances. If you get more traffic than those two instances can handle, users will be turned away.
>> That sounds useful. Is there some way of allowing surges in traffic from real users, but not from attackers?
>> Yes, there is, but it requires a little more work. You can require requests to your Cloud Run service to be authenticated. You can use Identity-Aware Proxy, Identity Platform, or Firebase Authentication for that.
>> But what if my Cloud Run service supports anonymous users who don't log in?
>> Then you can use Firebase App Check. Another option is to use Cloud Armor and a load balancer. Cloud Armor is a web application firewall, so you can set rules for what kind of traffic to allow. For example, you can set rate limits so one group of clients can't exhaust your system resources. You can also use Cloud Armor to stop common attacks like SQL injection or cross-site scripting, or to block bot traffic.
>> Very good. Now, let's say there's a large increase in traffic and in my bill. I want to know so I can take action.
>> Yes. And that's where budget alerts come in. You can trigger email or Pub/Sub alerts when Google predicts that your cloud bill will hit a certain dollar amount. Or you can set it to trigger if your cost will be larger than the previous month's bill by a certain percentage.
>> Good stuff. That will help me sleep better at night and worry less about cost overruns. The second item on your list is how to optimize the cost of a running service. To be honest, the Cloud Run cost in my web apps is less than 10% of the total bill, so I haven't really worried about it.
>> Well, that's true of your application and many others, but everybody's application is different. We should still talk about how to optimize an existing Cloud Run service.
>> Fair enough. How do I do that?
>> So in the Cloud Console, go to Cloud Hub and Optimization.
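As a rough illustration of the two budget alert triggers described above (a forecast that hits a fixed dollar amount, or a forecast that exceeds last month's bill by a percentage), here is a minimal sketch in plain Python. It does not call the Cloud Billing budget API; the function name, parameters, and dollar figures are all hypothetical.

```python
from typing import Optional


def should_alert(
    forecast_cost: float,
    budget_amount: Optional[float] = None,
    last_month_cost: Optional[float] = None,
    max_increase_pct: Optional[float] = None,
) -> bool:
    """Return True if either of the two alert conditions is met.

    1. The forecast reaches a fixed dollar amount.
    2. The forecast exceeds last month's bill by more than max_increase_pct.
    """
    if budget_amount is not None and forecast_cost >= budget_amount:
        return True
    if last_month_cost is not None and max_increase_pct is not None:
        limit = last_month_cost * (100 + max_increase_pct) / 100
        if forecast_cost > limit:
            return True
    return False


# Hypothetical numbers: alert at $1,000, or at more than 20% over last month's $500.
print(should_alert(1000, budget_amount=1000))
print(should_alert(650, last_month_cost=500, max_increase_pct=20))
print(should_alert(550, last_month_cost=500, max_increase_pct=20))
```

In a real setup the alert would be delivered by email or Pub/Sub, as mentioned in the conversation; this sketch only models the decision of when to fire.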
>> And this is a new page, right?
>> Yeah, it was launched recently. You can see a lot of this information in the billing section of the console, too. But this page is built to help you optimize your application. Here you can see the trend in total cost over time for all products in your project, not just Cloud Run. The total is broken down in the cost and utilization section down here.
>> Got it.
>> So I'll click the link "View details in Cost Explorer." And here's a more detailed breakdown. Cloud Run services cost this customer $370 in the last 30 days and Cloud Run jobs cost this customer $157. Down here we see that the Cloud Run cost has increased by 3% while the Cloud Logging cost has dropped by 1%. If there are big jumps in these numbers, you should investigate. And over here the cost trends are broken down further. It looks like the service called API drives most of the Cloud Run cost.
>> Hm, I see.
>> I can switch over to this vCPU utilization view. This table shows how much virtual CPU different Cloud Run services are using. The Discordbot service is only using 2% of its allocated CPU. This customer may be able to save some money by allocating fewer CPUs to it.
>> Got it. And the customer is paying for memory too, right?
>> Yes, they are. Let's check the memory utilization report. It looks like this animated WEBP service is only using half a percent of the memory allocated to it. The customer could probably save some money by allocating less memory to it.
>> Nice. But you said that the service called API is driving most of the Cloud Run bill here.
>> Yes, it is. Let me click it to get more details about that service in particular.
>> That's a lot of additional data.
>> Yes, there is. I'll scroll down to the charts for CPU and memory utilization. The instances don't use a lot of CPU or memory, so we should consider raising the concurrency limit. That way, each instance will take on more requests, you'll need fewer of them, and you will pay less.
By default, this is set to 80, but some services may be able to handle even more than that.
>> But if I raise the concurrency limit too high, won't my system grind to a halt?
>> If your workload is CPU-bound, it's safe to have a fairly high concurrency because Cloud Run will start new instances if your CPU is working too hard. Once you've tuned concurrency, you should check that your Cloud Run instances have the right amount of memory. Let's check the memory utilization over on the right. If your instances use very little memory, consider lowering the memory allocated to each instance. That will save you money.
>> Got it. I guess I want high CPU and high memory utilization.
>> That's right. Finally, aside from tuning, there's one more way to save money: compute flexible committed use discounts. If you have predictable traffic, you can enter into a contract with Google that you will use a certain amount of processing and you get it at a lower price. By the time you watch this video, there may be automated recommendations in the Cloud Console for this.
>> So, you'd commit to using a certain dollar amount's worth of Cloud Run per hour.
>> Actually, you commit to spend a certain amount per hour per region across Cloud Run, Compute Engine, or Kubernetes Engine. So if you use less of one, you can use more of another.
>> All right, that was a lot of useful information. Could you recap it for us, Mitchell?
>> Sure. First, prevent billing surprises with max instances, authentication, Firebase App Check, or Cloud Armor. Also create budget alerts. Second, use the optimization report to find out what drives cost and which services have low utilization. Allocate fewer resources to them. Also, don't be afraid to set your concurrency limit high, and consider using committed use discounts.
>> Very useful. I knew some of this, but far from all. Thank you for sharing this with us, Mitchell.
>> Thanks for having me.
>> And thank you everyone for watching.
If you have any questions for Mitchell or me, please let us know in the comments. Also, do let me know what you thought of today's episode. I read every single comment. Until next time.
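The relationship between concurrency, instance count, and max instances discussed in the conversation can be sketched with simple arithmetic. This is a deliberately simplified model, not how the Cloud Run autoscaler actually works (it also reacts to CPU utilization, as Mitchell notes). The function names and traffic figures are hypothetical; only the default concurrency of 80 and the max instances value of two come from the video.

```python
import math
from typing import Optional, Tuple


def instances_needed(concurrent_requests: int, concurrency_limit: int) -> int:
    """Instances required if every instance is filled up to its concurrency limit."""
    return math.ceil(concurrent_requests / concurrency_limit)


def plan(
    concurrent_requests: int,
    concurrency_limit: int = 80,          # default mentioned in the video
    max_instances: Optional[int] = None,  # None = no cap in this model
) -> Tuple[int, int]:
    """Return (instances running, requests over capacity) under the simplified model."""
    wanted = instances_needed(concurrent_requests, concurrency_limit)
    if max_instances is None or wanted <= max_instances:
        return wanted, 0
    capacity = max_instances * concurrency_limit
    return max_instances, concurrent_requests - capacity


# Hypothetical load of 400 simultaneous requests.
print(plan(400, concurrency_limit=80))                    # default concurrency
print(plan(400, concurrency_limit=200))                   # raised concurrency, fewer instances
print(plan(400, concurrency_limit=80, max_instances=2))   # capped at two instances
```

The second call shows why a higher concurrency limit can lower the bill: the same load fits on fewer instances. The third shows the trade-off of max instances: the instance count is bounded, and the requests beyond that capacity are the users who get turned away.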
Original Description
Go to the Optimization Hub → https://goo.gle/4lf5KV3
Mitchell Slep (Engineering Manager, Cloud Run) joins Martin Omander to walk through the practical mechanics of Cloud Run cost optimization. This deep dive moves past basic billing models into technical configurations for tuning active services.
Key technical takeaways:
1️⃣ Preventing overruns: Implementing max instances, using Cloud Armor for rate limiting, and setting up budget alerts to handle unexpected traffic spikes or attacks.
2️⃣ Utilization tuning: Using the Cloud Hub Optimization report to identify services with low CPU and memory utilization—like a bot using only 2% of its allocated resources.
3️⃣ Concurrency settings: Why increasing the concurrency limit (default 80) can reduce your instance count and overall bill without stalling your system.
4️⃣ Committed Use Discounts (CUDs): How to leverage flexible spend commitments across Cloud Run, GKE, and Compute Engine.
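To make the flexible commitment in the last takeaway more concrete, here is a simplified sketch of a pooled hourly spend commitment. It is not Google's billing formula: the 20% discount, the $10 per hour commitment, and the usage figures are hypothetical, and the model simply assumes the committed amount is billed at the discounted rate every hour while spend above it is billed at the on-demand rate.

```python
from typing import Dict


def hourly_cost(
    usage_by_product: Dict[str, float],  # on-demand-equivalent dollars used this hour
    commitment: float,                   # committed dollars per hour (hypothetical)
    discount_pct: float,                 # hypothetical discount, not a published rate
) -> float:
    """Cost for one hour under a pooled spend commitment (simplified model)."""
    total_usage = sum(usage_by_product.values())
    # The committed amount is paid at the discounted rate whether or not it is used.
    committed_charge = commitment * (100 - discount_pct) / 100
    # Anything above the commitment is paid at the normal on-demand rate.
    overage = max(0.0, total_usage - commitment)
    return committed_charge + overage


# The same $10/hour commitment covers different product mixes.
print(hourly_cost({"cloud_run": 6, "compute_engine": 4}, commitment=10, discount_pct=20))
print(hourly_cost({"cloud_run": 3, "gke": 7}, commitment=10, discount_pct=20))
# Usage above the commitment is charged on top; usage below it does not reduce the bill.
print(hourly_cost({"cloud_run": 12}, commitment=10, discount_pct=20))
print(hourly_cost({"cloud_run": 5}, commitment=10, discount_pct=20))
```

The last call is the reason the video ties commitments to predictable traffic: in this model, an hour with only $5 of usage still costs the full discounted commitment.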
Start saving money today!
Chapters:
0:00 - Intro
1:10 - Preventing billing surprises
3:46 - Optimizing the cost of a running service
8:07 - Recap
Watch more Serverless Expeditions → https://goo.gle/ServerlessExpeditions
🔔 Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#Serverless #GoogleCloud
Speakers: Martin Omander, Mitchell Slep
Products Mentioned: Cloud Armor, Cloud Run, Committed Use Discounts