Before you scale: A guide to Cloud Run cost optimization

Google Cloud Tech · Intermediate · 🏗️ Systems Design & Architecture · 4mo ago

Key Takeaways

This video provides a comprehensive guide to Cloud Run cost optimization, covering techniques such as setting max instances, using authentication, and creating budget alerts to prevent billing surprises and reduce costs. It also explores the use of the Cloud Console's optimization page and Cost Explorer to identify cost drivers and optimize resource allocation.

Full Transcript

In our last video, Mitchell, you showed us how to pick the right Cloud Run billing model and how to estimate costs for a new cloud app that uses Cloud Run.
>> Yeah, that was fun.
>> I enjoyed it, too. What about optimizing a Cloud Run service that's already been deployed?
>> Let me show you how to do it.
>> Welcome to the show, Mitchell. What do you do here at Google?
>> I'm an engineering manager on Cloud Run. My team focuses on the cost and performance of Cloud Run serving infrastructure.
>> It sounds like you are just the right person to ask about cost optimization in Cloud Run. So, I've used Cloud Run a lot over the years, but to be honest, I haven't really dug into the billing. Let's say I'm running an application that uses Cloud Run. How can I save on my cloud bill?
>> So, here's how I think about it. First, you want to prevent billing surprises. And second, you want to optimize the cost of your running service.
>> Okay, I've heard from many developers who are concerned about the first item on your list, billing surprises.
>> That makes sense. Any system has expected traffic, where everything will work as normal, and unexpected traffic beyond that.
>> Okay.
>> So let's say your system runs on traditional servers. You might have a web server and a database server. If you get more traffic than either of those servers can handle, some users will be turned away.
>> Got it. But serverless is a different story, right?
>> Yes. With serverless computing like Cloud Run, you can choose what should happen when traffic increases. You can turn away users like a traditional server-based system, or you can scale up and handle the extra traffic.
>> In what situations would I want to scale up?
>> Well, let's say you're running an online store. Then you don't want to turn customers away.
>> Right, I won't say no to more revenue. But what if I don't want my system to scale up and pay for that scaling?
>> Well, that's easy. You can set max instances to a number like two. That means that Cloud Run will prevent your service from scaling up beyond two instances. If you get more traffic than those two instances can handle, users will be turned away.
>> That sounds useful. Is there some way of allowing surges in traffic from real users, but not from attackers?
>> Yes, there is, but it requires a little more work. You can require requests to your Cloud Run service to be authenticated. You can use Identity-Aware Proxy, Identity Platform, or Firebase Authentication for that.
>> But what if my Cloud Run service supports anonymous users who don't log in?
>> Then you can use Firebase App Check. Another option is to use Cloud Armor and a load balancer. Cloud Armor is a web application firewall, so you can set rules for what kind of traffic to allow. For example, you can set rate limits so one group of clients can't exhaust your system resources. You can also use Cloud Armor to stop common attacks like SQL injection or cross-site scripting, or to block bot traffic.
>> Very good. Now, let's say there's a large increase in traffic and in my bill. I want to know so I can take action.
>> Yes. And that's where budget alerts come in. You can trigger email or Pub/Sub alerts when Google predicts that your cloud bill will hit a certain dollar amount. Or you can set it to trigger if your cost will be larger than the previous month's bill by a certain percentage.
>> Good stuff. That will help me sleep better at night and worry less about cost overruns. The second item on your list is how to optimize the cost of a running service. To be honest, the Cloud Run cost in my web apps is less than 10% of the total bill, so I haven't really worried about it.
>> Well, that's true of your application and many others, but everybody's application is different. We should still talk about how to optimize an existing Cloud Run service.
>> Fair enough. How do I do that?
>> So in the Cloud console, go to Cloud Hub and Optimization.
>> And this is a new page, right?
>> Yeah, it was launched recently. You can see a lot of this information in the billing section of the console, too. But this page is built to help you optimize your application. Here you can see the trend in total cost over time for all products in your project, not just Cloud Run. The total is broken down in the cost and utilization section down here.
>> Got it.
>> So I'll click the link to view details in Cost Explorer. And here's a more detailed breakdown. Cloud Run services cost this customer $370 in the last 30 days, and Cloud Run jobs cost this customer $157. Down here we see that the Cloud Run cost has increased by 3% while the Cloud Logging cost has dropped by 1%. If there are big jumps in these numbers, you should investigate. And over here the cost trends are broken down further. It looks like the service called API drives most of the Cloud Run cost.
>> Hm, I see.
>> I can switch over to this vCPU utilization view. This table shows how much virtual CPU different Cloud Run services are using. The Discordbot service is only using 2% of its allocated CPU. This customer may be able to save some money by allocating fewer CPUs to it.
>> Got it. And the customer is paying for memory too, right?
>> Yes, they are. Let's check the memory utilization report. It looks like this animated WEBP service is only using half a percent of the memory allocated to it. The customer could probably save some money by allocating less memory to it.
>> Nice. But you said that the service called API is driving most of the Cloud Run bill here.
>> Yes, it is. Let me click it to get more details about that service in particular.
>> That's a lot of additional data.
>> Yes, there is. I'll scroll down to the charts for CPU and memory utilization. The instances don't use a lot of CPU or memory, so we should consider raising the concurrency limit. That way, each instance will take on more requests, and you'll need fewer of them, and you will pay less. By default, this is set to 80, but some services may be able to handle even more than that.
>> But if I raise the concurrency limit too high, won't my system grind to a halt?
>> If your workload is CPU-bound, it's safe to have a fairly high concurrency because Cloud Run will start new instances if your CPU is working too hard. Once you've tuned concurrency, you should check that your Cloud Run instances have the right amount of memory. Let's check the memory utilization over on the right. If your instances use very little memory, consider lowering the memory allocated to each instance. That will save you money.
>> Got it. I guess I want high CPU and high memory utilization.
>> That's right. Finally, aside from tuning, there's one more way to save money: Compute flexible committed use discounts. If you have predictable traffic, you can enter into a contract with Google that you will use a certain amount of processing and you get it at a lower price. By the time you watch this video, there may be automated recommendations in the Cloud console for this.
>> So, you'd commit to using a certain dollar amount's worth of Cloud Run per hour.
>> Actually, you commit to spend a certain amount per hour per region across Cloud Run, Compute Engine, or Kubernetes Engine. So if you use less of one, you can use more of another.
>> All right, that was a lot of useful information. Could you recap it for us, Mitchell?
>> Sure. First, prevent billing surprises with max instances, authentication, Firebase App Check, or Cloud Armor. Also create budget alerts. Second, use the optimization report to find out what drives cost and which services have low utilization. Allocate fewer resources to them. Also, don't be afraid to set your concurrency limit high, and consider using committed use discounts.
>> Very useful. I knew some of this, but far from all. Thank you for sharing this with us, Mitchell.
>> Thanks for having me.
>> And thank you everyone for watching. If you have any questions for Mitchell or me, please let us know in the comments. Also, do let me know what you thought of today's episode. I read every single comment. Until next time.
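
The interplay between the concurrency limit and max instances discussed in the transcript can be illustrated with a little arithmetic. The sketch below is a deliberately simplified model, not how Cloud Run's autoscaler actually decides: it assumes every instance is filled up to its concurrency limit and that requests beyond the cap are simply turned away. The function names and the load of 400 in-flight requests are hypothetical; only the default concurrency of 80 and the max instances value of two come from the video.

```python
import math


def instances_needed(concurrent_requests: int, concurrency_limit: int) -> int:
    """Instances required if each one serves up to concurrency_limit requests at once."""
    if concurrency_limit <= 0:
        raise ValueError("concurrency_limit must be positive")
    return math.ceil(concurrent_requests / concurrency_limit)


def plan_capacity(concurrent_requests: int, concurrency_limit: int, max_instances: int) -> dict:
    """Apply the max instances cap and report how many requests do not fit."""
    wanted = instances_needed(concurrent_requests, concurrency_limit)
    running = min(wanted, max_instances)
    served = min(concurrent_requests, running * concurrency_limit)
    return {
        "instances": running,
        "served": served,
        "turned_away": concurrent_requests - served,
    }


# Hypothetical load: 400 requests in flight at the same moment.
print(plan_capacity(400, concurrency_limit=80, max_instances=2))    # capped at two instances
print(plan_capacity(400, concurrency_limit=80, max_instances=10))   # free to scale
print(plan_capacity(400, concurrency_limit=200, max_instances=10))  # higher concurrency
```

In this model, raising the concurrency limit from 80 to 200 serves the same load with two instances instead of five, which is the saving Mitchell describes. Whether a real service tolerates the higher limit depends on its CPU and memory headroom.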

Original Description

Go to the Optimization Hub → https://goo.gle/4lf5KV3

Mitchell Slep (Engineering Manager, Cloud Run) joins Martin Omander to walk through the practical mechanics of Cloud Run cost optimization. This deep dive moves past basic billing models into technical configurations for tuning active services.

Key technical takeaways:
1️⃣ Preventing overruns: Implementing max instances, using Cloud Armor for rate limiting, and setting up budget alerts to handle unexpected traffic spikes or attacks.
2️⃣ Utilization tuning: Using the Cloud Hub Optimization report to identify services with low CPU and memory utilization, like a bot using only 2% of its allocated resources.
3️⃣ Concurrency settings: Why increasing the concurrency limit (default 80) can reduce your instance count and overall bill without stalling your system.
4️⃣ Committed Use Discounts (CUDs): How to leverage flexible spend commitments across Cloud Run, GKE, and Compute Engine.

Start saving money today!

Chapters:
0:00 - Intro
1:10 - Preventing billing surprises
3:46 - Optimizing the cost of a running service
8:07 - Recap

Watch more Serverless Expeditions → https://goo.gle/ServerlessExpeditions
🔔 Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#Serverless #GoogleCloud

Speakers: Martin Omander, Mitchell Slep
Products Mentioned: Cloud Armor, Cloud Run, Committed Use Discounts

This video teaches viewers how to optimize Cloud Run costs using various techniques such as setting max instances, using authentication, and creating budget alerts. It also covers the use of the Cloud Console's optimization page and Cost Explorer to identify cost drivers and optimize resource allocation. By following these techniques, viewers can prevent billing surprises and reduce their Cloud Run costs.

Key Takeaways

Set max instances to cap how far a service can scale, and how much that scaling can cost
Decide whether excess traffic should be turned away or served by scaling up
Require authentication to allow surges in traffic from real users but not from attackers
Create budget alerts that send email or Pub/Sub notifications when Google predicts that your cloud bill will hit a certain dollar amount
Optimize the cost of a running service using the Optimization page under Cloud Hub in the Cloud console
Follow the link to Cost Explorer for a more detailed cost breakdown
Switch to the vCPU utilization view to find services that use little of their allocated CPU
Check the memory utilization report to find services that use little of their allocated memory
Click the service that drives most of the cost (API in the demo) to get more details
Scroll down to the charts for CPU and memory utilization, and consider raising the concurrency limit if both are low
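
The two budget alert rules mentioned in the video (a fixed dollar amount, or a percentage above the previous month's bill) reduce to simple comparisons against a forecast. The sketch below only illustrates that logic. It is not the Cloud Billing API, and the function names and dollar figures are hypothetical.

```python
def over_amount(forecast_cost: float, budget_amount: float) -> bool:
    """First rule: the forecast spend reaches a fixed dollar amount."""
    return forecast_cost >= budget_amount


def over_last_month(forecast_cost: float, last_month_cost: float, percent: float) -> bool:
    """Second rule: the forecast spend exceeds last month's bill by more than `percent` percent."""
    return forecast_cost > last_month_cost * (1 + percent / 100)


# Hypothetical figures: a $540 forecast, a $500 budget, and a $400 bill last month.
forecast = 540.0
print(over_amount(forecast, budget_amount=500.0))                        # fixed-amount rule
print(over_last_month(forecast, last_month_cost=400.0, percent=25.0))    # relative rule
```

In a real project the forecast comes from Google and the alert arrives by email or Pub/Sub. The point of the sketch is only that the second rule adapts to your spending history while the first one is an absolute ceiling.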

💡 Using the Cloud Console's optimization page and Cost Explorer can help identify cost drivers and optimize resource allocation, leading to significant cost savings.
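
The utilization review from the demo amounts to filtering services by how little of their allocation they use. Here is a minimal sketch of that filter, assuming the utilization figures have already been read off the report by hand. The `Reading` type, the 10% threshold, and the `example-busy-service` entry are hypothetical; the 2% and 0.5% figures are the ones quoted in the video.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    service: str
    resource: str           # "cpu" or "memory"
    utilization_pct: float  # share of the allocation actually used


def underused(readings: list[Reading], threshold_pct: float = 10.0) -> list[Reading]:
    """Return readings below the threshold, lowest utilization first."""
    low = [r for r in readings if r.utilization_pct < threshold_pct]
    return sorted(low, key=lambda r: r.utilization_pct)


readings = [
    Reading("Discordbot", "cpu", 2.0),             # figure quoted in the video
    Reading("animated WEBP", "memory", 0.5),       # figure quoted in the video
    Reading("example-busy-service", "cpu", 65.0),  # hypothetical, for contrast
]

for r in underused(readings):
    print(f"{r.service}: {r.resource} at {r.utilization_pct}% of allocation, consider allocating less")
```

A flagged service is only a candidate for a smaller allocation. Check its peak usage, not just the average, before lowering CPU or memory.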

Chapters (4)

0:00 Intro

1:10 Preventing billing surprises

3:46 Optimizing the cost of a running service

8:07 Recap
