Self host Gemma 4: Deploy LLMs on Cloud Run GPUs
GCP credit → https://goo.gle/handson-ep7-lab1
Lab → https://goo.gle/guardians
In this episode, we deploy Google's Gemma 4 model to Cloud Run two completely different ways, each with real trade-offs you need to understand before choosing one for production.
🔨 Ollama — model baked into the container. Instant cold starts. Rebuild to update.
⚡ vLLM — model mounted from Cloud Storage via FUSE. Slower first boot, but swap models without redeploying.
Both use Cloud Run GPUs, scale to zero, and ship through automated CI/CD with Cloud Build.
We build both. You decide which fits. 👇
📦 CI/CD with Cloud Build
🖥️ GPU accelerated serverless inference
🔄 Baked in vs. decoupled model architecture
🚀 Scale to zero
⚖️ Cold start speed vs. production agility
Chapters:
0:00 - Intro
6:08 - Getting started with Agentverse lab
7:57 - Laying the foundations of the citadel
16:07 - Forging the power core: Self hosted LLMs
28:02 - Forging the citadel's central core: Deploy vLLM
43:59 - Summary
More resources:
Cloud Run GPU documentation → https://goo.gle/4sEbTvG
Ollama documentation → https://goo.gle/3Qdi64w
vLLM documentation → https://goo.gle/4cvvxE9
Cloud Storage FUSE → https://goo.gle/4cQAb0V
Watch more Hands on AI → https://www.youtube.com/watch?v=qCBreTfjFHQ&list=PLIivdWyY5sqKnJOvP89yF8t9mWuzMTcbM
🔔 Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#Gemma4 #CloudRun
Speakers: Ayo Adedeji, Annie Wang
Products Mentioned: Agent Development Kit, Gemini API, Cloud Run
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
More on: LLM Engineering
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
On deleting what the model gave you
Dev.to AI
The Wall Every AI Has Been Hitting And the Startup That Claims to Have Broken Through
Medium · LLM
Unsolved AI Mystery Is Solved Along With Lessons Learned On Why ChatGPT Became Oddly Obsessed With Gremlins And Goblins
Forbes Innovation
The Human Element in AI-Based Emotion Recognition: When Machines Start to Read Our Faces
Medium · Deep Learning
Chapters (6)
Intro
6:08
Getting started with Agentverse lab
7:57
Laying the foundations of the citadel
16:07
Forging the power core: Self hosted LLMs
28:02
Forging the citadel's central core: Deploy vLLM
43:59
Summary
🎓
Tutor Explanation
DeepCamp AI