Self host Gemma 4: Deploy LLMs on Cloud Run GPUs

Google Cloud Tech · Beginner ·🧠 Large Language Models ·2w ago

Skills: LLM Engineering90%Tool Use & Function Calling60%

GCP credit → https://goo.gle/handson-ep7-lab1 Lab → https://goo.gle/guardians In this episode, we deploy Google's Gemma 4 model to Cloud Run two completely different ways, each with real trade-offs you need to understand before choosing one for production. 🔨 Ollama — model baked into the container. Instant cold starts. Rebuild to update. ⚡ vLLM — model mounted from Cloud Storage via FUSE. Slower first boot, but swap models without redeploying. Both use Cloud Run GPUs, scale to zero, and ship through automated CI/CD with Cloud Build. We build both. You decide which fits. 👇 📦 CI/CD with Cloud Build 🖥️ GPU accelerated serverless inference 🔄 Baked in vs. decoupled model architecture 🚀 Scale to zero ⚖️ Cold start speed vs. production agility Chapters: 0:00 - Intro 6:08 - Getting started with Agentverse lab 7:57 - Laying the foundations of the citadel 16:07 - Forging the power core: Self hosted LLMs 28:02 - Forging the citadel's central core: Deploy vLLM 43:59 - Summary More resources: Cloud Run GPU documentation → https://goo.gle/4sEbTvG Ollama documentation → https://goo.gle/3Qdi64w vLLM documentation → https://goo.gle/4cvvxE9 Cloud Storage FUSE → https://goo.gle/4cQAb0V Watch more Hands on AI → https://www.youtube.com/watch?v=qCBreTfjFHQ&list=PLIivdWyY5sqKnJOvP89yF8t9mWuzMTcbM 🔔 Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech #Gemma4 #CloudRun Speakers: Ayo Adedeji, Annie Wang Products Mentioned: Agent Development Kit, Gemini API, Cloud Run

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

More on: LLM Engineering

View skill →

Build an LLM and RAG-based Chat Application using AlloyDB and LangChain

FULLY LOCAL Mistral AI PDF Processing [Hands-on Tutorial]

FULLY LOCAL Mistral AI PDF Processing [Hands-on Tutorial]

Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation

Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation

How to Make an Asteroids Game Bot (LIVE)

How to Make an Asteroids Game Bot (LIVE)

Using Claude Code + Nano Banana Pro To Create a Dataset of Engineering Drawings

Using Claude Code + Nano Banana Pro To Create a Dataset of Engineering Drawings

Automata Learning Lab

Advanced AI and Machine Learning Techniques and Capstone

Advanced AI and Machine Learning Techniques and Capstone

Related AI Lessons

On deleting what the model gave you

Learn to refine AI-generated content by deleting and editing, a crucial step in creative work with AI models

The Wall Every AI Has Been Hitting And the Startup That Claims to Have Broken Through

Startup SubQ claims to have broken through the context length constraint in AI, solving a decade-long limitation

Unsolved AI Mystery Is Solved Along With Lessons Learned On Why ChatGPT Became Oddly Obsessed With Gremlins And Goblins

Discover why ChatGPT became obsessed with gremlins and goblins, and learn vital lessons from this AI mystery

Forbes Innovation

The Human Element in AI-Based Emotion Recognition: When Machines Start to Read Our Faces

Learn how AI-based emotion recognition systems use facial analysis to understand human emotions and the importance of the human element in these systems

Medium · Deep Learning

Chapters (6)

Intro

6:08 Getting started with Agentverse lab

7:57 Laying the foundations of the citadel

16:07 Forging the power core: Self hosted LLMs

28:02 Forging the citadel's central core: Deploy vLLM

43:59 Summary

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)