A closer look at Gemma 4 with Baseten and NVIDIA
Inference isn't just one thing—it’s the entire stack.
Live from Google Cloud Next '26, Jason Davenport (Google Cloud), Jay Rodge (NVIDIA), and Philip Kiely (Baseten) break down the "Full Stack Seating Chart" of modern AI: from the silicon powering the models to the frameworks scaling them to millions of users.
This session dives into the day-zero support for Gemma 4, Google's most capable open model family, and how the partnership between Google, NVIDIA, and Baseten is solving the "Hypergrowth" problem for AI applications.
Key Highlights:
Next-Gen Hardware: Jay Rodge announces the arrival of NVIDIA Blackwell (RTX PRO 6000) GPUs on Google Cloud, with the future Vera Rubin GPUs to follow. The RTX PRO 6000 features 96GB of VRAM, enough to pack multiple large models on a single chip.
Gemma 4 & MoE: A look at the new Gemma 4 26B A4B (Mixture-of-Experts) model, which activates only 4B parameters to deliver 27B-class intelligence at lightning-fast speeds.
Inference Engineering: Philip Kiely discusses his new book, "Inference Engineering," and explains why inference is a holistic challenge involving CUDA, infrastructure, distributed systems, and tight latency SLAs.
Scaling at Baseten: A live demo showing how Baseten uses GKE and L4 GPUs to provide one-click deployments of Gemma 4, featuring auto-scaling that handles traffic spikes without sacrificing response time.
Precision & Optimization: Why NVFP4 and TensorRT-LLM are the "secret sauce" for getting the highest possible performance out of Gemma on NVIDIA hardware.
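To make the Mixture-of-Experts highlight above more concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration only, not Gemma 4's actual architecture: the number of experts, the router scores, the scalar "experts", and the function names top_k_experts and moe_forward are all hypothetical. The only figures taken from the session summary are the 26B and 4B parameter counts.

```python
def top_k_experts(router_scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])


def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and mix their outputs by normalized score."""
    chosen = top_k_experts(router_scores, k)
    total = sum(router_scores[i] for i in chosen)
    output = sum(router_scores[i] / total * experts[i](x) for i in chosen)
    return output, chosen


# Toy experts: scalar functions standing in for feed-forward blocks (hypothetical).
experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]
router_scores = [0.1, 0.6, 0.1, 0.2]  # hypothetical non-negative router weights

output, chosen = moe_forward(2.0, experts, router_scores, k=2)
print("experts used:", chosen, "of", len(experts))
print("mixed output:", round(output, 6))

# Share of parameters active per token, using the figures quoted in the summary.
active_fraction = 4 / 26
print(f"active share of parameters: {active_fraction:.1%}")
```

Only the chosen experts are evaluated for a given input, which is why the compute per token tracks the active parameter count rather than the total.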
"If you have a GPU that costs twice as much but handles three times the volume, you’ve actually lowered your TCO. In inference engineering, cheap isn't always the goal—efficiency is."
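The arithmetic behind that quote can be sketched in a few lines of Python. The hourly cost and throughput numbers below are hypothetical placeholders chosen only to show the ratio, and cost_per_request is an illustrative helper, not part of any Baseten, NVIDIA, or Google Cloud tooling. It also covers only hardware cost per request, which is one component of TCO.

```python
def cost_per_request(hourly_cost: float, requests_per_hour: float) -> float:
    """Return the hardware cost attributed to a single request."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return hourly_cost / requests_per_hour


# Hypothetical baseline GPU: 1 cost unit per hour, 1000 requests per hour.
baseline = cost_per_request(hourly_cost=1.0, requests_per_hour=1000.0)

# A GPU that costs twice as much but handles three times the volume.
faster = cost_per_request(hourly_cost=2.0, requests_per_hour=3000.0)

ratio = faster / baseline
print(f"cost per request relative to baseline: {ratio:.3f}")
```

Under these assumptions the more expensive GPU serves each request for about two thirds of the baseline cost, which is the sense in which the quote says TCO goes down.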
Get Started: Explore the Gemma 4 family on Hugging Face, check out Baseten for model serving, and join the NVIDIA & Google Cloud developer community to start building.
#Gemma4 #NVIDIABlackwell #InferenceEngineering #GoogleCloudNext #Baseten #OpenModels #VeraRubin