A closer look at Gemma 4 with Baseten and NVIDIA

Google Cloud · Intermediate · 🧠 Large Language Models · 10h ago
Inference isn't just one thing: it's the entire stack. Live from Google Cloud Next '26, Jason Davenport (Google Cloud), Jay Rodge (NVIDIA), and Philip Kiely (Baseten) break down the "Full Stack Seating Chart" of modern AI, from the silicon powering the models to the frameworks scaling them to millions of users. This session dives into the day-zero support for Gemma 4, Google's most capable open model family, and how the partnership between Google, NVIDIA, and Baseten is solving the hypergrowth problem for AI applications.

Key Highlights:
- Next-gen hardware: Jay Rodge announces the arrival of NVIDIA Blackwell (RTX PRO 6000) and the future Vera Rubin GPUs on Google Cloud, featuring 96 GB of VRAM, enough to pack multiple massive models onto a single chip.
- Gemma 4 & MoE: A look at the new Gemma 4 26B A4B Mixture-of-Experts model, which activates only 4B parameters per token to deliver 27B-class intelligence at lightning-fast speeds.
- Inference engineering: Philip Kiely discusses his new book, "Inference Engineering," and explains why inference is a holistic challenge spanning CUDA, infrastructure, distributed systems, and tight latency SLAs.
- Scaling at Baseten: A live demo showing how Baseten uses GKE and L4 GPUs to provide one-click deployments of Gemma 4, with auto-scaling that absorbs traffic spikes without sacrificing response time.
- Precision & optimization: Why NVFP4 and TensorRT-LLM are the "secret sauce" for getting the highest possible performance out of Gemma on NVIDIA hardware.

"If you have a GPU that costs twice as much but handles three times the volume, you've actually lowered your TCO. In inference engineering, cheap isn't always the goal; efficiency is."

Get Started: Explore the Gemma 4 family on Hugging Face, check out Baseten for model serving, and join the NVIDIA & Google Cloud developer community to start building.

#Gemma4 #NVIDIABlackwell #InferenceEngineering #GoogleCloudNext #Baseten #OpenModels #VeraRubin
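The Mixture-of-Experts efficiency described above comes from routing: each token is sent to only a few experts, so only a small slice of the total weights participates in each forward pass. A minimal sketch of top-k expert routing; the expert count and k below are hypothetical illustration values, not Gemma's actual configuration:

```python
import numpy as np

# Toy illustration of why an MoE model with a large total parameter count
# can run with far fewer active parameters: a router scores the experts
# and only the top-k participate for a given token.
rng = np.random.default_rng(0)
n_experts, k = 16, 2                          # hypothetical: 16 experts, top-2 routing
router_logits = rng.normal(size=n_experts)    # router scores for one token
top_k = np.argsort(router_logits)[-k:]        # experts selected for this token
active_fraction = k / n_experts               # share of expert weights actually used

print(f"experts used: {sorted(top_k.tolist())}, active fraction: {active_fraction:.0%}")
```

With top-2 routing over 16 experts, only 12.5% of the expert weights are touched per token, which is the same shape of saving as running a 26B-parameter model with roughly 4B parameters active.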
Watch on YouTube ↗

Related AI Lessons

How to Use the Hy3 Preview API for Free
Learn how to use the Hy3 Preview API for free and explore its capabilities in AI and LLMs
Dev.to AI
I built a Python module to A/B test prompts inside Claude Code, and you can run it on yours
Learn how to A/B test prompts inside Claude Code using a Python module and improve your AI model's performance
Dev.to · Frank Brsrk
10 Benefits of Learning Generative AI in 2026 (Complete Guide for Beginners & Professionals)
Unlock 10 benefits of learning Generative AI in 2026 and boost your career across industries
Medium · AI
Talk to Your Data: A New Approach Using JavaScript Instead of SQL Yields Better Results
Learn how to use JavaScript instead of SQL to interact with your data, yielding better results with a new approach
Medium · JavaScript
Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)
Watch →