Gemma 4 Deep Dive — Cassidy Hardin, Researcher, Google DeepMind

AI Engineer · Beginner · 📄 Research Papers Explained · 2w ago
Open models are getting smaller, faster, and far more capable. In this talk, Cassidy Hardin walks through the latest advances in the Gemma family, with a focus on Gemma 4 and what it enables for developers building on-device and open-weight AI systems. She covers the architecture behind Gemma's dense, effective, and mixture-of-experts models, including improvements to attention, multimodal support for text, vision, and audio, and the design decisions that make strong reasoning, coding, and agentic workflows possible at practical sizes.

Speaker info:
- https://uk.linkedin.com/in/cassidyhardin

Timestamps:
00:00:28 - Introduction to the Gemma 4 model family and its four size categories
00:01:54 - Shift to Apache 2.0 licensing for developer accessibility
00:02:25 - Deep dive into the 31B dense reasoning and 26B mixture-of-experts (MoE) models
00:03:30 - Overview of on-device effective models (2B and 4B) with multimodal support
00:04:21 - Architectural updates: interleaved local/global attention and grouped query attention
00:06:51 - Explanation of the new MoE architecture (128 experts, 8 active)
00:07:44 - Implementation of Per Layer Embeddings (PLE) to optimize on-device memory
00:11:06 - Multimodal advances: variable aspect ratios and resolutions for vision encoders
00:16:31 - Audio processing enhancements via conformer architecture and audio tokenizers
00:18:07 - Getting started: self-hosting (Hugging Face, Ollama) and cloud deployment (Vertex AI)
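The MoE architecture discussed at 06:51 routes each token through 8 of 128 experts. As a rough intuition for how such top-k routing works, here is a generic sketch — not Gemma's actual implementation; every function name and dimension is invented for illustration, and only the 128-expert / 8-active counts come from the talk:

```python
import numpy as np

def topk_moe_forward(x, gate_w, expert_ws, k=8):
    """Generic top-k mixture-of-experts layer (illustrative only).

    x:         (d,) one token's activation
    gate_w:    (num_experts, d) router weights
    expert_ws: list of (d, d) expert weight matrices
    k:         number of experts activated per token
    """
    logits = gate_w @ x                       # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best-scoring experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over the selected experts only
    # Only the k chosen experts run; the remaining 120 stay idle for this token,
    # so compute per token scales with k, not with the total expert count.
    return sum(p * (expert_ws[i] @ x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, num_experts = 16, 128
x = rng.normal(size=d)
gate_w = rng.normal(size=(num_experts, d))
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
y = topk_moe_forward(x, gate_w, expert_ws, k=8)
print(y.shape)  # → (16,)
```

This is why an MoE model can hold far more parameters than it spends compute on: all 128 experts contribute capacity, but each token pays for only 8.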
Watch on YouTube ↗


Chapters (10)

0:28 Introduction to the Gemma 4 model family and its four size categories
1:54 Shift to Apache 2.0 licensing for developer accessibility
2:25 Deep dive into the 31B dense reasoning and 26B mixture-of-experts (MoE) models
3:30 Overview of on-device effective models (2B and 4B) with multimodal support
4:21 Architectural updates: interleaved local/global attention and grouped query attention
6:51 Explanation of the new MoE architecture (128 experts, 8 active)
7:44 Implementation of Per Layer Embeddings (PLE) to optimize on-device memory
11:06 Multimodal advances: variable aspect ratios and resolutions for vision encoders
16:31 Audio processing enhancements via conformer architecture and audio tokenizers
18:07 Getting started: self-hosting (Hugging Face, Ollama) and cloud deployment (Vertex AI)
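The grouped query attention mentioned at 4:21 has several query heads share each key/value head, shrinking the KV cache. A minimal single-layer sketch of the idea — head counts, shapes, and names are invented for illustration and are not Gemma's actual configuration:

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, num_q_heads=8, num_kv_heads=2):
    """Illustrative grouped-query attention (not Gemma's real code):
    each K/V head is shared by num_q_heads // num_kv_heads query heads."""
    seq, d = x.shape
    hd = d // num_q_heads                      # per-head dimension
    q = (x @ wq).reshape(seq, num_q_heads, hd)
    k = (x @ wk).reshape(seq, num_kv_heads, hd)
    v = (x @ wv).reshape(seq, num_kv_heads, hd)
    group = num_q_heads // num_kv_heads        # query heads per shared K/V head
    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group                        # shared K/V head for this query head
        scores = q[:, h, :] @ k[:, kv, :].T / np.sqrt(hd)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        out[:, h, :] = weights @ v[:, kv, :]
    return out.reshape(seq, d)

rng = np.random.default_rng(1)
seq, d, hq, hkv = 4, 16, 8, 2
x = rng.normal(size=(seq, d))
wq = rng.normal(size=(d, d))
wk = rng.normal(size=(d, (d // hq) * hkv))
wv = rng.normal(size=(d, (d // hq) * hkv))
y = grouped_query_attention(x, wq, wk, wv, hq, hkv)
print(y.shape)  # → (4, 16)
```

With 8 query heads and 2 K/V heads, the KV cache is 4× smaller than full multi-head attention — the kind of saving that matters most for the on-device models the talk focuses on.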