TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google

Uploaded: 2026-05-03T22:00:06Z
Channel: AI Engineer

AI Engineer · Intermediate · 🧠 Large Language Models · 1w ago

Skills: LLM Engineering 90%, Tool Use & Function Calling 70%

Tiny LLMs are making on-device agents much more practical. In this workshop, Cormac Brick walks through how LiteRT-LM brings language models to edge devices, with a focus on Gemma, agent skills, and the real engineering tradeoffs behind running LLM workflows on phones and other constrained hardware. The session covers performance across edge devices, on-device function calling, fine-tuning and deployment, platform support across Android and iOS, and the memory, safety, and UX constraints that shape edge-native AI systems. If you're building local agents or want a practical look at where edge LLMs are headed, this is a useful hands-on overview.

Speaker info:
- https://www.linkedin.com/in/cbrick/

Timestamps
(0:00:00) Intro: AI on the Edge, Small Language Models, and Gemma
(0:04:51) Enabling App Development: MediaPipe, LiteRT, and System Services
(0:09:09) Small Language Models: Performance, Reach, and Fine-tuning
(0:11:30) Gemma 4: Sizes (E2B and E4B) and AI Core Roadmap
(0:16:10) Gemma on Edge Runtime: Performance Benchmarks
(0:18:34) Agent Skills: Google AI Gallery, Mood Tracker, and Wikipedia Lookup
(0:23:38) Skill Architecture: Efficiency, Progressive Disclosure, and Tool Loading
(0:27:34) Reliability: Constrained Decoding and Tool Usage
(0:29:18) Community and Custom Skills
(0:31:30) Skill Development Deep Dive: Orchestrator and Registry
(0:33:30) Rapid Skill Prototyping: Using Gemini CLI and ADB
(0:38:35) Open Source: AI Edge Gallery and Community Engagement
(0:41:00) Deploying Tiny Models (sub-1B parameters) In-App
(0:47:44) Third-Party Models: Fast VLM and Hardware Acceleration
(0:50:17) Model Examples: Function Gemma, Mobile Actions, and Embedding Gemma
(0:55:41) AI Edge Eloquent: Transcription and Text Polishing
(0:59:07) Modularity Playbook: ASR and Text Polishing Engines
(1:01:23) Synthetic Data Workflows for Tiny Models
(1:06:36) Web Support and Fine-tuning Documentation
(1:08:20) Summary and Key Takeaways
(1:12:49) Q&A: Multi-skill Execution, Context
