Lightning Talk: From Hugging Face To Handheld: Scaling LLM Deployment With LiteRT Generative API - Cormac Brick & Weiyi Wang, Google

PyTorch · Advanced · 🧠 Large Language Models · 3w ago
This session demonstrates the end-to-end journey of bringing custom PyTorch-based open-source LLMs to cross-platform devices using LiteRT. We will show developers how to take a custom Hugging Face Transformers checkpoint and convert it for on-device execution, including:

- Taking the PyTorch model from conversion to deployment.
- Automated optimization: how LiteRT performs automated patching of performance-critical components, including architecture-specific rewrites for PyTorch models.
- Seamless fine-tuning integration: how to move from an Unsloth fine-tuning session to a TorchAO-quantized model and LiteRT export without leaving your script.
- The "0-day" enablement strategy: well-known architectures are supported out of the box. We'll share how we enabled the QWEN0.6 (or Liquid AI) model in just 20 minutes.
- Interactive validation: run inference on the exported model directly in the terminal or Colab to verify numerical correctness before deploying to device.

This workflow shows a smooth fine-tune-to-deployment story where everything stays within the original PyTorch/Hugging Face ecosystem. Viewers can "vibe code" along using Gemini CLI or other coding agents.
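The checkpoint-to-LiteRT flow described above can be sketched roughly as follows. This is a minimal illustration, not the speakers' actual code: it assumes Google's `ai_edge_torch` converter and the Hugging Face `transformers` API, and the function name, tolerance, and call signatures here are illustrative assumptions that may differ from the real talk material.

```python
# Hypothetical sketch of the Hugging Face -> LiteRT export flow.
# ai_edge_torch is Google's PyTorch-to-LiteRT converter; exact APIs
# and the validation tolerance below are assumptions for illustration.

def export_hf_checkpoint_to_litert(checkpoint: str, out_path: str,
                                   prompt: str = "Hello") -> None:
    """Convert a Hugging Face causal-LM checkpoint to a LiteRT flatbuffer,
    validating the converted model against the PyTorch reference first."""
    # Imports are local so the sketch can be read without the heavyweight
    # libraries installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import ai_edge_torch  # assumed converter package

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).eval()

    # Convert using a representative sample input.
    sample = tokenizer(prompt, return_tensors="pt")
    edge_model = ai_edge_torch.convert(model, (sample["input_ids"],))

    # Interactive validation: compare converted outputs against the
    # original PyTorch logits before deploying to device.
    with torch.no_grad():
        reference = model(sample["input_ids"]).logits
    converted = edge_model(sample["input_ids"])
    assert torch.allclose(torch.as_tensor(converted), reference, atol=1e-3)

    # Write the on-device model (a .tflite / LiteRT flatbuffer).
    edge_model.export(out_path)
```

A TorchAO quantization pass, as mentioned in the abstract, would slot in between loading the model and calling the converter; the point of the sketch is that every step stays inside an ordinary Python script.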
Watch on YouTube ↗
