Lightning Talk: From Hugging Face To Handheld: Scaling LLM Deployment W... Cormac Brick & Weiyi Wang
Lightning Talk: From Hugging Face To Handheld: Scaling LLM Deployment With LiteRT Generative API - Cormac Brick & Weiyi Wang, Google
This session will demonstrate the E2E journey of bringing custom PyTorch-based Open Source LLMs on cross platform devices using LiteRT. We will show developers how to take a custom Hugging Face Transformers checkpoint and convert them for on-device execution, including:
-Taking the Pytorch model from conversion to deployment.
-Automated Optimization: How LiteRT performs automated patching of performance-critical components, including architecture-specific rewrites for PyTorch models.
-Seamless Fine-Tuning Integration: How to move from an Unsloth fine-tuning session to a TorchAO-quantized model and LiteRT export without leaving your script.
-The "0-Day" Enablement Strategy: Well-known architectures are supported out-of-the-box. We’ll share how we enabled the QWEN0.6 (or Liquid AI) model in just 20 minutes.
-Interactive Validation: Run inference on the exported model directly in the Terminal or Colab to verify numerical correctness before deploying to device.
This workflow shows a smooth fine-tune-to-deployment story where everything stays within the original PyTorch/Hugging Face ecosystem. Viewers can "vibe code" along using Gemini CLI or other coding agents.
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
More on: LLM Engineering
View skill →Related AI Lessons
🎓
Tutor Explanation
DeepCamp AI