Lightning Talk: From Hugging Face To Handheld: Scaling LLM Deployment W... Cormac Brick & Weiyi Wang

Uploaded: 2026-04-20T20:21:45Z
Channel: PyTorch


Skills: LLM Engineering 80%, Model Deployment 70%

Lightning Talk: From Hugging Face To Handheld: Scaling LLM Deployment With LiteRT Generative API - Cormac Brick & Weiyi Wang, Google

This session will demonstrate the end-to-end journey of bringing custom PyTorch-based open-source LLMs to cross-platform devices using LiteRT. We will show developers how to take a custom Hugging Face Transformers checkpoint and convert it for on-device execution, including:

- Taking the PyTorch model from conversion to deployment.
- Automated Optimization: How LiteRT performs automated patching of performance-critical components, including architecture-specific rewrites for PyTorch models.
- Seamless Fine-Tuning Integration: How to move from an Unsloth fine-tuning session to a TorchAO-quantized model and LiteRT export without leaving your script.
- The "0-Day" Enablement Strategy: Well-known architectures are supported out of the box. We'll share how we enabled the QWEN0.6 (or Liquid AI) model in just 20 minutes.
- Interactive Validation: Run inference on the exported model directly in the terminal or Colab to verify numerical correctness before deploying to device.

This workflow shows a smooth fine-tune-to-deployment story where everything stays within the original PyTorch/Hugging Face ecosystem. Viewers can "vibe code" along using Gemini CLI or other coding agents.
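The numerical-correctness check mentioned under Interactive Validation could be sketched roughly as below. This is not the LiteRT tooling or the speakers' method: it is a plain-Python illustration of comparing the original model's logits against the exported model's logits for the same prompt. The function names (`max_abs_diff`, `outputs_match`, `same_top_token`), the tolerance values, and the hard-coded logit lists are all assumptions made for illustration; in practice the two lists would come from running the PyTorch model and the exported model on identical inputs.

```python
def _check_lengths(reference, candidate):
    # Both runs must produce logits over the same vocabulary positions.
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")


def max_abs_diff(reference, candidate):
    """Largest elementwise absolute difference between two logit vectors."""
    _check_lengths(reference, candidate)
    return max(abs(r - c) for r, c in zip(reference, candidate))


def outputs_match(reference, candidate, atol=1e-3, rtol=1e-3):
    """True if every element satisfies |r - c| <= atol + rtol * |r|.

    The tolerances are illustrative defaults (an assumption), not values
    recommended by LiteRT; quantized exports usually need looser bounds.
    """
    _check_lengths(reference, candidate)
    return all(
        abs(r - c) <= atol + rtol * abs(r)
        for r, c in zip(reference, candidate)
    )


def same_top_token(reference, candidate):
    """True if both logit vectors pick the same argmax (greedy next token)."""
    _check_lengths(reference, candidate)
    ref_top = max(range(len(reference)), key=reference.__getitem__)
    cand_top = max(range(len(candidate)), key=candidate.__getitem__)
    return ref_top == cand_top


# Hypothetical stand-in values: logits from the original PyTorch model and
# from the exported on-device model for the same input.
reference_logits = [2.10, -0.35, 0.80, 4.25]
exported_logits = [2.1004, -0.3497, 0.7995, 4.2491]

if __name__ == "__main__":
    print("max abs diff:", max_abs_diff(reference_logits, exported_logits))
    print("within tolerance:", outputs_match(reference_logits, exported_logits))
    print("same top token:", same_top_token(reference_logits, exported_logits))
```

Checking the top token alongside the tolerance is a deliberate choice in this sketch: after quantization the raw logits may drift beyond a tight tolerance while the greedy decoding result stays the same, so the two checks answer different questions.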
