On-Device LLM Inference on Android With ExecuTorch and Qualcomm QNN - Shivay Lamba & Kartikey Rawat

PyTorch · Intermediate · 🛠️ AI Tools & Apps · 3w ago
Shivay Lamba & Kartikey Rawat, Qualcomm

Multimodal models like CLIP are typically deployed in the cloud because of their size and computational demands, which limits their use in latency-sensitive, privacy-preserving, and offline-first applications. This talk demonstrates how to run CLIP inference fully on-device on Android using ExecuTorch with the Qualcomm QNN backend, enabling real-time vision–language understanding with no server dependency. Models such as CLIP (ViT-B/32) can run entirely on edge devices, with QNN providing hardware-accelerated inference.

A key focus of the talk is a deep dive into ExecuTorch optimizations for QNN, including graph lowering, operator fusion, quantization strategies, memory planning, and backend-specific execution choices that materially affect latency, memory footprint, and power consumption. The talk covers architectural insights, model export and compilation workflows, and real-world benchmarks of latency, memory usage, and power efficiency.

It highlights how large multimodal PyTorch models can be made production-ready on edge devices, unlocking new classes of private, offline-capable AI applications.
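To make the quantization point concrete, here is a minimal sketch of generic per-tensor affine int8 quantization in plain Python. It is an illustration of the arithmetic only, not the ExecuTorch or QNN API and not necessarily the scheme used in the talk; the function names `quantize_int8` and `dequantize` are hypothetical. Storing each weight as one int8 value instead of a 32-bit float cuts weight storage roughly fourfold, at the cost of a bounded rounding error.

```python
def quantize_int8(values):
    """Hypothetical helper: asymmetric (affine) per-tensor quantization to int8."""
    lo, hi = min(values), max(values)
    # Widen the range to include zero so that 0.0 maps exactly to an integer.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    # One step of the 256-level integer grid; fall back to 1.0 for an all-zero tensor.
    scale = (hi - lo) / 255.0 or 1.0
    # Integer that represents the real value 0.0.
    zero_point = round(-128 - lo / scale)
    # Round to the nearest step, shift by the zero point, clamp to the int8 range.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 2.0]
    q, scale, zero_point = quantize_int8(weights)
    print(q, scale, zero_point)
    print(dequantize(q, scale, zero_point))
```

Real deployments usually pick between per-tensor and per-channel scales and calibrate activation ranges on sample inputs; those choices are outside this sketch.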
