Lightning Talk: Running ExecuTorch Applications With Silicon Acceleration, in Ultra-low Power - George Gekov, Arm; Aki Makkonen, Alif Semiconductor
Efficient deployment of ML models on low-power embedded systems has been a significant challenge for a number of years. At the same time, these embedded SoCs are all around us—from everyday appliances to the latest smart glasses.
ExecuTorch is a PyTorch-native framework for deploying neural networks on resource-constrained systems. In this session, we show how to build an end-to-end speech recognition application using PyTorch and ExecuTorch—from training a Transformer-based neural network in PyTorch, through quantization, all the way to deployment on a low-power embedded device.
We will introduce the key ExecuTorch APIs for quantization and explain how models are transformed and lowered into a form that can run efficiently on a device. The application runs on the Alif Ensemble E8 SoC, the first implementation of the leading Arm® Ethos-U85 NPU, which brings native support for Transformer models to the ultra-low-power domain.
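For readers new to quantization, the arithmetic behind it can be sketched in a few lines of plain Python. This is only an illustration of symmetric per-tensor int8 quantization, not the ExecuTorch quantization API and not necessarily the scheme used in the talk; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = clamp(round(x / scale))."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; use 1.0 for an all-zero tensor
    # so that the division below is always defined.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values: x ~ q * scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to illustrate the round trip.
weights = [0.5, -1.27, 0.0, 1.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy traded for 8-bit storage and integer arithmetic on an accelerator.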
Join the experts from Arm and Alif Semiconductor to see how we are bridging the gap between PyTorch and embedded deployment—and how you can bring PyTorch models to silicon-accelerated, ultra-low-power systems.