Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

📰 ArXiv cs.AI

arXiv:2605.25682v1 (cross-listed)

Abstract: Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet practical benefits on real hardware remain unclear: prior work relies largely on simulations that overlook hardware-specific communication overheads. We present a hardware prototype study on NVIDIA Jetson Orin Nano devices connected over WiFi. Our key finding is that the dominant bottleneck is not just network bandwidth but

Published 26 May 2026
