Cloud to Edge: Benchmarking LLM Inference On Hardware-Accelerated Single-Board Computers
📰 ArXiv cs.AI
arXiv:2604.24785v1 (Announce Type: cross)

Abstract: Large language models (LLMs) are becoming increasingly capable at small parameter scales. At the same time, conventional cloud-centric deployment introduces challenges around data privacy, latency, and cost that are acute in operational technology and defence environments. Advances in model distillation, quantisation, and affordable edge accelerators now make local LLM inference on single-board computers feasible, but the high dimensionality of t