Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

📰 ArXiv cs.AI

arXiv:2605.25054v1, Announce Type: cross

Abstract: Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing mixed-precision methods typically operate at coarse layer- or channel-level granularity. These methods often rely on heuristic or search-based bit-allocation strategies, which may overlook fine-grained variability at the

Published 26 May 2026