Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning

📰 ArXiv cs.AI

arXiv:2604.24938v1 | Announce Type: cross

Abstract: Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work has focused on importance criteria and search algorithms, often treating layer redundancy as an inherent structural property of pretrained networks. In contrast, we adopt a *functional perspective*, where redundancy is jointly influenced by the model and the evaluation objective, suggesting that a universal ranking may not be s…

Published 29 Apr 2026