On the Limits of Layer Pruning for Generative Reasoning in Large Language Models

📰 ArXiv cs.AI

arXiv:2602.01997v2 Announce Type: replace-cross

Abstract: Recent work has shown that layer pruning can effectively compress large language models (LLMs) while retaining strong performance on classification benchmarks, often with little or no finetuning. In contrast, generative reasoning tasks, such as GSM8K and HumanEval+, exhibit substantially weaker recovery. We show that beyond surface-level text degradation, pruning leads to a loss of key algorithmic capabilities, including …
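The abstract is cut off mid-sentence, but the technique it studies is concrete enough to sketch. Below is a minimal illustration of layer pruning, assuming a Llama-style causal LM in Hugging Face transformers whose decoder stack lives at `model.model.layers`; the model name and the dropped layer indices are placeholders for illustration, not the paper's method.

```python
# Minimal layer-pruning sketch. Assumes a Llama-style architecture in
# Hugging Face transformers (decoder stack at model.model.layers); the
# model name and dropped indices below are illustrative placeholders.
import torch
from torch import nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model choice
    torch_dtype=torch.float16,
)

def prune_layers(model, drop: set[int]):
    """Remove the decoder layers whose indices are in `drop`."""
    kept = nn.ModuleList(
        layer for i, layer in enumerate(model.model.layers) if i not in drop
    )
    # Re-index the surviving layers so KV-cache bookkeeping stays
    # consistent (assumes each attention module stores a layer_idx,
    # as Llama-style models in transformers do).
    for i, layer in enumerate(kept):
        layer.self_attn.layer_idx = i
    model.model.layers = kept
    model.config.num_hidden_layers = len(kept)
    return model

# Drop a contiguous block of deeper layers, a common heuristic in the
# layer-pruning literature; the exact indices here are placeholders.
model = prune_layers(model, drop=set(range(24, 28)))
```

Pruning a contiguous block of deeper-middle layers is the usual heuristic because early layers build token representations and final layers produce the output distribution; the paper's point is that classification metrics tolerate this while multi-step generative reasoning does not.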

Published 13 Apr 2026