LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation

📰 ArXiv cs.AI

arXiv:2604.19167v1 Announce Type: cross Abstract: Deploying large language models (LLMs) in resource-constrained environments is hindered by their heavy computational and memory requirements. We present LBLLM, a lightweight binarization framework that achieves effective W(1+1)A4 quantization through a novel three-stage quantization strategy. The framework proceeds as follows: (1) initialize a high-quality quantized model via PTQ; (2) quantize binarized weights, group-wise bitmaps, and quantization par
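The "W(1+1)" notation suggests weights represented by two binary tensors, typically a primary sign matrix plus a binarized residual, each with a group-wise scale. The paper's exact scheme is not spelled out in this truncated abstract, but a minimal sketch of that general residual-binarization idea (all names and the `group_size` choice here are illustrative assumptions, not the authors' implementation) might look like:

```python
import numpy as np

def residual_binarize(w: np.ndarray, group_size: int = 128) -> np.ndarray:
    """Approximate w with two binary passes per group: w ~ a1*B1 + a2*B2.

    B1, B2 are +/-1 matrices; a1, a2 are per-group scales chosen as the
    mean absolute value, which minimizes the L2 error of a*sign(x).
    This is a generic two-bit residual scheme, not LBLLM's exact method.
    """
    flat = w.reshape(-1, group_size)

    # First binary pass: sign of the weights with a per-group scale.
    b1 = np.where(flat >= 0, 1.0, -1.0)
    a1 = np.abs(flat).mean(axis=1, keepdims=True)

    # Second binary pass on the residual left by the first approximation.
    r = flat - a1 * b1
    b2 = np.where(r >= 0, 1.0, -1.0)
    a2 = np.abs(r).mean(axis=1, keepdims=True)

    return (a1 * b1 + a2 * b2).reshape(w.shape)
```

The second pass always reduces (or at worst preserves) the per-group reconstruction error relative to a single binary pass, which is the usual motivation for spending the extra bit.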

Published 22 Apr 2026