SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

📰 ArXiv cs.AI

arXiv:2604.03258v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their billion-scale parameters pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either special hardware support or expensive post-training to maintain model quality. To facilitate efficient and affordable model slimming, we propose a novel training-free compression method for LLMs, named "SoLA", whi…
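The abstract does not spell out SoLA's algorithm, but the low-rank decomposition half of the title can be illustrated in isolation. Below is a minimal sketch, assuming a plain truncated SVD of a single weight matrix; the `low_rank_compress` helper, the matrix size, and the choice of `rank` are illustrative placeholders, and the soft-activation-sparsity weighting the paper pairs with this step is not modeled here.

```python
import torch

def low_rank_compress(weight: torch.Tensor, rank: int):
    """Approximate `weight` (out_features x in_features) with a rank-`rank`
    factorization via truncated SVD, returning factors A and B with W ~= A @ B."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_features, rank): left vectors scaled by singular values
    B = Vh[:rank, :]             # (rank, in_features)
    return A, B

# Usage: one d_out x d_in matmul becomes two smaller ones.
W = torch.randn(4096, 4096)
A, B = low_rank_compress(W, rank=256)
x = torch.randn(4096)
y_full = W @ x
y_low = A @ (B @ x)
print(torch.dist(y_full, y_low))  # approximation error of the compressed layer
```

At rank r, the factors store r * (d_in + d_out) parameters instead of d_in * d_out, so r = 256 on a 4096 x 4096 matrix keeps roughly 12.5% of the weights; the error printed above shows what that truncation costs on a random input.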

Published 7 Apr 2026