SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
📰 ArXiv cs.AI
SLaB decomposes large language model weights into sparse, low-rank, and binary components for efficient deployment
Action Steps
- Decompose each linear-layer weight matrix into sparse, low-rank, and binary components (a minimal sketch follows this list)
- Use the sparse component to capture a small set of important weights, keeping memory usage low
- Use the low-rank component to cut the parameter count and the cost of the matrix multiply
- Combine the binary component with the sparse and low-rank parts so the bulk of the weights are stored at 1 bit, enabling efficient deployment
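Below is a minimal NumPy sketch of one plausible way to realize such a decomposition, W ≈ S + L + αB, assuming an outlier-first ordering: extract a sparse outlier part, fit a low-rank part to the residual via truncated SVD, then binarize what remains with a least-squares scale. The function and parameter names (`decompose_slab`, `rank`, `sparsity`) are illustrative, not the paper's API, and the ordering is an assumption rather than the authors' algorithm.

```python
# Minimal sketch of a sparse + low-rank + binary split, W ~= S + L + alpha * B.
# Names and the decomposition order are illustrative assumptions, not SLaB's actual method.
import numpy as np

def decompose_slab(W: np.ndarray, rank: int = 16, sparsity: float = 0.01):
    """Split W into a sparse outlier part S, a low-rank part L,
    and a scaled binary part alpha * B, so that W ~= S + L + alpha * B."""
    # 1. Sparse component: keep the largest-magnitude entries (outliers).
    k = max(1, int(sparsity * W.size))
    threshold = np.partition(np.abs(W).ravel(), -k)[-k]
    S = np.where(np.abs(W) >= threshold, W, 0.0)
    R = W - S  # residual after removing outliers

    # 2. Low-rank component: truncated SVD of the residual.
    U, sigma, Vt = np.linalg.svd(R, full_matrices=False)
    L = (U[:, :rank] * sigma[:rank]) @ Vt[:rank, :]
    R = R - L  # residual after removing the low-rank part

    # 3. Binary component: sign matrix with the least-squares-optimal
    #    scalar scale (mean absolute value), the classic 1-bit step.
    B = np.sign(R)
    alpha = float(np.abs(R).mean())
    return S, L, alpha, B

# Usage: reconstruct and check the relative approximation error.
W = np.random.randn(512, 512).astype(np.float32)
S, L, alpha, B = decompose_slab(W)
W_hat = S + L + alpha * B
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```

At inference time the three parts can be applied separately (sparse matmul, two thin matmuls for the low-rank factors, and a 1-bit matmul), which is where the memory and compute savings would come from.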
Who Needs to Know This
ML researchers and engineers deploying large language models can use SLaB to reduce memory and compute costs while maintaining model performance
Key Insight
💡 Decomposing large language model weights into complementary sparse, low-rank, and binary components can enable efficient deployment without sacrificing performance
Share This
💡 SLaB: Efficient large language models via sparse-lowrank-binary decomposition
DeepCamp AI