Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction

📰 ArXiv cs.AI

Spectral Compact Training (SCT) is a method for pre-training large language models that replaces dense weight matrices with permanently truncated SVD factors and uses Stiefel QR retraction to keep those factors orthonormal, reducing memory usage.

Published 2 Apr 2026
Action Steps
  1. Replace dense weight matrices with permanent truncated SVD factors
  2. Use standard backpropagation to flow gradients through compact spectral factors
  3. Retract U and V onto the Stiefel manifold after each update to maintain orthonormality
  4. Train large language models using the compact spectral factors without materializing the full dense matrix
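The steps above can be sketched with NumPy. This is a minimal illustration, not the paper's implementation: the shapes, the rank, and the sign-fixed QR retraction are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: one d_out x d_in dense weight replaced by rank-r factors.
d_out, d_in, r = 64, 48, 8

# Step 1: hold compact spectral factors (U, S, V) instead of a dense W.
# U (d_out x r) and V (d_in x r) have orthonormal columns; S holds r singular values.
U, _ = np.linalg.qr(rng.standard_normal((d_out, r)))
V, _ = np.linalg.qr(rng.standard_normal((d_in, r)))
S = np.abs(rng.standard_normal(r))

def forward(x):
    # Steps 2 and 4: apply the layer without materializing W = U diag(S) V^T.
    # x: (batch, d_in) -> (batch, d_out); cost O(r(d_in + d_out)) per sample.
    return (x @ V) * S @ U.T

def qr_retract(M):
    # Step 3: QR retraction onto the Stiefel manifold after a gradient step.
    # Fixing the signs of R's diagonal makes the retraction unique.
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))

# Simulate a gradient update that pushes U off the manifold, then retract.
U_perturbed = U + 0.01 * rng.standard_normal(U.shape)
U_new = qr_retract(U_perturbed)

# Orthonormality is restored to machine precision.
orth_err = np.linalg.norm(U_new.T @ U_new - np.eye(r))
```

In a real training loop, gradients would flow through `U`, `S`, and `V` via standard backpropagation, with the retraction applied to `U` and `V` after each optimizer step.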
Who Needs to Know This

ML researchers and engineers working on large language models can use this method to reduce memory usage and improve training efficiency. It is particularly relevant for teams training on consumer hardware with limited memory.

Key Insight

💡 SCT enables efficient pre-training of large language models on consumer hardware by cutting the memory footprint of weight matrices.
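To see where the memory saving comes from, here is a back-of-envelope parameter count for a single square layer. The dimensions and rank are illustrative assumptions, not figures from the paper:

```python
# Dense vs. rank-r truncated-SVD parameter counts for one d x d layer.
d, r = 4096, 256                 # hypothetical layer width and truncation rank
dense_params = d * d             # full weight matrix W
sct_params = 2 * d * r + r       # U (d x r) + V (d x r) + S (r values)
ratio = dense_params / sct_params
```

With these numbers the compact factors store roughly 8x fewer parameters than the dense matrix; the actual saving depends on the rank the method can afford per layer.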

Share This
🔍 Reduce memory usage for large language models with Spectral Compact Training (SCT) 🚀