Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction
📰 ArXiv cs.AI
Spectral Compact Training (SCT) is a method for pre-training large language models that replaces dense weight matrices with permanently truncated SVD factors and uses Stiefel QR retraction to keep those factors orthonormal, reducing memory usage.
Action Steps
- Replace dense weight matrices with permanent truncated SVD factors
- Use standard backpropagation to flow gradients through compact spectral factors
- Retract U and V onto the Stiefel manifold via QR decomposition to maintain orthonormal columns
- Train large language models using the compact spectral factors without materializing the full dense matrix
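The steps above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the shapes, learning rate, and random perturbation standing in for a gradient step are all assumptions. It shows the two core ideas — applying the compact factors without materializing the full matrix, and restoring orthonormality with a QR retraction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 64x64 dense layer compressed to rank 8.
d_out, d_in, r = 64, 64, 8

# Compact spectral factors replacing the dense weight W ~= U @ diag(s) @ V.T.
U = np.linalg.qr(rng.standard_normal((d_out, r)))[0]  # orthonormal columns
V = np.linalg.qr(rng.standard_normal((d_in, r)))[0]
s = rng.random(r) + 0.5                               # singular values

def forward(x):
    # Apply the factors in sequence; the full d_out x d_in matrix
    # is never materialized.
    return (x @ V) * s @ U.T

def qr_retract(M):
    # Retract a perturbed factor back onto the Stiefel manifold via QR.
    Q, R = np.linalg.qr(M)
    # Sign-fix so the retraction is unique (diagonal of R positive).
    return Q * np.sign(np.diag(R))

# A (stand-in) gradient step drifts U off the manifold...
U = qr_retract(U - 0.01 * rng.standard_normal(U.shape))

# ...and the QR retraction restores orthonormal columns.
print(np.allclose(U.T @ U, np.eye(r)))  # True
```

In a real training loop, gradients flow through U, s, and V via standard backpropagation, and the retraction is applied to U and V after each optimizer step.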
Who Needs to Know This
ML researchers and engineers working on large language models can use this method to reduce memory usage and improve training efficiency. It is particularly useful for teams training on consumer hardware with limited memory.
Key Insight
💡 SCT enables efficient pre-training of large language models on consumer hardware by reducing memory usage
Share This
🔍 Reduce memory usage for large language models with Spectral Compact Training (SCT) 🚀
DeepCamp AI