Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction
📰 ArXiv cs.AI
Spectral Compact Training (SCT) is a method for pre-training large language models that replaces dense weight matrices with permanently truncated SVD factors and uses Stiefel QR retraction to keep those factors orthonormal, reducing memory usage.
Action Steps
- Replace dense weight matrices with permanent truncated SVD factors
- Use standard backpropagation to flow gradients through compact spectral factors
- Retract U and V onto the Stiefel manifold via QR decomposition to maintain orthonormal columns
- Train large language models using the compact spectral factors without materializing the full dense matrix
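The steps above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the shapes, learning rate, and random perturbation standing in for a gradient step are all assumptions. It shows the two core ideas — applying the compact factors without materializing the full matrix, and restoring orthonormality with a QR retraction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 64x64 dense layer compressed to rank 8.
d_out, d_in, r = 64, 64, 8

# Compact spectral factors replacing the dense weight W ~= U @ diag(s) @ V.T.
U = np.linalg.qr(rng.standard_normal((d_out, r)))[0]  # orthonormal columns
V = np.linalg.qr(rng.standard_normal((d_in, r)))[0]
s = rng.random(r) + 0.5                               # singular values

def forward(x):
    # Apply the factors in sequence; the full d_out x d_in matrix
    # is never materialized.
    return (x @ V) * s @ U.T

def qr_retract(M):
    # Retract a perturbed factor back onto the Stiefel manifold via QR.
    Q, R = np.linalg.qr(M)
    # Sign-fix so the retraction is unique (diagonal of R positive).
    return Q * np.sign(np.diag(R))

# A (stand-in) gradient step drifts U off the manifold...
U = qr_retract(U - 0.01 * rng.standard_normal(U.shape))

# ...and the QR retraction restores orthonormal columns.
print(np.allclose(U.T @ U, np.eye(r)))  # True
```

In a real training loop, gradients flow through U, s, and V via standard backpropagation, and the retraction is applied to U and V after each optimizer step.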
Who Needs to Know This
ML researchers and engineers working on large language models can use this method to reduce memory usage and improve training efficiency. It is particularly useful for teams training on consumer hardware with limited memory.
Key Insight
💡 SCT enables efficient pre-training of large language models on consumer hardware by reducing memory usage
Share This
🔍 Reduce memory usage for large language models with Spectral Compact Training (SCT) 🚀
DeepCamp AI