MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates
📰 ArXiv cs.AI
MSR-HuBERT is a self-supervised pre-training method for adapting to multiple sampling rates in speech processing
Action Steps
- Replace single-rate downsampling CNN with multi-sampling-rate adaptive downsampling CNN
- Resample raw waveforms to multiple sampling rates during training
- Pre-train the model using self-supervised learning
- Fine-tune the model on specific speech recognition tasks
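The core of the first two steps is making the CNN front end produce the same output frame rate regardless of the input sampling rate. As a minimal sketch (an assumption about the mechanism, not the paper's exact architecture), one can scale the front end's total downsampling stride with the input rate so that every sampling rate maps to HuBERT's usual 50 Hz frame rate:

```python
# Sketch of rate-adaptive downsampling: HuBERT's CNN front end reduces
# 16 kHz audio by a fixed factor of 320, yielding 50 frames per second.
# To accept other rates, scale the total stride with the sampling rate
# so all rates land on the same output frame rate. The 50 Hz target and
# the divisibility requirement are illustrative assumptions.

TARGET_FRAME_RATE_HZ = 50  # assumed HuBERT-style output frame rate

def total_stride(sample_rate_hz: int) -> int:
    """Total downsampling factor needed to reach the target frame rate."""
    stride, remainder = divmod(sample_rate_hz, TARGET_FRAME_RATE_HZ)
    if remainder:
        raise ValueError(f"{sample_rate_hz} Hz is not divisible by the target rate")
    return stride

def num_output_frames(num_samples: int, sample_rate_hz: int) -> int:
    """Frames produced for a waveform of num_samples at sample_rate_hz."""
    return num_samples // total_stride(sample_rate_hz)

# One second of audio at any supported rate yields the same 50 frames:
for sr in (8_000, 16_000, 32_000):
    print(sr, num_output_frames(sr, sr))
```

Because every rate yields the same frame rate, the Transformer and the self-supervised objective downstream are unchanged; only the front end adapts.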
Who Needs to Know This
Speech recognition engineers and researchers benefit from MSR-HuBERT because it makes speech models robust to varying sampling rates; data scientists can apply the method to build more accurate speech processing systems
Key Insight
💡 MSR-HuBERT enables speech models to adapt to different sampling rates, improving their robustness and accuracy
Share This
🗣️ Introducing MSR-HuBERT: self-supervised pre-training for speech processing with multiple sampling rates!
DeepCamp AI