Data-Efficient On-Policy Distillation for Automatic Speech Recognition
📰 ArXiv cs.AI
Learn how data-efficient on-policy distillation can improve automatic speech recognition models while reducing the need for large-scale audio supervision.
Action Steps
- Choose a strong teacher model (the paper uses a Qwen-ASR teacher)
- Apply on-policy distillation to transfer knowledge from the teacher model to a smaller student model
- Evaluate the student model on Mandarin and English ASR benchmarks
- Compare the performance of the student model with the teacher model
- Fine-tune the student model for specific use cases or languages
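The evaluation and comparison steps above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes word error rate (WER) for English and character error rate (CER) for Mandarin, which is common practice, and the function names `edit_distance` and `error_rate` and the sample transcripts are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate; unit is 'word' (WER) or 'char' (CER)."""
    if unit == "word":
        split = lambda s: s.split()
    else:
        split = lambda s: list(s.replace(" ", ""))
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return errors / total


# Hypothetical transcripts, for illustration only.
english_wer = error_rate(["the cat sat"], ["the cat sat down"], "word")
mandarin_cer = error_rate(["今天天气很好"], ["今天天汽很好"], "char")
print(f"English WER: {english_wer:.3f}, Mandarin CER: {mandarin_cer:.3f}")
```

Running the same function on the teacher's and the student's transcripts for a shared test set gives directly comparable numbers.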
Who Needs to Know This
Speech recognition engineers and researchers can use this technique to build more accurate and efficient ASR models while reducing the cost of large-scale audio data collection.
Key Insight
💡 On-policy distillation can effectively transfer recognition capability from a strong teacher model to a smaller student model, reducing the need for large-scale audio supervision.
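To make "on-policy" concrete: the student generates its own transcript, and the teacher scores each step of that student-generated sequence. The sketch below is a toy illustration under stated assumptions, not the paper's objective (the abstract does not specify it). It assumes a per-token reverse KL divergence, KL(student || teacher), which is one common choice. The names `toy_student`, `toy_teacher`, and `on_policy_distill_loss`, and the 4-token vocabulary, are all hypothetical.

```python
import math
import random

VOCAB_SIZE = 4  # hypothetical toy vocabulary
EOS = 3         # hypothetical end-of-sequence token id


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) for a single next-token distribution."""
    return sum(p * (math.log(p + eps) - math.log(q + eps))
               for p, q in zip(student_probs, teacher_probs))


def toy_teacher(audio, prefix):
    # Stand-in for a strong teacher: sharply prefers the "correct" next token.
    target = audio[len(prefix)] if len(prefix) < len(audio) else EOS
    return [4.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]


def toy_student(audio, prefix):
    # Stand-in for a weaker student: same preference, much less confident.
    target = audio[len(prefix)] if len(prefix) < len(audio) else EOS
    return [1.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]


def on_policy_distill_loss(student_fn, teacher_fn, audio, max_len, eos_id, rng):
    """Sample a transcript from the student; score every step with the teacher."""
    prefix, step_losses = [], []
    for _ in range(max_len):
        p_student = softmax(student_fn(audio, prefix))
        p_teacher = softmax(teacher_fn(audio, prefix))
        step_losses.append(reverse_kl(p_student, p_teacher))
        # On-policy: the next token comes from the student, not from a reference.
        token = rng.choices(range(len(p_student)), weights=p_student)[0]
        prefix.append(token)
        if token == eos_id:
            break
    return sum(step_losses) / len(step_losses), prefix


# "audio" here is just a list of target token ids standing in for real features.
loss, sampled = on_policy_distill_loss(
    toy_student, toy_teacher, [0, 1, 2], max_len=8, eos_id=EOS, rng=random.Random(0)
)
print("mean per-token reverse KL:", loss, "sampled tokens:", sampled)
```

In real training this scalar would be computed inside an autodiff framework and minimized with respect to the student's parameters. The sketch only shows where the samples come from and how the teacher's distribution enters the loss.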
Share This
🗣️ Improve ASR models with data-efficient on-policy distillation! 📊
Full Article
Title: Data-Efficient On-Policy Distillation for Automatic Speech Recognition
Abstract:
arXiv:2605.28139v1 Announce Type: new
Building competitive automatic speech recognition (ASR) models usually requires large-scale audio supervision, which makes reproduction and specialization expensive. We study Ark-ASR, a 0.6B-parameter audio-conditioned language model trained with 100k hours of speech, and examine whether a strong Qwen-ASR teacher can transfer additional recognition capability through on-policy distillation. Across Mandarin and English ASR benchmarks, the propose
DeepCamp AI