Data-Efficient On-Policy Distillation for Automatic Speech Recognition

📰 ArXiv cs.AI

Learn how data-efficient on-policy distillation can improve automatic speech recognition models while reducing the need for large-scale audio supervision.

advanced Published 28 May 2026

Action Steps

Train or obtain a strong teacher model (the paper uses a Qwen-ASR teacher)
Apply on-policy distillation to transfer knowledge from the teacher model to a smaller student model
Evaluate the student model on Mandarin and English ASR benchmarks
Compare the performance of the student model with the teacher model
Fine-tune the student model for specific use cases or languages
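
The evaluation and comparison steps above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the abstract does not name its metrics, so the sketch assumes the conventional choices of word error rate (WER) for English and character error rate (CER) for Mandarin. The function names and the example transcripts are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char")."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(references, hypotheses):
        if unit == "word":
            ref_units, hyp_units = ref.split(), hyp.split()
        else:
            # Assumption: spaces are ignored when scoring characters.
            ref_units = list(ref.replace(" ", ""))
            hyp_units = list(hyp.replace(" ", ""))
        total_edits += edit_distance(ref_units, hyp_units)
        total_len += len(ref_units)
    if total_len == 0:
        raise ValueError("references contain no units to score")
    return total_edits / total_len


# Hypothetical transcripts, for illustration only.
references = ["the cat sat on the mat"]
teacher_out = ["the cat sat on the mat"]
student_out = ["the cat sat on a mat"]

teacher_wer = error_rate(references, teacher_out)
student_wer = error_rate(references, student_out)
print(f"teacher WER={teacher_wer:.3f}, student WER={student_wer:.3f}")
```

Scoring both models on the same reference set with the same text normalization keeps the student-versus-teacher comparison meaningful.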

Who Needs to Know This

Speech recognition engineers and researchers can use this technique to build more accurate and efficient ASR models while cutting the cost of large-scale audio data collection.

Key Insight

💡 On-policy distillation can effectively transfer recognition capability from a strong teacher model to a smaller student model, reducing the need for large-scale audio supervision
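
To make "on-policy" concrete: the student generates its own transcript, and the teacher is asked to score the prefixes the student actually produced, rather than ground-truth prefixes. The sketch below is a toy illustration under stated assumptions, not the paper's method. The abstract does not specify the training objective, so the per-token reverse KL divergence used here is an assumed (though common) choice, and the toy_student and toy_teacher functions are hypothetical stand-ins for real audio-conditioned models. It only computes the loss value; a real setup would backpropagate it through the student.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) at one token position."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(student_probs, teacher_probs)
    )


def on_policy_distillation_loss(student_fn, teacher_fn, audio,
                                max_len, eos_id, rng):
    """Sample from the student, score each sampled prefix with the teacher."""
    prefix = []
    per_token = []
    for _ in range(max_len):
        s_probs = softmax(student_fn(audio, prefix))
        t_probs = softmax(teacher_fn(audio, prefix))
        per_token.append(reverse_kl(s_probs, t_probs))
        # On-policy: the next token comes from the student, not the teacher
        # and not a reference transcript.
        token = rng.choices(range(len(s_probs)), weights=s_probs, k=1)[0]
        prefix.append(token)
        if token == eos_id:
            break
    return sum(per_token) / len(per_token), prefix


# Hypothetical toy models over a 4-token vocabulary (token 3 = end of sequence).
def toy_teacher(audio, prefix):
    return [2.0, 0.5, 0.1, 0.2 * len(prefix)]


def toy_student(audio, prefix):
    return [0.5, 0.5, 0.5, 0.5]


loss, sampled = on_policy_distillation_loss(
    toy_student, toy_teacher, audio=None, max_len=5, eos_id=3,
    rng=random.Random(0),
)
print(f"mean reverse KL = {loss:.4f}, sampled tokens = {sampled}")
```

Because the student's own samples define the training inputs, no additional transcribed audio is needed for this step beyond what the teacher can score, which is one way to read the "data-efficient" claim.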

Full Article

Abstract:
arXiv:2605.28139v1

Building competitive automatic speech recognition (ASR) models usually requires large-scale audio supervision, which makes reproduction and specialization expensive. We study Ark-ASR, a 0.6B-parameter audio-conditioned language model trained with 100k hours of speech, and examine whether a strong Qwen-ASR teacher can transfer additional recognition capability through on-policy distillation. Across Mandarin and English ASR benchmarks, the propose
