Distilling Conversations: Abstract Compression of Conversational Audio Context for LLM-based ASR

📰 arXiv cs.AI

arXiv:2603.26246v1

Abstract: Standard LLM-based speech recognition systems typically process utterances in isolation, limiting their ability to leverage conversational context. In this work, we study whether multimodal context from prior turns improves LLM-based ASR and how to represent that context efficiently. We find that, after supervised multi-turn training, conversational context mainly helps with the recognition of contextual entities. However, conditioning on raw con

Published 30 Mar 2026