Efficient Training for Cross-lingual Speech Language Models

📰 ArXiv cs.AI

arXiv:2604.11096v1 Announce Type: cross Abstract: Currently, large language models (LLMs) predominantly focus on the text modality. To enable more natural human-AI interaction, speech LLMs are emerging, but building effective end-to-end speech LLMs remains challenging due to limited data and the difficulty of expanding to more languages. In this paper, we introduce the Cross-lingual Speech Language Model (CSLM), an efficient training method for cross-lingual speech LLMs based on discrete speech tokens.
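The abstract mentions building speech LLMs on discrete speech tokens. As a generic illustration (not CSLM's specific method), continuous audio feature frames are often discretized by mapping each frame to its nearest entry in a learned codebook, yielding token ids a language model can consume. A minimal sketch with a toy random codebook:

```python
import numpy as np

def quantize_to_tokens(features, codebook):
    """Map each feature frame to the index of its nearest codebook vector.

    features: (T, D) array of audio feature frames
    codebook: (K, D) array of codebook vectors
    returns:  (T,) array of discrete token ids in [0, K)
    """
    # Pairwise squared distances between frames and codebook entries: (T, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # toy 8-entry codebook, feature dim 4
frames = rng.normal(size=(5, 4))     # 5 audio feature frames
tokens = quantize_to_tokens(frames, codebook)
print(tokens)  # 5 token ids, each in [0, 8)
```

In a real pipeline the codebook would come from a trained quantizer (e.g. k-means over self-supervised speech features), and the resulting token ids would be interleaved with or mapped into the LLM's vocabulary.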

Published 14 Apr 2026