Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models

📰 arXiv cs.AI

Sommelier is a scalable, open multi-turn audio pre-processing system for building training data for full-duplex speech language models.

Advanced · Published 30 Mar 2026
Action Steps
  1. Developing full-duplex speech language models requires high-quality multi-speaker conversational data.
  2. Existing large-scale resources are predominantly single-speaker or limited in volume.
  3. Sommelier provides scalable, open multi-turn audio pre-processing designed to handle the complex dynamics of natural dialogue.
  4. Practitioners can apply Sommelier to prepare multi-speaker conversational data for training their own full-duplex speech language models.
Who Needs to Know This

AI engineers and researchers working on full-duplex speech language models, which target real-time, natural human-computer interaction, can use Sommelier to pre-process conversational audio. Data scientists can use it to produce high-quality multi-speaker conversational data.

Key Insight

💡 Sommelier addresses the scarcity of high-quality multi-speaker conversational data for full-duplex speech language models
