Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

📰 ArXiv cs.AI

arXiv:2604.06138v1 (Announce Type: cross)

Abstract: Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to serve both as a training resource and as a controlled evaluation environment, and instantiate it for first-visit doctor-patient conversations…

Published 8 Apr 2026