Data Selection for Multi-turn Dialogue Instruction Tuning

📰 ArXiv cs.AI

arXiv:2604.07892v2 Abstract: Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose MDS (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS comb

Published 14 Apr 2026
