DataEvolver: Automatic Data Preparation for Large Language Models through Multi-Level Self-Evolving

📰 ArXiv cs.AI

arXiv:2606.07001v1 · Announce Type: cross

Abstract: High-quality training data is essential to large language models (LLMs) and typically requires extensive, costly manual curation. Existing automatic data preparation methods rely on predefined pipelines or customized human instructions, which limits their adaptability to diverse data distributions and leaves them without principled guidance from high-quality examples. In this paper, we introduce DataEvolver, the first self-evolving data preparation system …

Published 8 Jun 2026
