HARP: Efficient Data Selection for Finetuning Large Language Models

📰 arXiv cs.AI

arXiv:2606.07690v1
Announce Type: cross

Abstract: Finetuning data selection requires balancing two competing goals: selecting examples that improve the downstream objective, and doing so without repeatedly finetuning models. Train-free selectors are scalable but rely on proxies such as embedding similarity or clustering, which may not match the target objective. Train-based selectors better reflect downstream utility through gradient signals, subset evaluation, or Shapley attribution, but require…
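The abstract contrasts train-free selectors, which score examples with cheap proxies such as embedding similarity, against train-based selectors. To make the train-free side concrete, here is a minimal Python sketch. It assumes that candidate and target examples have already been embedded as plain float vectors. The helpers `cosine` and `select_by_similarity`, the centroid scoring rule, and the toy vectors are all hypothetical illustrations. They are not HARP's method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def select_by_similarity(candidates: List[Sequence[float]],
                         target: List[Sequence[float]],
                         k: int) -> List[int]:
    """Hypothetical train-free selector (illustration only).

    Ranks candidate embeddings by cosine similarity to the mean target
    embedding and returns the indices of the top k. No model is finetuned;
    the score is only a proxy for downstream utility.
    """
    if not target:
        raise ValueError("need at least one target embedding")
    dim = len(target[0])
    # Centroid of the target-set embeddings, one coordinate at a time.
    centroid = [sum(vec[d] for vec in target) / len(target) for d in range(dim)]
    scores = [cosine(c, centroid) for c in candidates]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 2-D "embeddings" (made up for illustration).
    candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    target = [[1.0, 0.1], [0.9, -0.1]]
    print(select_by_similarity(candidates, target, k=2))  # → [0, 2]
```

Nothing in this sketch checks whether the top-ranked examples actually improve a finetuned model. That gap is the mismatch between proxy and target objective that the abstract attributes to train-free selectors.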

Published 9 Jun 2026
