HARP: Efficient Data Selection for Finetuning Large Language Models
📰 ArXiv cs.AI
arXiv:2606.07690v1 Announce Type: cross
Abstract: Finetuning data selection requires balancing two competing goals: selecting examples that improve the downstream objective, and doing so without repeatedly finetuning models. Train-free selectors are scalable but rely on proxies such as embedding similarity or clustering, which may not match the target objective. Train-based selectors better reflect downstream utility through gradient signals, subset evaluation, or Shapley attribution, but requir
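
To make the "train-free" category concrete, here is a minimal sketch of an embedding-similarity selector of the kind the abstract contrasts with train-based methods. It is not HARP and is not taken from the paper. The function names (`cosine`, `centroid`, `select_top_k`), the choice to score against the centroid of the target embeddings, and the toy 2-D vectors are all assumptions made for illustration; in practice the embeddings would come from some encoder model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def select_top_k(candidates, targets, k):
    """Return indices of the k candidates most similar to the target centroid.

    Hypothetical train-free proxy: no model is finetuned, and similarity to
    the target set stands in for downstream utility.
    """
    center = centroid(targets)
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: cosine(candidates[i], center),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (assumed values, for illustration only).
candidate_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
target_embeddings = [[1.0, 0.0], [1.0, 0.2]]
selected = select_top_k(candidate_embeddings, target_embeddings, k=2)
```

A selector like this needs no training runs, which is why it scales, but its ranking depends only on the embedding geometry. That is the mismatch with the target objective that the abstract points out.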