ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment

📰 ArXiv cs.AI

arXiv:2505.19241v2 (Announce Type: replace-cross)

Abstract: The recent success in using human preferences to align large language models (LLMs) has significantly improved their performance in various downstream tasks, such as question answering, mathematical reasoning, and code generation. However, achieving effective LLM alignment depends on high-quality datasets of human preferences. Collecting these datasets requires human preference annotation, which is costly and resource-intensive, necessita…

Published 18 May 2026