Scaling laws for reward model overoptimization

📰 OpenAI News
Published 19 Oct 2022