Let's Talk Tokens: AMA on Reinforcement Fine-Tuning (RFT), GRPO, and AI Rewards

Predibase by Rubrik · Advanced · 📄 Research Papers Explained · 9mo ago
🔔 SUBSCRIBE for the latest on LLM fine-tuning, AI scaling, and reinforcement learning hacks! 👉 https://www.youtube.com/@Predibase

Reinforcement Fine-Tuning (RFT) is no longer theoretical; it's powering real-world GenAI systems today. In this live AMA, Predibase's AI experts answer audience questions on:

✅ When to use RFT over SFT
✅ How GRPO (Group Relative Policy Optimization) works
✅ Designing robust reward functions (and avoiding reward hacking)
✅ How much data you really need for text-to-SQL, codegen, and logic tasks
✅ Future of RFT in agentic workflows and enterprise GenAI systems

This…

Chapters (12)

0:00 Intro & Welcome
2:00 What Is RFT and Why It Matters Now
7:00 GRPO vs PPO vs RLHF Explained
11:00 Use Cases Best Suited for RFT
15:30 When to Use RFT vs SFT
20:00 How Much Data Do You Need for RFT?
25:00 GRPO Hyperparameters & Prompt Design
30:00 Reward Function Design & Best Practices
37:00 Common RFT Pitfalls and Reward Hacking
43:00 Handling Subjective Evaluations
48:00 Future of RFT: Agentic AI & Beyond
52:00 Final Thoughts + AMA Wrap-up
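
As a companion to the GRPO and reward-design chapters above, here is a minimal, hypothetical sketch of the group-relative idea behind GRPO: sample several completions per prompt, score each with a programmatic reward, and normalize rewards within the group instead of training a separate value model. The reward logic and function names below are illustrative assumptions, not Predibase's implementation.

```python
# Hypothetical GRPO-style sketch: group-relative advantages from a toy
# verifiable reward. Names and reward logic are illustrative only.
from statistics import mean, stdev


def format_reward(completion: str) -> float:
    """Toy verifiable reward: 1.0 if the completion looks like SQL, else 0.0."""
    return 1.0 if completion.strip().lower().startswith("select") else 0.0


def group_relative_advantages(completions: list[str]) -> list[float]:
    """Score a group of completions for one prompt and z-normalize the rewards."""
    rewards = [format_reward(c) for c in completions]
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    # Each completion's advantage is its reward relative to the rest of its group.
    return [(r - mu) / (sigma + 1e-6) for r in rewards]


if __name__ == "__main__":
    group = [
        "SELECT name FROM users WHERE age > 30;",
        "The users older than 30 are listed below.",
        "select * from users",
    ]
    print(group_relative_advantages(group))
```

In this sketch, completions that satisfy the check get positive advantages and the rest get negative ones, which is the signal a GRPO-style update would push the policy toward; a real text-to-SQL reward would also need execution or schema checks to limit reward hacking.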