Let's Talk Tokens: AMA on Reinforcement Fine-Tuning (RFT), GRPO, and AI Rewards
SUBSCRIBE for the latest on LLM fine-tuning, AI scaling, and reinforcement learning hacks!
https://www.youtube.com/@Predibase
Reinforcement Fine-Tuning (RFT) is no longer theoretical: it's powering real-world GenAI systems today.
In this live AMA, Predibase's AI experts answer audience questions on:
✅ When to use RFT over SFT
✅ How GRPO (Group Relative Policy Optimization) works
✅ Designing robust reward functions (and avoiding reward hacking)
✅ How much data you really need for text-to-SQL, codegen, and logic tasks
✅ Future of RFT in agentic workflows and enterprise GenAI systems
This…
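For context on the GRPO topic covered in the AMA: GRPO scores each sampled completion relative to the other completions generated for the same prompt, normalizing rewards by the group's mean and standard deviation. A minimal Python sketch of that group-relative advantage step, purely illustrative (not Predibase's implementation; function and variable names are hypothetical):

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# Names here are hypothetical; this is not Predibase's code.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within a group of completions for one prompt.

    advantage_i = (r_i - mean(rewards)) / (std(rewards) + eps)
    Completions scored above the group mean get positive advantages
    (reinforced); those below get negative advantages (discouraged).
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four completions for one prompt, scored by a reward function
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because advantages are computed within each group, GRPO needs no learned value-function critic, which is one reason it is discussed in the session as a lighter-weight alternative to PPO.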
Watch on YouTube →
Chapters (12)
0:00 Intro & Welcome
2:00 What Is RFT and Why It Matters Now
7:00 GRPO vs PPO vs RLHF Explained
11:00 Use Cases Best Suited for RFT
15:30 When to Use RFT vs SFT
20:00 How Much Data Do You Need for RFT?
25:00 GRPO Hyperparameters & Prompt Design
30:00 Reward Function Design & Best Practices
37:00 Common RFT Pitfalls and Reward Hacking
43:00 Handling Subjective Evaluations
48:00 Future of RFT: Agentic AI & Beyond
52:00 Final Thoughts + AMA Wrap-up