Let's Talk Tokens: AMA on Reinforcement Fine-Tuning (RFT), GRPO, and AI Rewards

Predibase by Rubrik · Advanced · 📄 Research Papers Explained · 10mo ago
🔔 SUBSCRIBE for the latest on LLM fine-tuning, AI scaling, and reinforcement learning hacks! 👉 https://www.youtube.com/@Predibase

Reinforcement Fine-Tuning (RFT) is no longer theoretical: it's powering real-world GenAI systems today. In this live AMA, Predibase's AI experts answer audience questions on:
✅ When to use RFT over SFT
✅ How GRPO (Group Relative Policy Optimization) works
✅ Designing robust reward functions (and avoiding reward hacking)
✅ How much data you really need for text-to-SQL, codegen, and logic tasks
✅ Future of RFT in agentic workflows and enterprise GenAI systems

This is a deep dive into best practices, landmines, and emerging strategies from the engineers behind the DeepLearning.AI course on RFT + GRPO and the Predibase RFT platform.

🧠 Speakers:
• Travis Addair – CTO & Co-founder, Predibase
• Arnav Garg – ML Engineering Lead, Predibase
• Ajinkya Tejankar – Senior Research Engineer, Predibase

🔗 Try Predibase's RFT Platform: https://predibase.com/free-trial
👉 Schedule a live demo: https://pbase.ai/41FZKfy

00:00 - Intro & Welcome
02:00 - What Is RFT and Why It Matters Now
07:00 - GRPO vs PPO vs RLHF Explained
11:00 - Use Cases Best Suited for RFT
15:30 - When to Use RFT vs SFT
20:00 - How Much Data Do You Need for RFT?
25:00 - GRPO Hyperparameters & Prompt Design
30:00 - Reward Function Design & Best Practices
37:00 - Common RFT Pitfalls and Reward Hacking
43:00 - Handling Subjective Evaluations
48:00 - Future of RFT: Agentic AI & Beyond
52:00 - Final Thoughts + AMA Wrap-up

#reinforcementlearning #rft #grpo #aitraining #genai #llms #finetuning #machinelearning #agenticai #RLHF #predibase #AIinProduction #llminference

Chapters (12)

0:00 Intro & Welcome
2:00 What Is RFT and Why It Matters Now
7:00 GRPO vs PPO vs RLHF Explained
11:00 Use Cases Best Suited for RFT
15:30 When to Use RFT vs SFT
20:00 How Much Data Do You Need for RFT?
25:00 GRPO Hyperparameters & Prompt Design
30:00 Reward Function Design & Best Practices
37:00 Common RFT Pitfalls and Reward Hacking
43:00 Handling Subjective Evaluations
48:00 Future of RFT: Agentic AI & Beyond
52:00 Final Thoughts + AMA Wrap-up