Let’s Talk Tokens: AMA on Reinforcement Fine-Tuning (RFT), GRPO, and AI Rewards

Predibase by Rubrik · Advanced · 📄 Research Papers Explained · 10mo ago

Skills: RL Foundations (80%)

🔔 SUBSCRIBE for the latest on LLM fine-tuning, AI scaling, and reinforcement learning hacks! 👉 https://www.youtube.com/@Predibase

Reinforcement Fine-Tuning (RFT) is no longer theoretical; it’s powering real-world GenAI systems today. In this live AMA, Predibase’s AI experts answer audience questions on:

✅ When to use RFT over SFT
✅ How GRPO (Group Relative Policy Optimization) works
✅ Designing robust reward functions (and avoiding reward hacking)
✅ How much data you really need for text-to-SQL, codegen, and logic tasks
✅ Future of RFT in agentic workflows and enterprise GenAI systems

This is a deep dive into best practices, landmines, and emerging strategies from the engineers behind the DeepLearning.AI course on RFT + GRPO and the Predibase RFT platform.

🧠 Speakers:
• Travis Addair, CTO & Co-founder, Predibase
• Arnav Garg, ML Engineering Lead, Predibase
• Ajinkya Tejankar, Senior Research Engineer, Predibase

🔗 Try Predibase’s RFT Platform: https://predibase.com/free-trial
👉 Schedule a live demo: https://pbase.ai/41FZKfy

00:00 Intro & Welcome
02:00 What Is RFT and Why It Matters Now
07:00 GRPO vs PPO vs RLHF Explained
11:00 Use Cases Best Suited for RFT
15:30 When to Use RFT vs SFT
20:00 How Much Data Do You Need for RFT?
25:00 GRPO Hyperparameters & Prompt Design
30:00 Reward Function Design & Best Practices
37:00 Common RFT Pitfalls and Reward Hacking
43:00 Handling Subjective Evaluations
48:00 Future of RFT: Agentic AI & Beyond
52:00 Final Thoughts + AMA Wrap-up

#reinforcementlearning #rft #grpo #aitraining #genai #llms #finetuning #machinelearning #agenticai #RLHF #predibase #AIinProduction #llminference
