From Responses To Trajectories: Multi-Turn and Multi-Environ... Kashif Rasul & Sergio Paniego Blanco

PyTorch · Advanced · 🤖 AI Agents & Automation · 3w ago
From Responses To Trajectories: Multi-Turn and Multi-Environment Reinforcement Learning - Kashif Rasul & Sergio Paniego Blanco, Hugging Face

Post-training of LLMs with reinforcement learning is increasingly moving beyond static prompt–response pairs and preference optimization methods such as DPO, toward trajectory-based optimization. This talk focuses on the latest advances in multi-turn and multi-environment GRPO training, which lets LLMs learn from interactive, agent-like experiences such as interacting with simulated environments, using tools, or completing multi-step reasoning tasks. We highlight how TRL, a PyTorch-native post-training framework, supports these workflows at scale.

Multi-turn, multi-environment training can leverage simulated environments (e.g., coding sandboxes, terminals, browsers), such as those provided through OpenEnv, while GRPO can also be applied to datasets for training LLMs on tool use or multi-step reasoning. Attendees will gain insights into design patterns, rollout handling, trajectory batching, and advantage computation, and will see how robust multi-turn, multi-environment post-training can improve alignment, reasoning, and generalization in LLMs for agentic applications.
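The abstract names advantage computation over trajectories without spelling it out. As a rough illustration only, the sketch below shows the group-relative normalization commonly associated with GRPO: several rollouts of the same prompt form a group, each rollout's per-turn rewards are summed into one return, and each return is standardized against the group's mean and standard deviation. This is not TRL's implementation; the function names, the `eps` value, and the reward numbers are hypothetical, and summing per-turn rewards is just one assumed way to score a multi-turn trajectory.

```python
from statistics import mean, pstdev


def trajectory_return(turn_rewards):
    """Collapse a multi-turn trajectory into one scalar (assumption: plain sum)."""
    return sum(turn_rewards)


def group_relative_advantages(returns, eps=1e-4):
    """Standardize each return against the other rollouts in the same group.

    eps is a hypothetical stabilizer that avoids division by zero when
    every rollout in the group received the same return.
    """
    mu = mean(returns)
    sigma = pstdev(returns)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in returns]


# Hypothetical group: four rollouts of the same prompt, with per-turn rewards.
# Trajectories may differ in length because each episode ends at a different turn.
group = [
    [0.0, 0.0, 1.0],       # solved the task on turn 3
    [0.0, 0.0],            # gave up after 2 turns
    [0.0, 1.0],            # solved the task on turn 2
    [0.0, 0.0, 0.0, 0.0],  # ran out of turns
]

returns = [trajectory_return(t) for t in group]
advantages = group_relative_advantages(returns)

for ret, adv in zip(returns, advantages):
    print(f"return={ret:.1f} advantage={adv:+.3f}")
```

In this sketch every token of a trajectory would share the same advantage, so rollouts that beat the group average are reinforced and the rest are discouraged; a group in which all rollouts score the same contributes zero advantage.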
