Debjyoti Paul - Learning to Act: Reinforcement Learning for Agentic LLM Systems

Cohere · Intermediate · 🛡️ AI Safety & Ethics · 1mo ago
Large Language Models (LLMs) have demonstrated impressive reasoning and generation abilities, but building agentic systems (AI that can plan, use tools, interact with environments, and achieve goals autonomously) requires more than prompting. A key challenge is enabling these systems to learn how to act, not just how to respond.

This talk explores how Reinforcement Learning (RL) can transform LLMs into effective decision-making agents. We examine the architecture of modern agentic systems, in which LLMs serve as planners and reasoning engines while RL provides the feedback loop that enables continuous improvement through interaction with tools, APIs, and external environments.

The session walks through practical design patterns for integrating RL with LLM-based agents, including task decomposition, action selection, tool execution, and reward shaping. We discuss how RL techniques such as policy optimization and reward modeling can help agents improve planning, reduce hallucinations, and learn reliable strategies for complex multi-step tasks. Using concrete examples, from automated workflows to multi-step information retrieval and decision-making systems, we illustrate how RL-driven feedback can improve agent performance over time. We also cover common challenges, including reward design, exploration, stability, and evaluation of agent behavior.

By the end of the talk, attendees will gain a practical understanding of how to design self-improving agentic AI systems that combine the reasoning capabilities of LLMs with the learning dynamics of reinforcement learning.

Debjyoti is a Data Scientist at Amazon with over 9 years of industry experience in Natural Language Processing (NLP), Large Language Models (LLMs), Agentic AI, and Responsible AI. Debjyoti currently leads agentic development and works on agent learning, focusing on improving agentic systems from context engineering to RL-based learning frameworks. Prior to this Debjyoti