Training LLM to play chess using Deepseek GRPO reinforcement learning
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io
In this video, we see that although popular LLMs like GPT-4o, OpenAI's o1 reasoning model, and DeepSeek R1 show some understanding of chess, they often fail to play legal moves. To address this, we train our own reasoning-focused chess LLM using Group Relative Policy Optimization (GRPO), the method used to train DeepSeek R1. We walk through how GRPO differs from traditional PPO (Proximal Policy Optimization) and fine-tune LLaMA 8B and Qwen 7B using the TRL (Transformer Reinforcement Learning) and Unsloth libraries - the results a…
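The description mentions GRPO fine-tuning with TRL and Unsloth but includes no code, so here is a minimal sketch of what such a setup could look like using TRL's GRPOTrainer with a legality-based reward computed via python-chess. The reward design, the dataset columns ("prompt", "fen"), the regex move extraction, and the Qwen 2.5 7B checkpoint are illustrative assumptions rather than details taken from the video, and the Unsloth/LoRA model setup is omitted for brevity.

import re
import chess
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Crude pattern for a SAN move (e4, Nf3, O-O, exd5, e8=Q, ...); illustrative only.
SAN_PATTERN = r"\b(O-O-O|O-O|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?)\b"

def legal_move_reward(completions, fen, **kwargs):
    # Reward 1.0 when the first SAN-looking token in the completion is a legal
    # move in the given position, 0.0 otherwise. TRL passes dataset columns
    # (here "fen") and other info as keyword arguments; extras are ignored.
    rewards = []
    for completion, position in zip(completions, fen):
        board = chess.Board(position)
        match = re.search(SAN_PATTERN, completion)
        legal = False
        if match:
            try:
                board.parse_san(match.group(1))  # raises ValueError if illegal
                legal = True
            except ValueError:
                legal = False
        rewards.append(1.0 if legal else 0.0)
    return rewards

# Toy single-position dataset (plain-text prompt format); a real run would use
# many positions sampled from games.
start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
dataset = Dataset.from_dict({
    "prompt": [f"You are playing White. Position (FEN): {start_fen}\nReply with one legal move in SAN."],
    "fen": [start_fen],
})

args = GRPOConfig(
    output_dir="grpo-chess",
    num_generations=8,              # group size: completions sampled per prompt
    max_completion_length=64,
    per_device_train_batch_size=8,  # effective batch size must divide evenly by num_generations
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumed checkpoint for illustration
    reward_funcs=legal_move_reward,
    args=args,
    train_dataset=dataset,
)
trainer.train()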
Watch on YouTube ↗
Chapters (14)
Introduction - 1:18
Chess RL Strategy - 3:51
How well do the best LLMs understand chess? - 6:41
Picking a base model - 8:31
Unsloth and TRL libraries for RL with LLMs - 9:38
LoRA (Low Rank Adaptation) - 10:55
GSM8K reasoning example - 12:06
PPO (Proximal Policy Optimization) - 14:12
GRPO (Group Relative Policy Optimization) - 17:15
GRPO training results - 18:11
Analysis of results for LLaMA and Qwen - 20:52
Limitations of GRPO on small models - 23:29
Grandmaster-level chess without search - 27:10
ChessGPT and other LLMs that play chess