Tracing GRPO's Biased Objective Back to DeepSeek Math
Zichen Liu, author of Dr. GRPO, traces the length-normalization term in the standard GRPO formulation back to its origin: the equation in the DeepSeek Math paper and the common implementation choice of averaging the loss over the token axis instead of summing it.
This biased formulation then propagated through follow-up papers and major open-source libraries such as TRL, OpenRLHF, and verl.
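The averaging-vs-summing distinction can be sketched numerically. Below is a minimal illustration, not code from the video: the per-token losses and the `MAX_LEN` constant are hypothetical, chosen only to show that per-response mean aggregation down-weights tokens in longer responses, while a sum with a constant normalizer weights every token equally.

```python
import numpy as np

# Hypothetical per-token losses (illustrative values, not from the video).
short_resp = np.full(10, 0.5)    # 10-token response
long_resp = np.full(100, 0.5)    # 100-token response

# Common implementation: average the loss over each response's token axis,
# then average across responses. A token in the long response carries
# weight 1/(2*100), vs 1/(2*10) for the short one, so gradients are
# biased by response length.
biased = np.mean([short_resp.mean(), long_resp.mean()])

# Alternative aggregation in the spirit of Dr. GRPO's fix: sum token
# losses and divide by a constant (here a hypothetical fixed generation
# budget), so every token carries the same weight regardless of length.
MAX_LEN = 128
unbiased = (short_resp.sum() + long_resp.sum()) / (2 * MAX_LEN)
```

With constant per-token losses the per-response mean is insensitive to length (`biased` is 0.5 no matter how long each response is), which is exactly the property that distorts the token-level gradient weights.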
Watch on YouTube ↗