Can You Trust an LLM Judge? An RL Researcher's Take
Zichen Liu of Dr. GRPO breaks down LLM-as-a-judge from an RL perspective:
why it is essentially a model-based reward function, how it compares to verification-based rewards, and why it can unlock dense rewards for reasoning tasks that rule-based checks simply can't verify.
yacine is still suspicious.
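
To make the contrast in the description concrete, here is a minimal Python sketch of the two reward styles. It is an illustration under stated assumptions, not the method discussed in the video: `rule_based_reward`, `judge_reward`, and `toy_judge` are hypothetical names, and `toy_judge` is a trivial stand-in for what would in practice be a call to an LLM.

```python
from typing import Callable, List


def rule_based_reward(answer: str, reference: str) -> float:
    """Verification-based reward: 1.0 only when the final answer matches the reference."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def judge_reward(steps: List[str], judge: Callable[[str], float]) -> List[float]:
    """Model-based reward: the judge scores every reasoning step, clipped to [0, 1]."""
    return [max(0.0, min(1.0, judge(step))) for step in steps]


def toy_judge(step: str) -> float:
    # Hypothetical stand-in for an LLM judge; a real one would prompt a model
    # and parse its score. Here, steps containing an equation score higher.
    return 1.0 if "=" in step else 0.25


steps = ["Let x be the unknown.", "2x = 10", "x = 5"]
sparse = rule_based_reward("5", "5")    # one scalar for the whole rollout
dense = judge_reward(steps, toy_judge)  # one score per reasoning step
print(sparse, dense)
```

The rule-based function yields a single sparse signal per rollout, while the judge yields one score per step. That density is the appeal, and the dependence on the judge's own reliability is the reason for suspicion.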
DeepCamp AI