Do LLMs Know When They're Wrong?

Martin Andrews · Beginner · 📄 Research Papers Explained · 8mo ago
We're moving past LLMs that just predict the next word. Discover a new frontier: models that can gauge their own uncertainty to improve reasoning. This video explores two brand-new papers that turn the "Entropix" meme into practical, working code.

Current methods like Chain-of-Thought are powerful, but they are essentially a model "thinking out loud." What if a model could recognize when it's on a bad path and correct itself? This is the core idea behind using token entropy and logprobs as a "confidence" signal.

This video is for the AI builder, developer, and enthusiast who wants to look under the hood. We break down the history of this idea (from OpenAI's o1 hints to Twitter theories) and then dive into the mechanics of two pivotal papers:

1. **ARPO**: Agentic Reinforced Policy Optimization
2. **Deep Think with Confidence**: A practical vLLM implementation from Meta

By the end, you'll understand not just *what* LLM confidence is, but *how* it works, and *why* it's a compelling direction for building more capable and efficient agentic systems.

---

### Papers & Resources Mentioned

* [ARPO: Agentic Reinforced Policy Optimization (Dong et al., 2025)](https://arxiv.org/abs/2507.19849)
  + [ARPO GitHub Repo](https://github.com/dongguanting/ARPO)
* [Deep Think with Confidence (Fu et al., 2025)](https://arxiv.org/abs/2508.15260)
  + [DeepThink Project Page (Meta AI)](https://jiaweizzhao.github.io/deepconf/)
  + [DeepThink Pull Request for vLLM](https://github.com/vllm-project/vllm/pull/23201)
* [OpenAI o1 Blog Post](https://openai.com/index/learning-to-reason-with-llms/)
  + [Let's Verify Step-by-Step (OpenAI, 2023)](https://arxiv.org/abs/2305.20050)
* [ICML 2024 Tutorial: Physics of Language Models](https://www.youtube.com/watch?v=yBL7J0kgldU)

---

### Chapters

00:00 - Introduction: The Idea of LLM Confidence
00:31 - Background: From OpenAI's o1 to the "Entropix" Meme
05:26 - Paper 1: ARPO & Agentic Rollout Confidence
07:55 - Paper 2: Meta's "Deep Think with Confidence"
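To make the "token entropy as confidence" idea concrete, here is a minimal sketch of how one could score a model's confidence from the per-token log-probabilities that engines like vLLM can return. This is illustrative only, not the exact formulation used in ARPO or Deep Think with Confidence; the function names and the mean-negative-entropy aggregation are choices made for this example.

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (in nats) of one next-token distribution,
    given the log-probabilities of the candidate tokens.
    High entropy = the model is spread out / uncertain."""
    return -sum(math.exp(lp) * lp for lp in logprobs)

def sequence_confidence(per_token_logprobs):
    """Aggregate a rollout's confidence as the mean negative entropy
    across its generated positions: higher = more confident."""
    entropies = [token_entropy(lps) for lps in per_token_logprobs]
    return -sum(entropies) / len(entropies)

# A peaked distribution (model is sure) scores low entropy;
# a uniform one (model is guessing) scores the maximum, log(K).
peaked = [math.log(p) for p in (0.97, 0.01, 0.01, 0.01)]
uniform = [math.log(0.25)] * 4
print(token_entropy(peaked) < token_entropy(uniform))  # True
```

An agent loop could threshold `sequence_confidence` to decide when to branch into extra rollouts or discard a low-confidence reasoning path, which is the spirit of the mechanisms the two papers build on.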

