Do LLMs Know When They're Wrong?
We're moving past LLMs that just predict the next word. Discover a new frontier: models that can gauge their own uncertainty to improve reasoning. This video explores two brand-new papers that turn the "Entropix" meme into practical, working code.
Current methods like Chain-of-Thought are powerful, but they are essentially a model "thinking out loud." What if a model could recognize when it's on a bad path and correct itself? This is the core idea behind using token entropy and logprobs as a "confidence" signal.
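To make the "confidence" signal concrete, here is a minimal Python sketch of the two quantities named above. It is an illustration only, not the method from either paper: the function names and the toy distributions are hypothetical, and it assumes you already have natural-log probabilities for the candidate tokens at each step (for example, the top-k logprobs an inference server can return).

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (in nats) of one next-token distribution.

    `logprobs` holds natural-log probabilities of the candidate tokens.
    If only the top-k candidates are given, they are renormalized first,
    so the result only approximates the entropy over the full vocabulary.
    """
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_chosen_logprob(chosen_logprobs):
    """Average log-probability of the tokens the model actually emitted.

    Values closer to 0 mean the model put more probability on its own output.
    """
    return sum(chosen_logprobs) / len(chosen_logprobs)

# Toy distributions (made up for illustration, not real model output).
confident_step = [math.log(0.97), math.log(0.02), math.log(0.01)]
uncertain_step = [math.log(0.34), math.log(0.33), math.log(0.33)]

if __name__ == "__main__":
    print("confident step entropy:", token_entropy(confident_step))
    print("uncertain step entropy:", token_entropy(uncertain_step))
```

Low entropy means the model is concentrating its probability on one token; high entropy means it is spreading its bets. How a system should react to that signal (branch, resample, stop early) is exactly the design question the two papers take up.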
This video is for the AI builder, developer, and enthusiast who wants to look under the hood. We break down the history of this idea (from OpenAI's o1 hints to Twitter theories) and then dive into the mechanics of two pivotal papers:
1. **ARPO**: Agentic Reinforced Policy Optimization
2. **Deep Think with Confidence**: A practical vLLM implementation from Meta
By the end, you'll understand not just *what* LLM confidence is, but *how* it works, and *why* it's a compelling direction for building more capable and efficient agentic systems.
---
### Papers & Resources Mentioned
* [ARPO: Agentic Reinforced Policy Optimization (Dong et al., 2025)](https://arxiv.org/abs/2507.19849)
  + [ARPO GitHub Repo](https://github.com/dongguanting/ARPO)
* [Deep Think with Confidence (Fu et al., 2025)](https://arxiv.org/abs/2508.15260)
  + [DeepThink Project Page (Meta AI)](https://jiaweizzhao.github.io/deepconf/)
  + [DeepThink Pull Request for vLLM](https://github.com/vllm-project/vllm/pull/23201)
* [OpenAI o1 Blog Post](https://openai.com/index/learning-to-reason-with-llms/)
  + [Let's Verify Step by Step (OpenAI, 2023)](https://arxiv.org/abs/2305.20050)
* [ICML 2024 Tutorial: Physics of Language Models](https://www.youtube.com/watch?v=yBL7J0kgldU)
---
### Chapters
00:00 - Introduction: The Idea of LLM Confidence
00:31 - Background: From OpenAI's o1 to the "Entropix" Meme
05:26 - Paper 1: ARPO & Agentic Rollout Confidence
07:55 - Paper 2: Meta's "Deep Think wi