AI Benchmarks Are Lying to You? I Tested 8 Models

Next Tech and AI · Advanced · 🧠 Large Language Models · 3mo ago
Synthetic benchmarks are lying to you. When the newest "state of the art" AI scores 100% on tests but fails to plan a safe mountain climb, those numbers are worthless. I threw the leaderboards in the trash and tested 8 top AI models on real problems to find the actual winner. In this video, I compare the biggest updates from OpenAI, Google, xAI, and Anthropic against open-source contenders and even a local model running offline on my PC. The ChatGPT-5.2 results were shocking.

📥 Get my Test Prompts for FREE (No Paywall): https://www.patreon.com/posts/146852078/

📺 Watch next: Why I…

Chapters (7)

0:00 Why benchmarks are lying
1:18 The 8 Models & Testing Methodology
2:09 ChatGPT-5.2
6:29 Gemini 3 Pro Thinking
8:20 Grok 4.1 Beta
9:14 Claude Opus 4.5
10:17 Perplexity (Th