Scaling the Next Paradigm of Heterogeneous Intelligence — Adrian Bertagnoli, Callosum
A mixture of Qwen3-VL-8B and Kimi K2.5 beat the state of the art on Video Web Arena, outperforming the leading GPT and Gemini models by 18 and 25 percent, respectively, while costing 3.7 times less and running 3 times faster. It worked because visual web navigation decomposes into subtasks that do not all need a frontier model: routing the zoom and visual-parsing steps to the smaller model alone made those steps 11x faster and 43x cheaper.
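The gap between the per-step gain (43x cheaper) and the whole-task gain (3.7x cheaper) follows the same logic as Amdahl's law: only the routed share of the work gets faster or cheaper. The sketch below is an illustrative simplification, not the talk's accounting. It assumes the unrouted steps keep their baseline cost, and the routed share `fraction_routed` is a hypothetical input, since the talk does not report it.

```python
def end_to_end_gain(fraction_routed: float, step_gain: float) -> float:
    """Amdahl-style estimate of the whole-task improvement factor.

    fraction_routed: hypothetical share of baseline cost (or time) spent in the
        steps that are routed to the cheaper model, in [0, 1].
    step_gain: improvement factor on those routed steps (e.g. 43.0 for cost).
    """
    if not 0.0 <= fraction_routed <= 1.0:
        raise ValueError("fraction_routed must be in [0, 1]")
    if step_gain <= 0:
        raise ValueError("step_gain must be positive")
    # Unrouted share keeps its baseline cost; the routed share shrinks by step_gain.
    remaining = (1.0 - fraction_routed) + fraction_routed / step_gain
    return 1.0 / remaining


if __name__ == "__main__":
    # Sweep hypothetical routed shares using the 43x per-step cost figure.
    for f in (0.25, 0.5, 0.75, 0.9):
        print(f"routed share {f:.2f}: {end_to_end_gain(f, 43.0):.2f}x cheaper overall")
```

However large `step_gain` gets, the overall factor stays below `1 / (1 - fraction_routed)`. That is why decomposing more of the workflow into routable subtasks matters as much as finding a cheaper model for any one of them.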
Adrian Bertagnoli of Callosum argues that the GPU cluster era of identical hardware running monolithic models is ending. Heterogeneous intelligence treats model architecture, chip type, and workflow as variables to optimize jointly. A second result: running recursive long-context reasoning tasks on Cerebras instead of a frontier model cut cost by 7x and latency by 5x while matching accuracy. Callosum is building the automation layer that routes each task to the right chip and model without a hand-made decision for every subtask.
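The talk does not describe how Callosum's routing layer makes its choices. One simple way to frame "route to the right chip and model" as a single rule, rather than a per-subtask decision, is constrained selection: among candidate (model, chip) pairings, pick the one with the lowest weighted cost and latency that still meets an accuracy floor. In the minimal sketch below, `Option`, `route`, `latency_weight`, and the candidate names are all hypothetical. The relative cost and latency figures mirror the 7x and 5x long-context result, and the accuracy value 0.9 is a placeholder, not a reported number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A hypothetical candidate (model, chip) pairing, with metrics relative to a baseline."""
    name: str
    rel_cost: float      # cost relative to the frontier baseline (1.0 = same cost)
    rel_latency: float   # latency relative to the frontier baseline
    accuracy: float      # estimated task accuracy in [0, 1]


def route(options: list[Option], min_accuracy: float, latency_weight: float = 0.0) -> Option:
    """Return the option with the lowest rel_cost + latency_weight * rel_latency
    among those whose accuracy meets min_accuracy."""
    eligible = [o for o in options if o.accuracy >= min_accuracy]
    if not eligible:
        raise ValueError("no option meets the accuracy floor")
    return min(eligible, key=lambda o: o.rel_cost + latency_weight * o.rel_latency)


if __name__ == "__main__":
    # Placeholder candidates: relative figures follow the talk (7x cheaper, 5x faster,
    # matching accuracy); the 0.9 accuracy is illustrative only.
    candidates = [
        Option("frontier-model", rel_cost=1.0, rel_latency=1.0, accuracy=0.9),
        Option("cerebras-config", rel_cost=1 / 7, rel_latency=1 / 5, accuracy=0.9),
    ]
    choice = route(candidates, min_accuracy=0.85, latency_weight=0.5)
    print(choice.name)
```

A real system would have to estimate these metrics per task type, for example by profiling. The appeal of a single objective is that adding a new chip or model only means adding an `Option`, not writing new routing logic.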
Speaker info:
- https://www.linkedin.com/in/adrian-bertagnoli-bb3467178/