Scaling the Next Paradigm of Heterogeneous Intelligence — Adrian Bertagnoli, Callosum

Uploaded: 2026-05-24T14:00:06Z
Channel: AI Engineer

A mixture of Qwen3-VL-8B and Kimi K2.5 beat the state of the art on Video Web Arena, outperforming the leading GPT and Gemini models by 18 and 25 percent, respectively, while costing 3.7 times less and running 3 times faster. It worked because visual web navigation decomposes into subtasks that do not all need a frontier model: routing zoom and visual parsing to a smaller model alone produced 11x speed and 43x cost improvements on those steps. Adrian Bertagnoli from Callosum makes the case that the GPU-cluster era of identical hardware and monolithic models is ending. Heterogeneous intelligence treats model architectures, chip types, and workflows as variables to optimize together. A second result: running recursive long-context reasoning tasks on Cerebras instead of a frontier model cuts cost by 7x and latency by 5x while matching accuracy. Callosum is building the automation layer that routes tasks to the right chip and model without bespoke decisions for each subtask.

Speaker info: https://www.linkedin.com/in/adrian-bertagnoli-bb3467178/
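
The routing idea in the description can be illustrated with a minimal Python sketch. It is not Callosum's implementation: the backend names, the `plan` and `act` subtask labels, the example workflow, and the per-step numbers are all hypothetical. The large backend's cost and latency are set to 43 and 11 times the small backend's only to mirror the per-step ratios quoted above; the absolute values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    cost_per_step: float  # arbitrary units (illustrative, not measured)
    latency_s: float      # seconds per step (illustrative, not measured)


# Hypothetical backends; only the 43x cost and 11x latency ratios come from the description.
SMALL = Backend("small-vision-model", cost_per_step=1.0, latency_s=1.0)
LARGE = Backend("frontier-model", cost_per_step=43.0, latency_s=11.0)

# Subtasks the description says a smaller model can handle.
LIGHT_SUBTASKS = {"zoom", "visual_parse"}


def route(subtask: str) -> Backend:
    """Send light visual subtasks to the small model, everything else to the large one."""
    return SMALL if subtask in LIGHT_SUBTASKS else LARGE


def plan_totals(subtasks, router):
    """Return (total cost, total latency) for a workflow under a given routing function."""
    backends = [router(s) for s in subtasks]
    return (
        sum(b.cost_per_step for b in backends),
        sum(b.latency_s for b in backends),
    )


# Assumed example workflow: three light steps and two that need the large model.
workflow = ["visual_parse", "zoom", "plan", "visual_parse", "act"]

routed = plan_totals(workflow, route)
monolithic = plan_totals(workflow, lambda _subtask: LARGE)

print("routed    :", routed)
print("monolithic:", monolithic)
```

In this toy workflow, only three of the five steps move to the small model, so the end-to-end saving is far smaller than the per-step ratios. That is consistent with the description reporting 43x and 11x on the routed steps but 3.7x and 3x overall.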
