GPT-5.1 scored 26%. Gemini 3 Flash scored 74%. Same prompt, same tools.

📰 Dev.to · ThomasP

In the previous article, I explained how we built the evaluation infrastructure for our AI agent: a...

Published 28 Mar 2026