I built the first open benchmark for federal contracting AI. Here's what it shows about frontier LLMs.

📰 Dev.to · Raihan

Frontier LLMs hallucinate Federal Acquisition Regulation (FAR) clause numbers between 0% and 32% of the time, depending on the model. A specialized 150M-parameter model trained in 4 minutes matches Claude Haiku on F1 with less than half the hallucination rate. Open dataset, open model, reproducible.
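
To make the two metrics concrete, here is a minimal sketch of how a clause-number hallucination rate and a set-based F1 could be computed. It is not necessarily how the benchmark scores models: the regex, the function names, the tiny `known` reference set, and the made-up clause `52.999-99` are all assumptions for illustration.

```python
import re

# Assumed pattern for FAR clause numbers such as "52.212-4" (illustrative only).
FAR_CLAUSE_RE = re.compile(r"\b\d{1,2}\.\d{3}-\d{1,3}\b")


def extract_clauses(text):
    """Return the set of clause-number-shaped strings found in a model output."""
    return set(FAR_CLAUSE_RE.findall(text))


def hallucination_rate(predicted, known_clauses):
    """Share of cited clause numbers that are absent from the reference set."""
    if not predicted:
        return 0.0
    invented = [c for c in predicted if c not in known_clauses]
    return len(invented) / len(predicted)


def f1_score(predicted, gold):
    """Set-based F1 between predicted and gold clause numbers."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data: a hypothetical model output, a tiny stand-in for the full clause
# list, and the gold labels for this one example.
output = "Applicable clauses: 52.212-4, 52.204-21 and 52.999-99."
known = {"52.212-4", "52.204-21", "52.219-8"}
gold = {"52.212-4", "52.219-8"}

predicted = extract_clauses(output)
print(f"hallucination rate: {hallucination_rate(predicted, known):.2f}")
print(f"F1: {f1_score(predicted, gold):.2f}")
```

Under this definition, a clause counts as hallucinated only if it does not exist at all. A real but irrelevant clause lowers F1 without raising the hallucination rate, so the two numbers can move independently.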

Published 12 May 2026