Deduplicating 401,000 Equipment Auction Records with LLM Calibration

📰 Dev.to · benzsevern

We ran GoldenMatch on 401,125 bulldozer auction records from Kaggle. Iterative LLM calibration learned the optimal match threshold from just 200 pairs (~$0.01). ANN hybrid blocking recovered 949 records that string blocking missed.
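As a rough illustration of what learning a match threshold from a small labeled sample can look like, here is a minimal sketch. It is not GoldenMatch's actual code or API: the function name `calibrate_threshold`, the `(score, is_match)` input format, and the choice of F1 as the objective are all assumptions made for this example, and it does a single pass rather than the iterative calibration the article describes. The labels stand in for match/no-match verdicts that an LLM might return for sampled pairs.

```python
from typing import List, Tuple


def calibrate_threshold(labeled: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Pick the similarity threshold that maximizes F1 on labeled pairs.

    `labeled` holds (similarity_score, is_match) tuples. In this sketch the
    labels are assumed to come from an LLM judging sampled candidate pairs;
    this is a hypothetical helper, not part of any real library.
    """
    if not labeled:
        raise ValueError("need at least one labeled pair")

    # Only the observed scores can change the outcome, so test each one.
    candidates = sorted({score for score, _ in labeled})
    total_matches = sum(1 for _, is_match in labeled if is_match)

    best_threshold, best_f1 = candidates[-1], -1.0
    for threshold in candidates:
        # A pair is predicted as a match when its score reaches the threshold.
        tp = sum(1 for s, m in labeled if s >= threshold and m)
        fp = sum(1 for s, m in labeled if s >= threshold and not m)
        fn = total_matches - tp
        denom = 2 * tp + fp + fn
        f1 = (2 * tp / denom) if denom else 0.0
        # Strict comparison keeps the lowest threshold among ties.
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Toy sample: three true matches and two non-matches.
sample = [(0.95, True), (0.90, True), (0.80, True), (0.70, False), (0.60, False)]
threshold, f1 = calibrate_threshold(sample)
```

With around 200 labeled pairs this exhaustive scan is cheap, which is why a simple search over observed scores is enough for a sketch like this one.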

Published 4 Apr 2026