📰 Dev.to · benzsevern

17 articles · Updated every 3 hours

Wagner Was on OFAC in 2018: What 10 Years of Sanctions Data Reveals

Reconciled 85 sanctions lists + 10 years of OFAC history + a 13M-wallet attribution graph. Wagner was listed in 2018; 18% of designations get reversed.

Dev.to · benzsevern 2mo ago

infermap Now Runs in TypeScript: Schema Mapping on the Edge

You get a CSV from a vendor. The columns are fname, lname, tel, addr1. Your database expects...

Dev.to · benzsevern 🔐 Cybersecurity ⚡ AI Lesson 2mo ago

Reconciling 15 OSS Vulnerability Databases: What They Actually Cover

Cross-database ER across OSV, GHSA, PyPA, RustSec, Go vulndb — 869k records, 608k canonical vulns, and one structural blind spot.

Dev.to · benzsevern 📊 Data Analytics & Business Intelligence ⚡ AI Lesson 2mo ago

Wallet Attribution at Scale: ER on 13M Blockchain Records

Running entity resolution across 10 public blockchain attribution datasets surfaces cross-jurisdictional sanctions and universal infrastructure patterns.

Dev.to · benzsevern 2mo ago

The OSS ER Bargain: What Entity Resolution Actually Costs You

Benchmarking dedupe vs...

Dev.to · benzsevern 🤖 AI Agents & Automation ⚡ AI Lesson 3mo ago

Golden Suite + MCP: Giving AI Agents a Data Cleaning Toolkit

An AI agent can write SQL, draft an email, and refactor a repo. Ask it to deduplicate a 50,000-row...

Dev.to · benzsevern 📊 Data Analytics & Business Intelligence ⚡ AI Lesson 3mo ago

From Dirty CSV to Golden Records: A Python Walkthrough

Download a government CSV, load it into pandas, and you'll find "MEMORIAL HOSPITAL" listed twelve...

Dev.to · benzsevern 3mo ago

GoldenMatch vs. Splink vs. Dedupe vs. RecordLinkage: A Practical Comparison

We ran four Python entity resolution libraries on the same three datasets — Febrl, DBLP-ACM, and 10K real voter records. Here's where each shines.

Dev.to · benzsevern 🧠 Large Language Models ⚡ AI Lesson 3mo ago

GoldenMatch vs. BPID: Testing Against an EMNLP Benchmark

We benchmarked GoldenMatch on Amazon's BPID dataset — 10,000 adversarial PII pairs. With DOB parsing and Vertex AI embeddings, we hit 0.750 F1 — matching Ditto

Dev.to · benzsevern 🧠 Large Language Models ⚡ AI Lesson 3mo ago

Deduplicating 401,000 Equipment Auction Records with LLM Calibration

We ran GoldenMatch on 401,125 bulldozer auction records from Kaggle. Iterative LLM calibration learned the optimal match threshold from just 200 pairs (~$0.01).

Dev.to · benzsevern 🧠 Large Language Models ⚡ AI Lesson 3mo ago

AI-Powered Deduplication: How LLMs Supercharge the Golden Suite

Enable LLM boost across GoldenCheck, GoldenFlow, and GoldenMatch to catch what fuzzy matching misses — with real costs under $0.10.

Dev.to · benzsevern 📊 Data Analytics & Business Intelligence ⚡ AI Lesson 3mo ago

Getting Started with GoldenPipe: Clean Data in Your Python Backend

Add a production-ready data quality pipeline to your Python backend in 5 minutes. One pip install, one function call, zero config.

Dev.to · benzsevern 📊 Data Analytics & Business Intelligence ⚡ AI Lesson 3mo ago

Entity Resolution on 208,000 Real Records with the Golden Suite

We ran the full Golden Suite pipeline on 208,505 real NC voter registration records. 61 quality findings, 197K addresses cleaned, 10,718 duplicate clusters found...

Dev.to · benzsevern 📊 Data Analytics & Business Intelligence ⚡ AI Lesson 3mo ago

10 Data Problems Every Pipeline Hits (and the One-Liner Fixes)

The same 10 data quality issues show up in every dataset. Here's what they look like and how to fix each in one line.

Dev.to · benzsevern 📊 Data Analytics & Business Intelligence ⚡ AI Lesson 3mo ago

Two Hospitals Matched Patient Records Without Sharing a Single Name

Privacy-preserving record linkage with bloom filters. 92% accuracy. Zero raw data exchanged.
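
For readers unfamiliar with the idea in this teaser, here is a minimal sketch of Bloom-filter record linkage in general. It is not the article's implementation: the function names (`bigrams`, `bloom_encode`, `dice`), the filter size, the number of hash functions, and the shared key are all assumptions chosen for illustration. Each party encodes a name's character bigrams into a keyed Bloom filter, and only the filters are compared, using the Dice coefficient.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a normalized, padded string into character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 256, num_hashes: int = 4) -> set[int]:
    """Return the set bit positions of a keyed Bloom filter for `value`.

    `size` and `num_hashes` are illustrative defaults, not tuned settings.
    Both parties must use the same `secret`, `size`, and `num_hashes`.
    """
    bits: set[int] = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            # HMAC-SHA256 keyed with the shared secret; k selects the hash function.
            digest = hmac.new(secret, f"{k}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical shared key, agreed on out of band by both parties.
SHARED_KEY = b"example-shared-secret"

site_a = bloom_encode("Katherine", SHARED_KEY)
site_b = bloom_encode("Catherine", SHARED_KEY)
unrelated = bloom_encode("Zhang", SHARED_KEY)

similar_score = dice(site_a, site_b)
unrelated_score = dice(site_a, unrelated)
```

In this sketch, spelling variants of the same name share most of their bigrams and therefore most of their set bits, so `similar_score` comes out well above `unrelated_score`. A real deployment would also need a match threshold and hardening against frequency attacks, neither of which is covered here.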

Dev.to · benzsevern 📊 Data Analytics & Business Intelligence ⚡ AI Lesson 3mo ago

I Deduplicated 100K Records in 12 Seconds With One Command

How GoldenMatch auto-detects columns, picks scoring algorithms, and hits 97% F1 with zero configuration.

Dev.to · benzsevern 📊 Data Analytics & Business Intelligence ⚡ AI Lesson 3mo ago

How to Deduplicate 100,000 Records in 13 Seconds with Python

You have a CSV with duplicate records. Maybe it's customer data exported from two CRMs, a product...