Launch HN: Pulse (YC S24) – Production-grade unstructured document extraction

📰 Hacker News · sidmanchkanti21

Hi HN, we’re Sid and Ritvik, co-founders of Pulse (https://www.runpulse.com/). Pulse is a document extraction system to create LLM-ready text using hybrid VLM + OCR models. Here’s a demo video: https://video.runpulse.com/video/pulse-platform-walkthrough-... . Later in this post, you’ll find links to before-and-after examples on particularly tricky cases. Check those out to see what Pulse can really do!

Modern vision language models are great at producing plausible text, but that makes them risky for OCR and data ingestion. Plausibility isn’t good enough when you need accuracy.

When we started working on document extraction, we assumed the same thing many teams do: foundation models are improving quickly, multi-modal systems appear to read documents well, what’s not to like? And indeed, for small or clean inputs, those assumptions mostly give good results. However, limitations show up once you begin processing real documents in volume. Long PDFs, den

Published 18 Dec 2025