Show HN: LLM-aided OCR – Correcting Tesseract OCR errors with LLMs

📰 Hacker News · eigenvalue

Almost exactly 1 year ago, I submitted something to HN about using Llama2 (which had just come out) to improve the output of Tesseract OCR by correcting obvious OCR errors [0]. That was exciting at the time because OpenAI's API calls were still quite expensive for GPT4, and the cost of running it on a book-length PDF would just be prohibitive. In contrast, you could run Llama2 locally on a machine with just a CPU, and it would be extremely slow, but "free" if you had a spare machine lying around. Well, it's amazing how things have changed since then. Not only have models gotten a lot better, but the latest "low tier" offerings from OpenAI (GPT4o-mini) and Anthropic (Claude3-Haiku) are incredibly cheap and incredibly fast. So cheap and fast, in fact, that you can now break the document up into little chunks and submit them to the API concurrently (where each chunk can go through a multi-stage process, in which the output of the first stage is passed into an

Published 9 Aug 2024
Read full article → ← Back to Reads