OCR vs. Image Embeddings for PDF RAG: Which One is Better?

Weaviate vector database · Beginner · RAG & Vector Search · 1mo ago
My colleagues at Weaviate released IRPAPERS, a benchmark comparing image-based and text-based retrieval over 3,230 pages from 166 scientific papers.

The setup: take the same PDFs and process them two ways. For text, run OCR with GPT-4.1, then retrieve with a hybrid of Arctic 2.0 embeddings and BM25. For images, embed the raw page images with ColModernVBERT multi-vector embeddings. Both pipelines are tested on 180 needle-in-the-haystack questions.

The results:
• Text edges out images at the top rank: 46% vs 43% Recall@1.
• Images match or exceed text at deeper recall: 93% vs 91% Recall@20.

However, text- and image-based methods fail on different queries. At Recall@1:
• 22 queries succeed with text but fail with images.
• 18 queries succeed with images but fail with text.

This complementarity is what makes Multimodal Hybrid Search work. By fusing the scores from text and image retrieval, they achieved:
• 49% Recall@1 (beating either modality alone)
• 81% Recall@5
• 95% Recall@20
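ColModernVBERT stores many vectors per page image instead of a single one. Assuming it is scored the way ColBERT/ColPali-style late-interaction models usually are (MaxSim), the sketch below shows the idea in plain Python: each query token vector is matched to its most similar page vector, and those best similarities are summed into the page score. The random vectors, dimensions, and page ids are hypothetical stand-ins for real model outputs.

```python
import math
import random
from typing import Dict, List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(v: Vector) -> Vector:
    # L2-normalize so dot products are cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def maxsim(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: best page-vector match per query vector, summed."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

rng = random.Random(0)

def rand_vec(dim: int) -> Vector:
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])

# Hypothetical stand-ins: 4 query token vectors, 3 pages with 32 patch vectors each.
query = [rand_vec(16) for _ in range(4)]
pages: Dict[str, List[Vector]] = {
    f"page{i}": [rand_vec(16) for _ in range(32)] for i in range(3)
}

scores = {page_id: maxsim(query, vecs) for page_id, vecs in pages.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # page ids, best first
print(ranked)
```

Compared with a single-vector embedding, this keeps fine-grained patch-level information, at the cost of storing and comparing many vectors per page.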
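Recall@k counts a question as answered if its relevant page appears in the top k results; with needle-in-the-haystack questions, one gold page per question is assumed here. A minimal sketch of computing Recall@k and the "succeeds with one modality but not the other" counts from ranked results (the data layout and toy rankings are hypothetical, not IRPAPERS data):

```python
from typing import Dict, List, Set

Rankings = Dict[str, List[str]]  # query id -> page ids, best first

def hits_at_k(rankings: Rankings, gold: Dict[str, str], k: int) -> Set[str]:
    """Query ids whose gold page appears in the top k results."""
    return {q for q, ranked in rankings.items() if gold[q] in ranked[:k]}

def recall_at_k(rankings: Rankings, gold: Dict[str, str], k: int) -> float:
    """Fraction of all gold queries answered in the top k (missing runs count as misses)."""
    return len(hits_at_k(rankings, gold, k)) / len(gold)

# Toy data: one gold page per query.
gold = {"q1": "p3", "q2": "p7", "q3": "p1"}
text_runs = {"q1": ["p3", "p2"], "q2": ["p5", "p7"], "q3": ["p9", "p4"]}
image_runs = {"q1": ["p2", "p3"], "q2": ["p7", "p5"], "q3": ["p1", "p4"]}

text_hits = hits_at_k(text_runs, gold, k=1)
image_hits = hits_at_k(image_runs, gold, k=1)
print("Recall@1 text:", recall_at_k(text_runs, gold, 1))
print("Recall@1 image:", recall_at_k(image_runs, gold, 1))
print("text only:", sorted(text_hits - image_hits))    # ['q1']
print("image only:", sorted(image_hits - text_hits))   # ['q2', 'q3']
```

Running the same set differences over the real 180 questions at k=1 is how counts like the 22 text-only and 18 image-only successes are obtained.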
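The description doesn't spell out how the two modalities' scores are fused, so the following is only one plausible scheme, assumed for illustration: min-max normalize each retriever's scores per query, then take a weighted sum (similar in spirit to Weaviate's relative score fusion for hybrid search). The `alpha` weight and the example scores are hypothetical.

```python
from typing import Dict, List

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores for a single query to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tied: treat them as equally (fully) relevant.
        return {page: 1.0 for page in scores}
    return {page: (s - lo) / (hi - lo) for page, s in scores.items()}

def fuse(text_scores: Dict[str, float],
         image_scores: Dict[str, float],
         alpha: float = 0.5) -> List[str]:
    """Rank pages by a weighted sum of normalized text and image scores.

    alpha weights the image side; a page that one retriever did not return
    gets 0 from that side.
    """
    t, i = min_max(text_scores), min_max(image_scores)
    fused = {p: (1 - alpha) * t.get(p, 0.0) + alpha * i.get(p, 0.0)
             for p in set(t) | set(i)}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical per-query scores on different scales.
text_scores = {"p1": 12.0, "p2": 11.5, "p3": 4.0}    # e.g. BM25 + dense hybrid
image_scores = {"p2": 20.1, "p4": 19.8, "p1": 15.0}  # e.g. MaxSim over page images

print(fuse(text_scores, image_scores))  # ['p2', 'p1', 'p4', 'p3']
```

Normalization matters because hybrid text scores and MaxSim image scores live on different scales. In this toy case p2, ranked second by text and first by images, wins after fusion. Reciprocal rank fusion, which uses only ranks, is a common alternative when raw scores aren't comparable.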

Chapters

0:00 Intro
0:08 Visual- vs Text-based methods
1:04 The IRPapers dataset
1:59 The 6 different search strategies
3:43 The results
4:30 The paper's most interesting finding...
5:11 Conclusion