Preprocessing Unstructured Data for LLM Applications

Free to audit

Coursera · Intermediate · Large Language Models · 2mo ago
Enhancing a retrieval-augmented generation (RAG) system's performance depends on efficiently processing diverse unstructured data sources. In this course, you'll learn techniques for representing many kinds of unstructured data, such as text, images, and tables, from many different sources. You'll then implement them to extend your LLM RAG pipeline to handle Excel, Word, PowerPoint, PDF, and EPUB files.

You'll learn:

1. How to preprocess data for LLM application development, with a focus on working with different document types.
2. How to extract and normalize various documents into a common JSON format and enrich them with metadata to improve search results.
3. Techniques for document image analysis, including layout detection and vision transformers, to extract and understand PDFs, images, and tables.
4. How to build a RAG bot that can ingest different kinds of documents, such as PDFs, PowerPoint files, and Markdown files.

Apply the skills you learn in this course to real-world scenarios to enhance your RAG application and expand its versatility.
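To make the idea of normalizing documents into a common JSON format with metadata more concrete, here is a minimal, library-free Python sketch. Everything in it is a hypothetical illustration, not the course's code or any particular library's API: the `Element` schema, the `normalize` and `filter_by_metadata` helpers, and the assumed shape of the raw parser output (`kind`, `content`, `page`). The sketch shows two files in different formats reduced to one schema and serialized to JSON. It then uses the metadata to narrow the search space.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Element:
    """One normalized document element (hypothetical schema for illustration)."""
    type: str  # e.g. "Title", "NarrativeText", "Table"
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def normalize(raw_blocks: list[dict], filename: str, filetype: str) -> list[Element]:
    """Map format-specific blocks onto the common Element schema.

    `raw_blocks` is a stand-in for whatever a format-specific parser returns;
    each block is assumed to carry "kind", "content", and an optional "page".
    """
    elements = []
    for block in raw_blocks:
        text = block["content"].strip()
        if not text:  # drop empty blocks such as blank paragraphs
            continue
        elements.append(
            Element(
                type=block["kind"],
                text=text,
                metadata={
                    "filename": filename,
                    "filetype": filetype,
                    "page_number": block.get("page"),
                },
            )
        )
    return elements


def filter_by_metadata(elements: list[Element], **criteria: Any) -> list[Element]:
    """Keep only elements whose metadata matches every key/value in `criteria`."""
    return [
        el for el in elements
        if all(el.metadata.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical parser output for two files in different formats.
pdf_blocks = [
    {"kind": "Title", "content": "Quarterly Report", "page": 1},
    {"kind": "NarrativeText", "content": "Revenue grew this quarter.", "page": 1},
    {"kind": "Table", "content": "Region | Sales\nEU | 10", "page": 2},
]
pptx_blocks = [
    {"kind": "Title", "content": "Roadmap", "page": 1},
    {"kind": "ListItem", "content": "   ", "page": 1},  # whitespace only, dropped
]

elements = normalize(pdf_blocks, "report.pdf", "pdf") + normalize(
    pptx_blocks, "roadmap.pptx", "pptx"
)

# Serialize every element, whatever its source format, to the same JSON shape.
as_json = json.dumps([asdict(el) for el in elements], indent=2)

# Metadata narrows the candidate set before (or alongside) vector search.
page_two_of_pdf = filter_by_metadata(elements, filename="report.pdf", page_number=2)
print(len(elements), [el.type for el in page_two_of_pdf])  # 4 ['Table']
```

Keeping `metadata` as a plain dictionary lets each source format contribute extra fields, such as slide numbers or sheet names, without changing the schema. Exact-match filtering is the simplest possible choice here. A real pipeline would more likely pass similar filters to its vector store.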