Local RAG in 1.3s: LangGraph + Ollama (Free, No API Keys)

Shane | LLM Implementation · Intermediate ·🔍 RAG & Vector Search ·5mo ago
Most local RAG pipelines are painfully slow. But with the right routing and a simple prompt trick, you can get sub-2-second answers running entirely on your laptop. Here is the exact LangGraph architecture that makes local AI usable again.

Build a local RAG agent that answers in ~1.3s: free, private, and fast. We'll use LangGraph + Ollama with lightweight models, smart agentic routing, a relevance grader, and a single prompt tweak that fixes messy answers.

Notebook & Code: https://github.com/LLM-Implementation/Practical-LLM-Implementation/blob/main/agents_frameworks/LangGraph_rag.ipynb

What you'll build
Agentic RAG graph (retrieve → grade → rewrite → generate) with conditional edges
Local-first stack: Ollama + LangChain/LangGraph + HuggingFace embeddings
Prompting fix: clearly labeled Retrieved Context in triple quotes for focused answers
Speed & cost: ~1.3s end-to-end on my machine, $0 per run

Models & tools used
ChatOllama (Granite family) for agent, grader, and answer: https://ollama.com/library/granite4
EmbeddingGemma via HuggingFace (dim truncated to 256): https://huggingface.co/google/embeddinggemma-300m
In-memory vector store + LangChain tool wrapper: https://docs.langchain.com/oss/python/langgraph/agentic-rag#5-rewrite-question

Chapters
00:00 Local RAG: Faster & Free
00:47 Architecture Overview (Agentic Graph)
02:11 Environment Setup (Local-First)
02:47 Step 1: Preprocess Docs
03:16 Step 2: Local Retriever (Embeddings)
03:55 Step 3: Agent Node (Tool Use)
04:28 Step 4: Relevance Grader
05:03 Step 5: Question Rewriter
05:45 Step 6: Answer Generator
06:24 Step 7: Assemble the Graph
07:17 Run & Stream the Agent
07:27 The Prompting Trick (Triple-Quoted Context)
08:22 Results: ~1.3s, $0
08:40 Code & Notebook Links
08:57 Outro

Notes & fairness
The speed comparison references the official LangChain RAG tutorial trace on my local hardware. Your results may vary by machine, models, and retrieval corpus. All trademarks belong to their owners.
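To make the retrieve → grade → rewrite → generate routing and the triple-quoted context trick more concrete, here is a minimal plain-Python sketch. It is not the notebook's code and does not use LangGraph or Ollama: the function names (`build_answer_prompt`, `run_agentic_rag`), the prompt wording, and the `max_rewrites` cap are all assumptions made for illustration, and the four callables stand in for the retriever, grader, rewriter, and answer model. The agent node that decides whether to call the retriever tool at all is left out for brevity.

```python
from typing import Callable, List, Tuple


def build_answer_prompt(question: str, context: str) -> str:
    """Label the retrieved context and wrap it in triple quotes.

    The exact wording is an assumption; the point is the clear
    'Retrieved Context' label and the triple-quote delimiters.
    """
    return (
        "Answer the question using only the retrieved context.\n\n"
        f"Question: {question}\n\n"
        f'Retrieved Context:\n"""\n{context}\n"""\n\n'
        "Answer concisely:"
    )


def run_agentic_rag(
    question: str,
    retrieve: Callable[[str], List[str]],   # query -> document chunks
    grade: Callable[[str, str], bool],      # (question, context) -> relevant?
    rewrite: Callable[[str], str],          # query -> rephrased query
    generate: Callable[[str], str],         # prompt -> answer text
    max_rewrites: int = 2,                  # hypothetical cap to avoid endless loops
) -> Tuple[str, List[str]]:
    """Run retrieve -> grade -> (rewrite -> retrieve ...) -> generate.

    Returns the answer and the list of steps taken, in order.
    """
    trace: List[str] = []
    query = question
    context = ""

    for attempt in range(max_rewrites + 1):
        docs = retrieve(query)
        trace.append("retrieve")
        context = "\n\n".join(docs)

        relevant = grade(question, context)
        trace.append("grade")

        # Conditional edge: relevant context (or no rewrites left) goes to
        # generation; otherwise the query is rewritten and retrieval repeats.
        if relevant or attempt == max_rewrites:
            break
        query = rewrite(query)
        trace.append("rewrite")

    answer = generate(build_answer_prompt(question, context))
    trace.append("generate")
    return answer, trace


# Toy stand-ins so the sketch runs without any model or vector store.
if __name__ == "__main__":
    answer, steps = run_agentic_rag(
        "What does the grader do?",
        retrieve=lambda q: ["The grader checks whether retrieved chunks are relevant."],
        grade=lambda q, ctx: "grader" in ctx,
        rewrite=lambda q: q + " (rephrased)",
        generate=lambda prompt: prompt.splitlines()[-1],
    )
    print(steps)
    print(answer)
```

In a real LangGraph build, each callable would become a node and the `if relevant` branch would become a conditional edge; the loop here only mirrors that control flow.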
