Advanced RAG Techniques with Arcee Trinity Mini (100% Local)

Uploaded: 2026-01-09T17:17:35+00:00
Channel: Julien Simon

In this video, we build a fully local RAG chatbot that runs entirely on a MacBook - no cloud APIs, no usage costs, complete privacy.

⭐️⭐️⭐️ More content on Substack at https://julsimon.substack.com ⭐️⭐️⭐️

We use Arcee's Trinity Mini, a 26-billion-parameter mixture-of-experts model trained for real-world enterprise tasks, including RAG, function calling, and tool use. Running in Q8 quantization through llama.cpp with Metal acceleration, it's surprisingly capable on Apple Silicon.

This builds on a previous video where we used Arcee Conductor for cloud-based inference. Same stack - LangChain f…
