RAG Project: Build an AI Onboarding Chatbot with Streamlit, LangChain, and ChromaDB
## Key Takeaways
This video shows how to build an AI onboarding chatbot with Streamlit, LangChain, and ChromaDB. It covers managing environment variables, parsing PDFs, building a RAG pipeline, and integrating vector search.
## Original Description
In this comprehensive video, you'll learn how to build a fully functional AI-powered onboarding chatbot tailored for new employees at a fictional company (Umbrella Corporation). The tutorial walks through each stage of development, from managing environment variables and parsing company policy PDFs, to implementing retrieval-augmented generation (RAG) pipelines using LangChain, integrating vector search with Chroma, building a rich frontend with Streamlit, and instrumenting your app for traceability with LangSmith. Ideal for AI engineers, data scientists, and Python developers, the video provides hands-on implementation details, best practices, and debugging insights, making it a perfect portfolio project or a foundation for enterprise onboarding solutions.
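The retrieve-then-generate flow described above can be sketched in plain Python. This is a toy illustration only, not the code from the video: it swaps in bag-of-words cosine similarity for the embedding search that Chroma would perform, and it stops at building the prompt rather than calling an LLM. The function names, the prompt wording, and the sample policy sentences in `SAMPLE_CHUNKS` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical policy snippets standing in for chunks parsed from company PDFs.
SAMPLE_CHUNKS = [
    "Employees accrue 20 days of paid vacation per year.",
    "Laptops must be returned to IT on the last working day.",
    "The office is open from 8am to 6pm on weekdays.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    query_vec = Counter(tokenize(question))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(chunk))), chunk)
        for chunk in chunks
    ]
    # sort() is stable, so chunks with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the LLM in the generation step."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "How many vacation days do employees get?"
    print(build_prompt(question, retrieve(question, SAMPLE_CHUNKS, k=1)))
```

In a real pipeline the word-count vectors would be replaced by embeddings stored in a vector database, but the shape of the flow stays the same: score chunks against the question, keep the top few, and pass them to the model as context.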
## Topics
- Working with OpenAI and Groq API keys
- Understanding and using LangSmith for tracing/debugging
- Using Faker for generating sample employee data
- Parsing and extracting text from PDFs using PyPDF
- Handling structured and unstructured PDF data
- Loading, vectorizing, and splitting documents using LangChain and text splitters
- Creating and managing Chroma vector stores for document retrieval
- Building RAG (Retrieval Augmented Generation) applications
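To make the document-splitting topic above concrete, here is a minimal sketch of fixed-size chunking with overlap, written with the standard library only. It is an assumption-laden simplification and does not reproduce the behavior of LangChain's text splitters; the function name `split_text` and its default sizes are hypothetical.

```python
def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 40) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    The overlap means a sentence cut at a chunk boundary still appears
    whole in at least one neighboring chunk, which helps retrieval.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

Larger chunks carry more context per retrieved result, while smaller chunks make matches more precise; the right trade-off depends on the documents and the model's context window.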
## Links and code
*Important*: This video is a module from the AI Engineering Bootcamp. Sign up here:
- 🚀 Complete AI Engineer Bootcamp: https://aibootcamp.dev
- 👉 Code (starting point): https://github.com/alejandro-ao/client-onboarding-rag-demo
- 👉 Code (solution): https://github.com/alejandro-ao/client-onboarding-rag-demo/tree/solution
- ❤️ Buy me a coffee... or a beer (thanks): https://link.alejandro-ao.com/l83gNq
- 💬 Join the Discord Help Server: https://link.alejandro-ao.com/HrFKZn
- ✉️ Get the news from the channel and AI Engineering: https://link.alejandro-ao.com/AIIguB
## Timestamps
0:00:00 - Intro
0:02:39 - Project setup
0:07:31 - Data Service
0:18:10 - Assistant Class
0:34:56 - GUI With Streamlit
0:44:35 - Run the Chatbot
0:51:22 - Trac