"How to 10x chatbot UX? 🤖 🖼️" - Add Image Responses to GPT knowledge retrieval apps

AI Jason · Beginner · 🧠 Large Language Models · 2y ago

Key Takeaways

The video demonstrates how to enhance chatbot UX by enabling the bot to respond with images, using the html2text library, LlamaIndex, and GPT-3.5. It provides a step-by-step tutorial on building a Q&A chatbot that can respond with image references, using a prompt template that tells the model to include relevant image and video links in its markdown answer.

Full Transcript

One of the core use cases of large language models is knowledge retrieval, and there are already a lot of tutorials about how you can build chat-with-PDF or chat-with-website apps. However, one problem I noticed is that almost all of those systems are useful but a bit boring. For example, even though your team spent days and weeks creating really well-written documentation with images, GIFs and videos, when those Q&A bots answer a question they normally just answer in plain text, which is still useful but a lot less useful and engaging than content like this. And in some cases text simply doesn't convey the message; for example, an image like this is very hard to communicate with just text. So rich media is definitely a very important part of knowledge retrieval.

The reason why those Q&A apps can't respond with images is that we didn't feed any image data to the large language model. Let's take a website as an example. Normally there are two types of data a scraping service returns: either text or the raw HTML file. Most of the time we choose text because it's very clean, so the large language model doesn't get as much noise as the raw HTML file would have. But the problem is that it removes all the links, including the reference links as well as the image URLs. For PDF files it's the same case: most PDF data loaders simply extract the text and ignore all the image files. So in most cases we simply didn't feed any image URL data to the large language model, so it can't retrieve any.

But it's totally possible to extract both text and reference links like images as part of the context. That's why I want to show you a case study of how you can build a large language model Q&A bot that can respond with image references. The solution I explored is basically to turn all the content into a clean markdown format. If you don't know what markdown is, it's a lightweight markup language for creating formatted content. For example, you can use a hashtag with text to define a title, and you can also use certain syntax to insert an image.
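To make the point about lost image URLs concrete, here is a minimal sketch that compares a text-only extraction with a markdown-style extraction of the same HTML. It uses only Python's standard `html.parser`; the class names, the sample HTML, and the `example.com` URL are hypothetical, and this is a simplified stand-in for illustration, not the converter used in the video.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only visible text, the way a plain-text scraper would."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


class MarkdownLike(HTMLParser):
    """Collect text but keep headings and images as markdown syntax."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self.prefix = "# "  # markdown title syntax
        elif tag == "img" and attrs.get("src"):
            alt = attrs.get("alt") or ""
            self.parts.append(f"![{alt}]({attrs['src']})")  # markdown image syntax

    def handle_data(self, data):
        if data.strip():
            self.parts.append(self.prefix + data.strip())
            self.prefix = ""


def extract(parser_cls, html):
    """Run the given parser over the HTML and join the collected parts."""
    parser = parser_cls()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.parts)


# Hypothetical sample page fragment.
SAMPLE_HTML = (
    "<h1>Create a project</h1>"
    "<p>Click the New button.</p>"
    '<img src="https://example.com/new-button.png" alt="New button">'
)

plain_text = extract(TextOnly, SAMPLE_HTML)
markdown = extract(MarkdownLike, SAMPLE_HTML)
print(plain_text)
print("---")
print(markdown)
```

The text-only result contains no trace of the image, while the markdown result keeps its URL in a form that a model can copy into its answer.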
With markdown format, we can still keep the structure of the documents in a very clean way, and on the other side we can also use markdown format to display the data. That's exactly how ChatGPT displays different types of formats like tables or code. I will show you an example of how we can turn the raw HTML file into a clean markdown format that has both text and image URLs in a structured way, so that the large language model can use that data to return a rich response like this. So let's get into it.

As always, let's open a project folder in Visual Studio Code. There will be four steps. Firstly, we scrape the raw HTML of the website. Then we'll convert the HTML to markdown, which should include all the image URLs. Then we will create a vector index from that markdown data. And in the end we will let the large language model build a Q&A chain.

The first thing we're going to do is create a .env file to store the API keys. If you don't know what Browserless is, it is a popular service that people use for scraping websites. Once you create an account on both services, you can just put the API keys here. Next, let's go back to app.py. We're first going to import a list of different libraries that we're going to use and load the environment variables that we put in the .env file. If you haven't installed those libraries, you can click on this top right corner and do pip install html2text langchain llama-index openai python-dotenv and beautifulsoup4.

Once you've done that, we can go back and implement the first function: scrape the raw HTML from the website. We will create a function called scrape website, pass in the URL, create a header, and define the body structure, where we pass in the URL and an element selector, since we just want the body. Once we've done that, we'll convert this request body into JSON format and send a POST request. Once we get the response from the API endpoint, we need to do some parsing to extract the raw HTML. That should return the HTML string, so let's try this.

All right, it returned the HTML successfully. But as you can see, the raw HTML is very messy and has a lot of information that we don't really need, so if we just pass this to the large language model, it probably won't produce any meaningful results. That's why we want to do the next step, which is converting the HTML to markdown. I'll create this function called convert HTML to markdown, and we will use a library called html2text, which can automatically convert HTML to markdown format. We define the converter, set ignore links to false because we do want the links to be kept, and then run the converter. This function should be able to convert the messy HTML into clean markdown. Let's try this.

Okay, as you can see, the result is a lot better after this cleanup, and it kept all the image URLs, like this one. We can use a website like Markdown Live Preview to test whether the output works. I can paste it into the markdown preview, and you can see the image has been loaded successfully, which means that if we pass this information to the large language model and ask it to generate an answer, we should be able to display this image as well.

However, there are some caveats. For some websites, the converted markdown doesn't have absolute URLs for the image assets, so they might have something like this. For those situations we need to create new functions that turn those URLs into proper ones. The way we're going to do that: I will create one function called get base URL, which extracts the domain from the URL that we scrape, and then we'll convert the relative URLs in the HTML to absolute URLs. We'll use a library called Beautiful Soup, which allows us to filter and choose different HTML tags and modify them. We run a for loop over all the image tags in the HTML file and get the source URL. If the source URL starts with http or https, we skip it and continue; if not, we convert that URL to the absolute URL that we need. We'll repeat the same process for the source link in the image tag as well as data-src, as some websites use the data-src attribute instead, and we will do the same thing for the links as well, just in case. Then we return the updated HTML. Those functions should help us convert all the image URLs to the proper ones that we need. In the end, I will create one function to bring all those things together, called get markdown from URL. If you use this function, you will see that all the URLs are parsed properly. This is just one example. Different websites have different types of structures, so you might need to make some more adjustments to make sure the markdown is actually clean and nice.

All right, the next step is to create a vector index so we can do similarity search. Here I want to use LlamaIndex. LlamaIndex is an open source library that provides a lot of different data loaders, which for example allow you to load data from Airtable, Asana and many other sources very easily. It also provides a list of very useful features for managing a vector index. For example, it allows me to modify and add new data into an existing index without recreating the whole index from scratch, so it is more cost effective. On the other side, if I have a lot of different types of documents, it can automatically break down a pretty complicated question into sub-questions that query different documents and in the end bring them together into one answer. For our purpose, I will use the most basic function of LlamaIndex, which is to create a vector index and retrieve information.

Let's create a function called generate answer with two inputs: the user query and the vector index we created above. First we define a data retriever that can get a list of relevant nodes. Nodes are to some extent like the relevant documents, but they also have things like metadata and other stuff. In this case we only need the text, so I will do this to extract just the text data. This should give us all the relevant information about the user query, and then we give that context to a large language model like GPT-3.5 to generate an answer. So I'll define the model and create a prompt template: "You are a helpful assistant. Based on the given context, please answer the question with all the rules below. Answer the question only based on the context provided; do not make things up. Answer questions in a helpful manner that's straight to the point, with clear structure and all relevant information that might help users answer the question. The answer should be formatted in markdown. If there are any relevant image or video links, they are very important reference data; please include them as part of the answer." I'm using the new LangChain Expression Language here. It's basically the same as creating an LLM chain component, but the new expression language just has a simpler syntax. Once it's finished, I return the response.

So this is pretty much it. Let's try it out. I will use this Webflow help doc as an example, where it has reference links as well as images and GIFs. I'll load this URL, give a query, "How can I create a Webflow app?", and use all the functions that we defined above. Let's try this; I'll run python app.py. All right, I got this response back, and I can copy and paste the response into a markdown preview. The answer here includes both the text and a GIF, so the content is much more engaging.

This is how you can extract clean markdown with image reference data from a website. For internal PDF files it's basically the same thing: we can convert those PDFs into a structured markdown format. There are libraries that actually do this, such as one called Aspose, which allows us to convert a PDF file like this into a structured markdown format with image references, as well as extract the image files. However, this library is not free; it actually costs around one thousand dollars per year to use. If you do want to use it, I have an example here about how you can extract clean markdown content with this library. I'm also pretty keen to explore whether we can create an open source version, so if you really want this PDF-to-markdown library, please comment below and let me know.

So this is an example of how you can create a knowledge retrieval app that returns not just text but also rich media like this. Once you have this markdown, you can either create a front end by yourself or use a library like Streamlit to quickly create a UI wrapper. I'm really keen to see what kind of interesting apps you build. If you enjoy this content, please consider giving me a subscribe, and I'll see you next time.
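The relative-to-absolute URL step described in the transcript can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the video: it swaps Beautiful Soup for a regular expression so that it runs with only the standard library, the function names are hypothetical, and the sample HTML and domains are made up.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches src, data-src and href attributes that have a quoted value.
URL_ATTR = re.compile(r"""\b(data-src|src|href)\s*=\s*(["'])(.*?)\2""", re.IGNORECASE)


def get_base_url(url):
    """Return the scheme and domain of a page URL, e.g. https://example.com."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}"


def convert_to_absolute_urls(html, base_url):
    """Rewrite relative URL attributes so they point at base_url."""

    def fix(match):
        name, quote, value = match.groups()
        if value.startswith(("http://", "https://")):
            return match.group(0)  # already absolute, leave untouched
        return f"{name}={quote}{urljoin(base_url, value)}{quote}"

    return URL_ATTR.sub(fix, html)


# Hypothetical scraped fragment with relative and absolute URLs.
sample_html = (
    '<img src="/img/a.png">'
    '<img data-src="b.gif">'
    '<a href="https://other.com/x">x</a>'
)
base = get_base_url("https://example.com/docs/page?x=1")
print(convert_to_absolute_urls(sample_html, base))
```

Resolving against the domain root follows what the transcript describes. For pages that use paths relative to the current page (such as `../img/a.png`), passing the full page URL to `urljoin` instead of the base URL may give more accurate results.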

Original Description

A step by step tutorial of how to enhance your chat with PDF/website app user experience by enabling it to respond with not just text, but also images 🔥🚀

🔗 Links
- Join my community: https://www.skool.com/ai-builder-club/about
- Follow me on twitter: https://twitter.com/jasonzhou1993
- Join my AI email list: https://www.ai-jason.com/
- AI Researcher Github: https://github.com/JayZeeDesign/knowledge-retrieval-with-imgs
- My discord: https://discord.gg/eZXprSaCDE

⏱️ Timestamps
0:00 Intro
0:43 Why GPT chatbots don't return imgs
1:23 Case study overview
2:20 Step 1: Scrape HTML
4:03 Step 2: Convert HTML to Markdown
6:22 Step 3: Create vector index with llama index
7:10 Step 4: Retrieval Augmented Generation (RAG)
8:43 Handle PDF docs

👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com

#gpt #chatgpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #langchain #largelanguagemodels #largelanguagemodel #chatwithpdf #autogpt #chatgpt4 #gpt4 #gpt3 #aiautomation #aiagents #nocode


This video tutorial teaches how to enhance chatbot UX by enabling the bot to respond with images, using the html2text library, LlamaIndex, and GPT-3.5. It provides a step-by-step guide to building a Q&A chatbot that can respond with image references: scrape the page, convert the HTML to markdown so that image URLs survive, index the markdown, and prompt the model to include relevant image links in its answer. By following this tutorial, viewers can learn how to make their chatbots more engaging and informative.
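The scraping step of that guide could be sketched as below. The transcript only says that a header and a JSON body with the URL and a `body` element selector are sent in a POST request to Browserless, and that the raw HTML is parsed out of the response. The endpoint URL, the `token` query parameter, and the response shape used here are assumptions that may differ between Browserless API versions, and the sketch uses the standard library's `urllib` since the transcript does not name an HTTP client.

```python
import json
import urllib.request

# Assumed endpoint; check the Browserless docs for your plan and API version.
BROWSERLESS_SCRAPE_URL = "https://chrome.browserless.io/scrape"


def build_scrape_request(page_url, api_key):
    """Build the POST request asking for the <body> element of page_url."""
    payload = {"url": page_url, "elements": [{"selector": "body"}]}
    return urllib.request.Request(
        f"{BROWSERLESS_SCRAPE_URL}?token={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Cache-Control": "no-cache"},
        method="POST",
    )


def extract_html(response_json):
    """Pull the raw HTML string out of the (assumed) response shape."""
    return response_json["data"][0]["results"][0]["html"]


def scrape_website(page_url, api_key):
    """Send the request and return the raw HTML of the page body."""
    request = build_scrape_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_html(json.load(response))
```

Splitting the request building and the response parsing into separate functions keeps both parts checkable without network access; only `scrape_website` actually calls the service.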

Key Takeaways
  1. Create a `.env` file to store the API keys
  2. Scrape the raw HTML of the website through Browserless
  3. Convert the raw HTML to markdown format so that image URLs are kept
  4. Create a vector index from the markdown data
  5. Build a Q&A chatbot that can respond with image references
  6. Define a function to convert HTML to markdown using the `html2text` library, with ignore links set to false
  7. Create a function to convert relative URLs to absolute URLs using the `BeautifulSoup` library
  8. Use LlamaIndex to create a vector index for similarity search
  9. LlamaIndex also offers data loaders for sources such as Airtable and Asana (mentioned in the video but not used in this tutorial)
  10. Define a data retriever to extract relevant text data
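The retrieval and prompting steps in the list above can be illustrated with a dependency-free sketch. The video uses a LlamaIndex retriever and GPT-3.5 through LangChain; here a toy word-count similarity stands in for real embeddings, the prompt paraphrases the rules read out in the transcript, and all names, chunks, and URLs are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k markdown chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


PROMPT_TEMPLATE = """You are a helpful assistant. Answer the question using only the context below.
- Do not make things up.
- Format the answer in markdown.
- If the context contains relevant image or video links, include them in the answer.

CONTEXT:
{context}

QUESTION: {question}
"""


def build_prompt(query, chunks, top_k=2):
    """Retrieve context for the query and fill in the prompt template."""
    context = "\n\n".join(retrieve(query, chunks, top_k))
    return PROMPT_TEMPLATE.format(context=context, question=query)


# Hypothetical markdown chunks, as produced by the HTML-to-markdown step.
CHUNKS = [
    "# Create an app\nOpen the dashboard and click New app.\n![New app](https://example.com/new-app.gif)",
    "# Billing\nInvoices are sent monthly.",
    "# Delete an app\nOpen settings and click Delete.",
]

prompt = build_prompt("How can I create an app?", CHUNKS, top_k=1)
print(prompt)
```

Because the image link travels inside the retrieved markdown chunk, the model sees it in the context and can repeat it in its answer; the prompt rule only tells it to do so.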
💡 The key to enhancing chatbot UX is to enable it to respond with images, which can be achieved by utilizing the html2text library, LlamaIndex, and GPT-3.5. The Prompt Engineering framework can be used to design and optimize chatbot UX, and the FAISS library can be used for efficient similarity se


Chapters (8)

0:00 Intro
0:43 Why GPT chatbots don't return imgs
1:23 Case study overview
2:20 Step 1: Scrape HTML
4:03 Step 2: Convert HTML to Markdown
6:22 Step 3: Create vector index with llama index
7:10 Step 4: Retrieval Augmented Generation (RAG)
8:43 Handle PDF docs