"How to 10x chatbot UX? 🤖 🖼️ " - Add Image Responses to GPT knowledge retrieval apps

AI Jason · Beginner ·🧠 Large Language Models ·2y ago

Key Takeaways

The video demonstrates how to improve chatbot UX by letting a knowledge-retrieval bot respond with images. It walks step by step through building a Q&A chatbot that scrapes a website with Browserless, converts the HTML to markdown with the html2text library so that image URLs are preserved, indexes the markdown with LlamaIndex, and prompts GPT-3.5 to answer in markdown that includes the relevant image references.
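The whole idea depends on keeping image URLs when HTML is turned into markdown. In the video this is done with html2text (an `HTML2Text()` converter with `ignore_links = False`). The sketch below is a hypothetical, standard-library-only stand-in that shows the same effect on a tiny subset of HTML (h1 to h3 headings, paragraphs, links and images). `MiniMarkdownConverter` and `convert_html_to_markdown` are illustrative names and are not part of html2text.

```python
import re
from html.parser import HTMLParser


class MiniMarkdownConverter(HTMLParser):
    """Toy HTML-to-markdown converter that keeps link and image URLs.

    Hypothetical stand-in for html2text with links kept; it only handles
    h1-h3 headings, paragraphs, links and images. Other tags are dropped,
    but their text content is kept.
    """

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None  # href of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "img":
            # Keep the image as markdown syntax so the LLM sees (and can cite) its URL.
            alt = attrs.get("alt") or ""
            src = attrs.get("src") or ""
            self.parts.append(f"\n\n![{alt}]({src})\n\n")
        elif tag == "a":
            self.href = attrs.get("href") or ""
            self.parts.append("[")

    def handle_endtag(self, tag):
        # Close a markdown link only if we actually opened one.
        if tag == "a" and self.href is not None:
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def convert_html_to_markdown(raw_html: str) -> str:
    converter = MiniMarkdownConverter()
    converter.feed(raw_html)
    converter.close()
    markdown = "".join(converter.parts)
    # Collapse runs of blank lines left by adjacent block elements.
    return re.sub(r"\n{3,}", "\n\n", markdown).strip()


sample_html = (
    "<h1>Create an app</h1>"
    "<p>Open the <a href='https://example.com/apps'>Apps panel</a>.</p>"
    "<img src='https://example.com/step1.gif' alt='Step 1'>"
)
print(convert_html_to_markdown(sample_html))
```

Real pages contain much more markup (lists, tables, scripts), which is why the video relies on html2text rather than a hand-written converter.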

Full Transcript

One of the core use cases of large language models is knowledge retrieval, and there are already a lot of tutorials on how to build chat-with-PDF or chat-with-website apps. However, one problem I noticed is that almost all of those systems are useful but a bit boring. For example, your team may have spent weeks creating really well-written documentation with images, GIFs and videos, but when those Q&A apps answer a question they normally answer in plain text, which is still useful but a lot less useful and engaging than content like this. In some cases text simply doesn't convey the message; for an image like this, it is very hard to communicate with just text. So rich media is a very important part of knowledge retrieval. The reason those Q&A apps can't respond with images is that we never feed any image data to the large language model. Take a website as an example: a scraping service normally returns one of two types of data, either text or the raw HTML file. Most of the time we choose text because it's very clean, so the large language model doesn't get the noise the raw HTML file would have. The problem is that it removes all the links, including reference links as well as image URLs. PDF files are the same: most PDF data loaders simply extract the text and ignore all the images. So in most cases we simply don't feed any image URL data to the large language model, and it can't retrieve any. But it's totally possible to extract both text and reference links like images as part of the context. That's why I want to show you a case study of how to build a large language model Q&A bot that can respond with image references. The solution I explored is basically to turn all the content into clean markdown format. If you don't know what markdown is, it's a lightweight markup language for creating formatted content; for example, you can put a hashtag before text to define a title, and you can also use certain syntax to insert an image, so
with markdown format we can keep the structure of the document in a very clean way, and on the other side we can also use markdown to display the data. That's exactly how ChatGPT displays different types of formatting, like tables or code. I will show you how to turn the raw HTML file into clean markdown that has both text and image URLs in a structured way, so that the large language model can use that data to return a rich response like this. So let's get into it. As always, let's open a project folder in Visual Studio Code. There will be four steps: first, we scrape the raw HTML of the website; then we convert the HTML to markdown, which should include all the image URLs; then we create a vector index from the markdown data; and in the end we let the large language model build a Q&A chain. The first thing we're going to do is create a .env file to store the API keys. If you don't know what Browserless is, it's a popular service that people use for scraping websites. Once you create an account on both services, you can just put the API keys here. Next, let's go back to app.py. We first import the libraries we're going to use and load the environment variables we put in the .env file. If you haven't installed those libraries, you can click on the top right corner and run pip install html2text langchain llama-index openai python-dotenv beautifulsoup4. Once you've done that, we can go back and implement the first function, which scrapes the raw HTML from the website. We create a scrape website function that takes a URL, create a header, and define the body structure: we pass the URL and an element selector, since we just want the body. Then we convert this request body into JSON and send a POST request. Once we get the response from the API endpoint, we need to do some parsing to extract the raw HTML, and that should return
the HTML string. So let's try this. All right, it returns the HTML file successfully, but as you can see the raw HTML is very messy and has a lot of information that we don't really need. If we just pass this to the large language model, it probably won't produce any meaningful results. That's why we do the next step, which is converting the HTML to markdown. I'll create a function to convert HTML to markdown, using a library called html2text, which can automatically convert HTML to markdown format. We define the converter and set ignore_links to False, because we do want the links to be kept, and then run the converter. This function should be able to convert the messy HTML into clean markdown. Let's try this. Okay, as you can see, the result is a lot better after this cleanup, and it kept all the image URLs, like this one. We can use a website like Markdown Live Preview to check whether the output works. If I paste it into the markdown preview, you can see the image has loaded successfully, which means that if we pass this information to the large language model and ask it to generate an answer, we should be able to display this image as well. However, there are some caveats: for some websites, the converted markdown doesn't have absolute URLs for the image assets, so they might have something like this. For those situations, we need new functions that turn those URLs into proper ones. Here is how we do that: I create one function called get base URL, which extracts the domain from the URL we're scraping, and then we convert the relative URLs in the HTML to the absolute URLs we need. We'll use a library called Beautiful Soup, which allows us to find different HTML tags and modify them. So we run a for loop over all the image tags in the HTML file and try to get the source URL, and if
the source URL starts with http or https, we skip it and continue; if it doesn't, we convert that URL to the absolute URL that we need. We repeat the same process for the src attribute of the image tag as well as data-src, since some websites use data-src instead, and we do the same thing for links as well, just in case. Then we return the updated HTML. Those functions should convert all the image URLs to the proper ones we need. In the end, I create one function that brings all these things together, called get markdown from URL. If you try this function, you will see that all the URLs are now absolute, public URLs. This is just one example; different websites have different structures, so you might need to make more adjustments to make sure the markdown is actually clean and nice. All right, the next step is to create a vector index so we can do similarity search, and here I want to use LlamaIndex. LlamaIndex is an open-source library that provides a lot of different data loaders; for example, it lets you load data from Airtable, Asana and many other sources very easily. It also provides very useful features for managing a vector index. For example, it lets me modify and add new data to an existing index without recreating the whole index from scratch, which is more cost-effective. On the other side, if I have a lot of different types of documents, it can automatically break down a complicated question into sub-questions that query different documents, and in the end bring the results together into one answer. For our purposes, I'll use the most basic function of LlamaIndex, which is creating a vector index and retrieving information. Let's create a function called generate answer with two inputs: the user query and the vector index we created above. The first thing we do is define a data retriever that can get a
list of relevant nodes. Nodes are, to some extent, like the relevant documents, but they also carry metadata and other information; in this case we only need the text, so I do this to extract just the text data. This gives us all the relevant information for the user's query, and then we give that context to a large language model like GPT-3.5 to generate the answer. So I define the model and create a prompt template: "You are a helpful assistant. Based on the context, please answer the question following all the rules below: answer the question only based on the context provided, and do not make things up; answer in a helpful manner that's straight to the point, with a clear structure and all relevant information that might help users answer the question; the answer should be formatted in markdown; and if there are any relevant image or video links, they are very important reference data, so please include them as part of the answer." I'm using the new LangChain Expression Language here. It's basically the same as creating an LLMChain component, but the expression language has a simpler syntax. Once it's finished, I return the response. That's pretty much it, so let's try it out. I'll use this Webflow help doc as an example, since it has reference links as well as images and GIFs. I load this URL, give it the query "How can I create a Webflow app?", and use all the functions we defined above. Let's run python app.py. All right, I got this response back, and I can copy and paste it into a markdown preview. The answer includes both the text and a GIF, so the content is much more engaging. That's how you can extract clean markdown with image reference data from a website. For internal PDF files it's basically the same thing: we can convert the PDFs into structured markdown. There are libraries that actually do this, such as Aspose, which allows us to convert a PDF file like this into a structured markdown format
with image references, and it can also extract the image files. However, this library is not free; it costs around a thousand dollars per year. If you do want to use it, I have an example of how to extract clean markdown content with this library. I'm also pretty keen to explore whether we can create an open-source version, so if you really want this PDF-to-markdown library, please comment below and let me know. So that's an example of how you can create a knowledge retrieval app that returns not just text but also rich media like this. Once you have this markdown, you can either build a front end yourself or use a library like Streamlit to quickly create a UI wrapper. I'm really keen to see what kind of interesting apps you build. If you enjoyed this content, please consider subscribing, and I'll see you next time.
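The relative-to-absolute URL step described in the transcript (skip URLs that already start with http or https, otherwise make them absolute, for `src`, `data-src` and links) can be sketched as follows. The video uses Beautiful Soup; this version swaps in the standard library's `html.parser` so it runs without extra packages, and it joins each URL against the full page URL with `urllib.parse.urljoin` instead of a bare domain from a get base URL helper. `AbsoluteUrlRewriter` and `make_urls_absolute` are hypothetical names, and the re-emitted HTML is a simplified serialization (attribute quoting and tag-name case may change).

```python
import html
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that may hold relative URLs: src and data-src (lazy-loaded images) on <img>, href on <a>.
URL_ATTRS = {"img": ("src", "data-src"), "a": ("href",)}


class AbsoluteUrlRewriter(HTMLParser):
    """Re-emits HTML, turning relative image and link URLs into absolute ones."""

    def __init__(self, page_url: str):
        # Keep character/entity references in text as written; they are re-emitted verbatim.
        super().__init__(convert_charrefs=False)
        self.page_url = page_url
        self.out = []

    def _absolute(self, value: str) -> str:
        if value.startswith(("http://", "https://")):
            return value  # already absolute: skip it, as in the video
        return urljoin(self.page_url, value)

    def _emit_tag(self, tag, attrs, closing):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # valueless attribute such as "defer"
                parts.append(name)
                continue
            if name in URL_ATTRS.get(tag, ()):
                value = self._absolute(value)
            # Attribute values arrive unescaped, so escape them again on output.
            parts.append(f'{name}="{html.escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def make_urls_absolute(raw_html: str, page_url: str) -> str:
    rewriter = AbsoluteUrlRewriter(page_url)
    rewriter.feed(raw_html)
    rewriter.close()
    return "".join(rewriter.out)


print(make_urls_absolute(
    '<img src="/assets/step1.gif"><img data-src="img/step2.png">'
    '<a href="https://example.com/ok">ok</a>',
    "https://help.example.com/docs/apps",
))
```

Joining against the page URL rather than only the domain also resolves paths without a leading slash (for example `img/step2.png` on `/docs/apps` becomes `/docs/img/step2.png`), as well as protocol-relative URLs such as `//cdn.example.com/a.png`.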

Original Description

A step by step tutorial of how to enhance your chat with PDF/website app user experience by enabling it to respond with not just text, but also images 🔥🚀

🔗 Links
- Join my community: https://www.skool.com/ai-builder-club/about
- Follow me on twitter: https://twitter.com/jasonzhou1993
- Join my AI email list: https://www.ai-jason.com/
- AI Researcher Github: https://github.com/JayZeeDesign/knowledge-retrieval-with-imgs
- My discord: https://discord.gg/eZXprSaCDE

⏱️ Timestamps
0:00 Intro
0:43 Why GPT chatbots don't return imgs
1:23 Case study overview
2:20 Step 1: Scrape HTML
4:03 Step 2: Convert HTML to Markdown
6:22 Step 3: Create vector index with llama index
7:10 Step4: Retrieval Augmented Generation (RAG)
8:43 Handle PDF docs

👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com

#gpt #chatgpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #langchain #largelanguagemodels #largelanguagemodel #chatwithpdf #autogpt #chatgpt4 #gpt4 #gpt3 #aiautomation #aiautomation #aiagents #nocode

Key Takeaways

Create a .env file to store the API keys
Scrape the raw HTML from a website using Browserless
Convert the raw HTML to markdown format so image URLs are kept
Create a vector index from the markdown data
Build a Q&A chatbot that can respond with image references
Define a function to convert HTML to markdown using the `html2text` library
Create a function to convert relative URLs to absolute URLs using the `BeautifulSoup` library
Use LlamaIndex to create a vector index for similarity search
Note that LlamaIndex also offers data loaders for sources such as Airtable and Asana
Define a data retriever to extract relevant text data
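The retriever step (take the chunks most similar to the query and keep only their text) can be illustrated without LlamaIndex. In the video this is roughly `index.as_retriever()` followed by `retrieve(query)`; the sketch below swaps the embedding model for a toy bag-of-words cosine similarity, and all names (`build_index`, `retrieve`) are hypothetical. It splits the markdown before each heading so that an image stays in the same chunk as the text it illustrates.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    # Bag-of-words term counts: a toy stand-in for an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(markdown: str) -> list[tuple[str, Counter]]:
    # Split before each markdown heading, so every chunk keeps its images with its text.
    chunks = [c.strip() for c in re.split(r"(?m)^(?=#)", markdown) if c.strip()]
    return [(chunk, _vector(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, top_k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and return only their text, like the video's retriever.
    q = _vector(query)
    ranked = sorted(index, key=lambda item: _cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


docs = """# Create an app
Open the Apps panel and click New App.
![New app](https://example.com/new-app.gif)

# Delete a site
Go to Settings and click Delete.
"""
index = build_index(docs)
print(retrieve(index, "How can I create an app?", top_k=1))
```

The chunking choice matters for this use case: if images were split into chunks of their own, they could easily be left out of the retrieved context and never reach the model.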

💡 The key to enhancing chatbot UX is to let it respond with images. This is achieved by keeping image URLs when converting the scraped HTML to markdown (html2text), indexing that markdown with LlamaIndex for similarity search, and prompting GPT-3.5 to include the relevant image and video links in its markdown answer.
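The answering step combines the retrieved text with a prompt that tells the model to answer only from the context, in markdown, and to keep relevant image and video links. Below is a minimal sketch of that prompt plus a small helper that pulls image URLs out of a markdown answer, for example to render them in a front end. The template wording is a paraphrase of the rules read out in the video, and `build_prompt` and `image_urls` are hypothetical names.

```python
import re

# Paraphrase of the rules read out in the video; the exact wording is an assumption.
PROMPT_TEMPLATE = """You are a helpful assistant. Based on the context below, answer the question following these rules:
1. Answer only based on the context provided; do not make things up.
2. Answer in a helpful manner, straight to the point, with a clear structure and all relevant information.
3. Format the answer in markdown.
4. If there are any relevant image or video links, they are important reference data: include them in the answer.

Context:
{context}

Question: {question}
"""

# Matches markdown image syntax ![alt](url) and captures the URL.
IMAGE_MD = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Fill the template with the retrieved chunks (joined by blank lines) and the question."""
    return PROMPT_TEMPLATE.format(context="\n\n".join(context_chunks), question=question)


def image_urls(markdown: str) -> list[str]:
    """Return image URLs referenced in a markdown answer, in order of appearance."""
    return IMAGE_MD.findall(markdown)


prompt = build_prompt(
    "How can I create a Webflow app?",
    ["# Create an app\nClick New App.\n![New app](https://example.com/new-app.gif)"],
)
print(prompt)
print(image_urls("Click **New App**.\n\n![New app](https://example.com/new-app.gif)"))
```

In the video the prompt is wired up with LangChain Expression Language. With current LangChain packages that looks roughly like `ChatPromptTemplate.from_template(PROMPT_TEMPLATE) | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()`, invoked with a dict containing `context` and `question`; the exact imports depend on the LangChain version.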
