Web Scraping Agent (Tutorial)
Key Takeaways
The video demonstrates building a web scraping agent in Flowise using a Retriever Tool, a Faiss vector store, OpenAI embeddings, and the Cheerio web scraper, and shows the agent scraping information, querying it, and answering questions over it.
Full Transcript
In this demo I'm going to show how to build a simple scraper agent using Flowise. I'm going to use Agentflows right here, so I'll select that and then click Add New, because I'm starting from scratch. The first thing I need to do is go to this plus sign and drag in an Agent; you can find it under Sequential Agents. This agent has two things I need to fill in: access to the tools, and the model that's going to guide its reasoning and so forth.

Let's work on the tools first. What I'm interested in building is an agent that can pull information from the web, so it can scrape pages in real time: if I give it a link, it should be able to go to that website, pick up the information, and store it. I should then be able to query that information using this agent, and I can update that database. Specifically, I'm going to use a vector store for this, so let's build that out. There is a node called Retriever Tool; I'll search for "Retriever", and since it's a tool, it plugs into the agent's tools input. This builds out the retriever component I need, but it needs a few other components connected to it. The retriever has a name, so I'll call it something like "search web". In the description I could be very descriptive, but I'm going to keep it really simple: "Searches and returns related docs", something like that. Again, you can experiment with this. This is a first iteration of the project, so I'm just going to leave it as is, but you can iterate on it as much as you need.

After that I need a retriever, and what I'm looking for is a vector store. Under Vector Stores there are different kinds, including the popular ones, and there's an open-source one, Faiss, which I love to use. This project is going to run locally, so I don't really need anything too sophisticated; this is good enough for me. I'm going to drag it over to Retriever, and then I need to give it a base path. Because this is the first time I'm collecting, or scraping, this information, I actually need to index it, and I'm going to show you that process. For the base path I'm selecting a directory, and it needs to be a directory that already exists. You can see it's currently a Faiss index folder, and I'm going to change it: I usually keep different folders for different projects, and the one I want here is for prompting. That Faiss index folder for prompting already exists under Documents, so that's the directory, and that looks good.

Now I need to add the document and the embeddings. For embeddings you can choose any embedding service you already use, or something very common. I'm going to use OpenAI Embeddings, since I use OpenAI models a lot. I'll drag it in and select my credentials. Creating credentials is very straightforward: all you need is an API key. Go to your OpenAI dashboard, create a key, and bring it here to connect your models. The other field is just the model name; I'm not going to change it, but you can experiment with the different ones that are available.

After that I need a document. I'm not going to use a local PDF or file; I'm actually going to use a scraper, which is why I said at the start that this is a scraper agent. The one I'm using is called Cheerio. There are a couple of scrapers available inside Flowise, but this is the one I regularly use. It connects to the document input, and this is where I provide the URL I'm scraping. The project I'll be scraping is the Prompt Engineering Guide, and my idea with this agent is to do something useful with it. I want to build an agent that can be used within this documentation we have on how to prompt models better. There's a lot of content, and it's getting really hard to find key information, so potentially what I want is a chat agent you can interact with that gives you different tips and pulls information from different places. That's the rough idea. For this first iteration, I just want to scrape the prompting guide, test on it, and see whether the agent can answer basic questions about information the guide already contains. I already have the link to the guide here.

Next comes a text splitter, and I recommend using one here. I'm going to use the Recursive Character Text Splitter, a very popular, standard one to start experiments with. I won't discuss the different splitters here; that's probably another video. The idea is to split the information into smaller chunks, embed those chunks, and index them in our vector store, which is Faiss. The chunk size is 1,000, and I can easily experiment with different chunk sizes, but every time I change it I have to reindex my information. That's the only thing I'd point out there.

With that, the retrieval side looks ready. The last part I want to add is the start of this agent, because we need to tell the agent where the conversation begins. For this I'm going to use the Start and End nodes. The Start node is the starting point of the conversation, so I'll connect it to the agent. The Start node requires a chat model (there's also agent memory, but that's something for another video). For the chat model I'll select ChatOpenAI, drag it in, and connect it. This is the model that does all the planning and reasoning and enables the agent to call tools when it needs to. For credentials I'm using the same ones as before. For the model you usually want one that's good at reasoning and planning, so I'm selecting the latest one, GPT-4o mini, which is also cheaper, but you can choose any model or any provider you're already using. For the system prompt you can also get very specific about what you want, but I'm keeping the default, which I think is reasonable. Then I'm naming this agent "scraper agent".

There's one more piece missing: we have the beginning of the conversation, but we also need the end. So I'm adding an End node, the end of the conversation, and connecting it. That completes the workflow, so this is ready to go.

Before I demonstrate it, I can't forget the important part: I need to index this information before I can chat with it. I'm going to go to Manage Links, and you'll see that I can fetch links. Since this is a scraper, it fetches a few links from the site. Not all of the site's links are here, and I could add links manually, but these are fine for demonstration purposes. Before it scrapes those links, I'm going to save the flow; I'll call it "scraper agent prompting guide". Once it's saved, notice that this green icon appears, which is basically prompting you to index this information; this is called upserting. When I click it, it tells me I can upsert this information, so let's upsert. This takes a bit of time, depending on how much information it's indexing. What I really like is that it tells you how much it added, updated, skipped, and deleted, so if you're reindexing you can see whether it's updating things. Here is the information it picked up. You can tell this is definitely a scraper: it includes all this JavaScript code along with information about the website itself. That's really nice to see, and there's a lot of scraped information ready to go, so I can experiment with it now. I'll close this and save again; don't forget to save.

Once it's saved, I think this is ready to be used, so let's try it. I'm going to the chat icon here; everything in Flowise happens through this chat icon. The cool thing about Flowise, though, is that it has an API as well, so you can interact with this entire workflow through an API. It's very flexible in that sense, but I'll use the chat interface because it's easier to demo. I'm going to ask a very simple question: "What are the prompting techniques you recommend?" It takes some time because it needs to decide whether to call tools and so forth. It sent back a bunch of information, so let's look at it bit by bit. I'll enlarge this so you can see it.

The scraper agent calls search web, and you can see exactly that tool and the information it's pulling. The input to the tool was "prompting techniques", as you can see there. All of that is handled by the scraper agent: this is the tool-use capability at play. It tells the retriever what to search for, and this is the information that was returned. Then the agent takes it (this is the language model at work) and answers: "Here are some effective prompting techniques that can enhance your interactions with language models." It lists few-shot prompting, summarizes it, and gives an example; then instruction-based prompting, with some instructions; then Chain of Thought, meta prompting, all things we have in our guide. These are all great tips. The last one, experimentation, is good advice on how to do prompt engineering, but it's not specifically a prompting technique. So I could be more specific with the question and test whether the agent is doing the right job with the queries; that's something I have to evaluate.

So that's a very basic example of how I scrape a website and then interact with it. In the future I want to build this as an actual service. I'm already working on that, and it will eventually be made available in the Prompting Guide, so our users and learners can interact with such an agent and pull information from the guide, but also from the web when it needs to. That's an exciting project, and I will share updates on it. Please leave your questions in the comments if this is interesting to you, or if you have any ideas or requests for this type of agent. I'll look into those comments and see if I can build something out quickly using Flowise or any of the other tools I use to build agents. I want us to have deeper discussions about design patterns for building these agents; this is something I'm working very hard on. We have also developed courses: if you're interested, I'll leave a link in the description below to our recent course on introduction to AI agents. We use Flowise in that course, so it's the same tool I'm showing you here. That's it for this video. Thank you so much for watching, please consider leaving a like and subscribing to the channel if you haven't, and I'll see you all in the next one.
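The "Fetch Links" step in the transcript collects pages on the same site as the starting URL before scraping them. Cheerio is a Node.js library, so the following is only a rough stdlib Python analogue of that idea, not how Flowise or Cheerio implement it. The names `LinkCollector`, `extract_same_site_links`, and the `limit` parameter are hypothetical, and the function parses an HTML string you pass in rather than downloading anything.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in the fed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_same_site_links(base_url: str, html: str, limit: int = 10) -> list[str]:
    """Hypothetical helper: up to `limit` unique absolute links on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links: list[str] = []
    for href in collector.hrefs:
        if len(links) >= limit:
            break
        # Resolve relative hrefs and drop #fragments so each page is listed once.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


if __name__ == "__main__":
    page = (
        '<a href="/introduction">Intro</a>'
        '<a href="/techniques/fewshot#example">Few-shot</a>'
        '<a href="https://example.org/elsewhere">External</a>'
        '<a href="/introduction">Intro again</a>'
    )
    print(extract_same_site_links("https://guide.example.com/", page))
```

Capping the list with `limit` mirrors the transcript's observation that the fetched list does not contain every page of the site; any missing pages would have to be added by hand.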
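The chunking step (Recursive Character Text Splitter, chunk size 1,000) can be illustrated with a simplified recursive splitter: try the coarsest separator (blank lines) first and fall back to finer ones (newlines, spaces, single characters) only for pieces that are still longer than the chunk size. This is a stdlib sketch of the general technique, not the code behind the Flowise node; the function name and separator list are assumptions, and chunk overlap is left out for brevity.

```python
def recursive_split(text: str, chunk_size: int = 1000,
                    separators: tuple = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters, preferring coarse separators."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Nothing left to split on: fall back to fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Use the coarsest separator present in the text ("" matches anything).
    sep = next((s for s in separators if s == "" or s in text), separators[-1])
    finer = separators[separators.index(sep) + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the running chunk, then split the oversized piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        # Greedily merge neighbouring pieces while the chunk still fits.
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "aaaa bbbb\n\ncccc dddd"
    print(recursive_split(doc, chunk_size=9))  # paragraph-level chunks
    print(recursive_split(doc, chunk_size=4))  # falls back to word-level chunks
```

Changing `chunk_size` produces different chunks, and therefore different embeddings, which is why the transcript notes that the index has to be rebuilt whenever the chunk size changes.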
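Conceptually, the Retriever Tool embeds the agent's query, looks up the nearest chunks in the vector store, and hands their text back to the agent. The sketch below illustrates that flow under loudly stated assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for OpenAI embeddings, `ToyVectorStore` is a brute-force cosine-similarity index standing in for Faiss, and `retriever_tool` is a hypothetical wrapper, not the Flowise node's API.

```python
import math
import re
import zlib
from collections import Counter

DIM = 1024  # hypothetical embedding width for this toy example


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embeddings API: hashed bag-of-words vector, L2-normalised."""
    vec = [0.0] * DIM
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for token, count in Counter(tokens).items():
        # crc32 is deterministic across runs, unlike Python's salted str hash.
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorStore:
    """Exact (brute-force) nearest-neighbour search over normalised vectors."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self.docs.append(chunk)
            self.vectors.append(toy_embed(chunk))

    def search(self, query: str, k: int = 2) -> list[tuple[float, str]]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), doc)
                  for vec, doc in zip(self.vectors, self.docs)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def retriever_tool(store: ToyVectorStore, query: str, k: int = 2) -> str:
    """Hypothetical tool wrapper: joins the top-k chunks into one string for the agent."""
    return "\n---\n".join(doc for _score, doc in store.search(query, k))


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add([
        "Few-shot prompting provides examples in the prompt.",
        "Chain-of-thought prompting asks the model to reason step by step.",
        "Model settings such as temperature change how deterministic outputs are.",
    ])
    print(retriever_tool(store, "few-shot examples", k=1))
```

A real embedding model captures meaning rather than shared words, and Faiss offers faster approximate indexes for large collections; the toy version only shows the embed, index, search, and return-text loop that the agent's "search web" tool call relies on.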
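The upsert summary in the transcript (how much was added, updated, skipped, and deleted) is what makes reindexing safe to repeat. One common way to produce such counts is to store a content hash per chunk and diff it against the latest scrape. The following is a simplified sketch of that bookkeeping under this assumption, not necessarily how Flowise computes its numbers; `content_hash`, `plan_upsert`, and the `"<page>#<chunk>"` key format are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint a chunk so unchanged content can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_upsert(previous: dict[str, str], incoming: dict[str, str]) -> dict[str, int]:
    """Compare the indexed state with freshly scraped chunks.

    previous: chunk key -> content hash already stored in the index
    incoming: chunk key -> chunk text from the latest scrape
    """
    counts = {"added": 0, "updated": 0, "skipped": 0, "deleted": 0}
    for key, text in incoming.items():
        if key not in previous:
            counts["added"] += 1    # new chunk: embed and insert
        elif previous[key] != content_hash(text):
            counts["updated"] += 1  # changed chunk: re-embed and replace
        else:
            counts["skipped"] += 1  # unchanged: no embedding call needed
    # Chunks the latest scrape no longer produces are removed from the index.
    counts["deleted"] = sum(1 for key in previous if key not in incoming)
    return counts


if __name__ == "__main__":
    first = {"page1#0": "Few-shot prompting", "page1#1": "Chain of thought", "page3#0": "Old page"}
    print(plan_upsert({}, first))
    # {'added': 3, 'updated': 0, 'skipped': 0, 'deleted': 0}
    state = {key: content_hash(text) for key, text in first.items()}
    second = {"page1#0": "Few-shot prompting",
              "page1#1": "Chain-of-thought prompting",
              "page2#0": "Meta prompting"}
    print(plan_upsert(state, second))
    # {'added': 1, 'updated': 1, 'skipped': 1, 'deleted': 1}
```

Skipping unchanged chunks is the main payoff: on a re-scrape, only new or edited content costs embedding calls.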
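The transcript mentions that the same workflow can be used through Flowise's API instead of the chat icon. A minimal stdlib sketch of such a call follows, assuming Flowise's prediction endpoint (`POST /api/v1/prediction/<flow id>` with a JSON body containing `question`), a local instance at `http://localhost:3000`, and a placeholder flow ID; check the API details Flowise shows for your own flow before relying on any of these. `build_prediction_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:3000"  # assumption: local Flowise instance on its default port
FLOW_ID = "your-flow-id"            # placeholder: copy the real ID from your flow


def build_prediction_request(base_url: str, flow_id: str, question: str,
                             api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for the (assumed) prediction endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/prediction/{flow_id}"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed if the flow is protected with an API key.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def ask(question: str, api_key: Optional[str] = None, timeout: float = 60.0) -> dict:
    """Send the question to the running flow and return the parsed JSON response."""
    request = build_prediction_request(BASE_URL, FLOW_ID, question, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_prediction_request(BASE_URL, FLOW_ID,
                                   "What are the prompting techniques you recommend?")
    print(req.get_method(), req.full_url)
    # ask("What are the prompting techniques you recommend?") would contact the instance.
```

Splitting request construction from sending keeps the network call in one small function, so the request shape can be inspected without a running Flowise server.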
Original Description
Learn more about how to build AI agents in my new course: https://dair-ai.thinkific.com/courses/introduction-ai-agents
Use code AGENTS20 to get an extra 20% off. Limited time offer.
---
An agent with access to tools for scraping the web and enabling querying over that information.
#ai #chatgpt #artificialintelligence #tech
Uploaded by Elvis Saravia