I Used AI To Scrape The Web & Write PDF Reports
Skills: Tool Use & Function Calling (90%), Multi-Agent Systems (80%), Autonomous Workflows (80%), Agent Foundations (70%)
Key Takeaways
The video showcases GPT Researcher, an autonomous agent that generates research reports from a single prompt by combining content scraped from the websites it visits with OpenAI models. The local setup shown in the video uses the OpenAI API, Git, VS Code, Python 3.11, Conda, pip, Selenium, and FastAPI.
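The flow described above (one task in, research questions generated, visited pages summarized, summaries aggregated into a report) can be sketched in a few lines of Python. This is a hypothetical illustration only, not GPT Researcher's actual code: `run_research`, `Finding`, and the `fake_*` stand-ins are invented names, and the example.com URLs are placeholders. A real agent would replace the stand-ins with LLM calls and a scraper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    question: str
    url: str
    summary: str


def run_research(
    task: str,
    make_questions: Callable[[str], List[str]],
    find_urls: Callable[[str], List[str]],
    summarize: Callable[[str, str], str],
) -> str:
    """Task -> research questions -> per-page summaries -> one report."""
    findings: List[Finding] = []
    for question in make_questions(task):
        for url in find_urls(question):
            findings.append(Finding(question, url, summarize(url, question)))

    # Aggregate: one section per question, each summary followed by its source.
    lines = [f"Research report: {task}", ""]
    for question in dict.fromkeys(f.question for f in findings):
        lines.append(question)
        for f in findings:
            if f.question == question:
                lines.append(f"- {f.summary} (source: {f.url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical stand-ins so the sketch runs offline, without an API key.
def fake_questions(task: str) -> List[str]:
    return [f"How does {task} work?", f"What are applications of {task}?"]


def fake_urls(question: str) -> List[str]:
    return ["https://example.com/a", "https://example.com/b"]


def fake_summarize(url: str, question: str) -> str:
    return f"Summary of {url}"


report = run_research(
    "a large language model", fake_questions, fake_urls, fake_summarize
)
print(report)
```

Passing the three steps in as callables keeps the orchestration separate from the model and the scraper, which makes it easy to swap either one out while experimenting.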
Full Transcript
Three simple prompts: we now have a full research report, we have various resources on how to create a tutorial, and we also have an outline of how we should structure the tutorial.
All right, today we're going to look into GPT Researcher, another exciting tool that I've been wanting to dive into for a while. It's another GPT-based autonomous agent that we can use to create research reports for us based on a single prompt. At a high level, the architecture works like this: you provide a task, it generates research questions for us, and then it summarizes all of the visited websites. So it uses actual information from websites, not just from the trained OpenAI models, to get factual information, and then it aggregates that into a research report. Very exciting, and we're going to put it to the test to see if it is actually useful.
Now, we are going to look at the code to try and do this, but the nice thing is that they also made it available in a nice web portal. You can just go to this website, I will leave the link in the description, and put in your OpenAI API key with GPT-4 access. That is a must; it is the only requirement you need in order to follow along. So you can use it through the browser. If you're not interested in the coding side of things, just use this and try it out yourself.
But we're going to look at the GitHub repository, clone it, and then run this little application locally. That way we can still use the UI that they created, but we can also have a look at the code underneath all of this to see what's going on and get a better understanding.
If you're new to coding, you should still be able to follow along if you're interested in it and want to learn it. All of the steps that I'm going to show you within this tutorial are very basic steps. But if at any point you feel like "this is too much for me, I don't even understand what Python is, or pip", then I would refer you to my free group, Data Alchemy. It is completely free to join, and I will put a link in the description. In here I've specifically created the Alchemy codecs. I created this because I found that a lot of people watching my videos, beginners, feel like they can't follow the tutorials because there's this initial hurdle, these initial steps that you have to do: install Python, understand what Git is. So if that's you, if you're interested and want to follow along but don't know where to start, check this out. I will walk you through everything, and then you should be able to follow everything that we're doing.
To get started, we have to clone the repository. I've already created a YouTube project folder here, so let's open up the terminal and git clone into it. Let's see, I already got the full one, so we're going to clone the repository. I like to use VS Code for all of my coding work, so I'm going to open up this repository in VS Code, and then let's see what the next steps are.
We go to the requirements. We have to create a new environment where we are going to install all the dependencies that we need, so let's quickly do that. We see that we need a Python 3.10, that should of course be 3.11, installation. Let me go to VS Code and create a new terminal. I'm going to say conda create and specify Python 3.11. Again, I will show you how to do all of this within Data Alchemy, in the Alchemy codecs: how you can work with Python environments, specifically with conda environments, to make sure we work with isolated environments. Now we're simply going to activate the environment that we've just created with conda activate.
Next we're going to install the requirements. You can see these are all the libraries that we need right now. So how do we do this? We can copy this pip install -r requirements command, so let's do that. Make sure the environment is active so we can run it in there. This will take some time to go through all of these libraries and install them within the Python environment that we've just created.
All right, the installation is finished. Let's clear that up and go back to the instructions. Now we have to export our OpenAI API key. You just type export and then OPENAI_API_KEY, and this is where you would put in your API key, and then hit enter. I'm going to do that now for my key. All right, I just put it in and cleared the terminal again. Make sure you have GPT-4 access, that is important. Now let's go back to running the agent with FastAPI. We have another command that we can run, so let's put this in. This will start to run the server on our localhost, and here it says that we ran into an error. This is something you could also potentially run into, so let's see how we can quickly debug this.
I definitely ran into some errors, some bugs here, but I managed to solve them with a little ChatGPT help, of course. I'm not going to bother you with all the details; I will just list the steps that I had to follow, since this may be the case for you as well. First, I had to install uvicorn with conda within the environment. Then we also had an AttributeError on the httpcore library, which I could solve by running pip install and forcing it to 0.15, which I've added to my requirements file. And finally, I also had to install WeasyPrint. This is something they mention on the page here as well: if you're having issues with WeasyPrint, you can follow the instructions. It says on Mac to do it with Brew, but since we're using conda, for me it worked if I just ran this. So that was a quick little debugging tour. This is what you run into using these open-source tools, depending on the system and the environment that you're using.
Getting back to the library, I can now run uvicorn main:app --reload, and this should start up the server on port 8000. Let's click on that. Boom, there we go. Let me compare this: we have the online version, which is the app, and now you also have the version that's running locally. You can see we don't have to put in the OpenAI API key, because that is already in the system. Then you could ask, why go through all this trouble setting this up locally if you can also use it here? Just because we can, and because we want to understand this, right? What we can now also do is look into the code over here and see what's going on, and maybe even make some changes.
All right, now let's get into some actual research. I've never used this tool, so let's see what we can do. It starts off with "What would you like me to research?", and then we can specify research report, resource report, or outline report. Let's say we want to do some research and want to understand what a large language model is, and let's start with the report. Let's go. Now let's see how long this will take, and we're also going to closely monitor the costs. All right, there we go, it's now starting to produce output. Here you can see "thinking about", "initiated agent", and then it starts to create various research questions: how does it work, applications, implications, and criticisms.
If at any point you get stuck or it doesn't continue, you can always look at the log files or at the terminal. We seem to be running into another error over here, where it says it cannot connect to the service, Selenium. So this is something we have to look into. Okay, it's finally running now. We managed to solve the other bug, which was due to the Chrome driver. As you're seeing, there are quite some bugs that we ran into, so it is not as straightforward as I thought it was. You can always fall back on the web version, but we are getting some results right now. It's still in progress, but here is how we solved the bug. It had to do with the Chrome driver, which for me was installed under Selenium and under the Chrome driver folder. If I opened that up, I would get a warning notification from Apple on my MacBook about not being able to open it. I had to go to System Settings, then Privacy & Security, and there it would show up, and then I had to press Allow, because it was blocking the Chrome driver from opening. When I ran the script again, a little pop-up came and I had to click OK. With that all out of the way, we managed to get it to work, and it has now completed.
Here in the logs you can see all the URLs it went through. It also says on the GitHub page that it goes over 20 web sources, which is pretty impressive. And here we can see we now have a research report, and we can either copy it to the clipboard or download it as a PDF. And voila, this is amazing, right? We now have a report, "Large Language Models: An Overview", even with a source over here. So here we can see an article that it used; it references the article, or I should say the report, from 2023. Very interesting. I expected it to include more references, since we went over 20-something URLs.
Let's dive a little bit deeper to see how it actually got to this information. This is the cool thing about using the local version: we can open up this folder and see some more information. Here are all the research questions: advantages and disadvantages, how large language models work, use cases, and also what a large language model is. And here you can also see the URL that it got its information from. This is quite interesting, and something I have to look into. So it's generating text files and then combining them into a report. Here you can also see some additional information, things like total work time and total run time. As you can see, this took 15 minutes, so it's not a press of a button and it's completed. It really does some extensive research, scraping all the information from the sites.
But this is of course only the first try. I'm now going to read through this to see if it actually makes sense, go through the other materials, run some more experiments, and get back to you with my observations. Another good thing to note is costs, of course. If we look at this first iteration, you can see it cost about 50 cents or so with some trial and error, so probably a little less, because we ran into some bugs the first time.
Okay, so I've read through the report, and I've also done more iterations. This is actually the third one: I did one with the resource report, and I'm now running an outline report. For the resource report, we asked it to create a LangChain tutorial. It gives us a source analysis with all kinds of resources that we can reference to potentially create a LangChain tutorial, if that's what we want to do. This is pretty cool, and these are some pretty interesting tutorials. It's referencing a quick start from LangChain, which is actually a really good place to start, and then tutorials on freeCodeCamp. So I can see how this can turn into a very interesting tool that you can use.
I've also read through the whole report, and it's really good. It's written in such a way that it's very understandable, it's not too long, and I did not see any errors that immediately stood out to me. But this is the tricky thing with outputs from large language models: since they are trained to produce natural-looking text, you can quite quickly get tricked into this. But since it's using all of the sources, all of the websites, to actually get factual information, I believe this is a very solid approach to counter that. So the report is actually really good.
All right, now the outline report has also finished. What you can see right here is just an outline, as you would expect for a report, so that is also quite interesting. So we have a research report, a resource report, and an outline report.
Okay, so this is pretty cool, right? How am I going to use this myself, potentially, and how could you maybe also use it? To me, the report generation is really cool. I've looked at the one for the large language model and, like I've said, it is actually a solid report with good information in there, and I like how it references the sources that it uses. So this is something I could look into for, for example, research for these videos that I'm creating, or research for educational content that I create, where I just have an idea like "hey, I want to research something on this" and let GPT help me with that. So that's one thing: just general research. I think it's a very solid tool to do that.
But what I also find interesting is looking more into the code behind the scenes and what's going on, for example the prompts that they're using to create this, and more specifically the research agent itself. What I would like to experiment with is using the underlying technology and principles that they're using here to maybe create some kind of automated content bot. This is something I've really been interested to explore and create. Maybe, using the techniques that they're using here, you can do some quick research on a certain topic every day and then try to create, I don't know, a tweet or an email newsletter or whatever kind of content about that. In doing so, you can, for example, list various resources, websites, trusted resources that you would like to get information from. In this way you can create content in a very controllable way and hopefully avoid a lot of the hallucination you get if you just use the main models, which have a training cutoff somewhere in 2021. So I think that could be quite interesting, to see if we can, like I said, look at the underlying code, everything that's going on, take some elements, and turn that into a content generation bot.
So those are, I think, the two ways you can use this tool: do quick research and get some nice PDFs that you can use for yourself or share with someone else, or use this tool, since it's all open source, to really understand what's going on and try to extract some features from it and turn it into some kind of application.
And if that's something you're interested in, creating applications with these large language models, then again you should really check out Data Alchemy. Like I've said, it's completely free to join. Here in the classroom we have the introduction, the Alchemy codecs, that will teach you basically all the workflows that I use, and in a bit we will also have resources on building applications with large language models, and more data science content as well. So that's all pretty exciting. If you want to learn more about that, then definitely make sure to check out the group. All right, that's it for now. Also make sure to like this video and subscribe to the channel, and then I'll see you in the next one. [Music] Thank you.
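The content-bot idea near the end of the transcript relies on listing trusted websites that research is allowed to draw from. A minimal sketch of that allowlist step, assuming the candidate URLs have already been collected elsewhere: the domain list, the sample URLs, and the helper names `is_trusted` and `filter_trusted` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the sites you actually trust.
TRUSTED_DOMAINS: Set[str] = {"langchain.com", "freecodecamp.org"}


def is_trusted(url: str, trusted: Set[str]) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)


def filter_trusted(urls: Iterable[str], trusted: Set[str]) -> List[str]:
    """Keep only URLs on the allowlist, preserving their original order."""
    return [u for u in urls if is_trusted(u, trusted)]


# Made-up sample URLs for the demonstration.
candidates = [
    "https://www.freecodecamp.org/news/some-tutorial/",
    "https://python.langchain.com/docs/",
    "https://freecodecamp.org.evil.example/page",
    "not a url",
]
kept = filter_trusted(candidates, TRUSTED_DOMAINS)
print(kept)
```

Matching on the parsed hostname, rather than searching the URL string for the domain, avoids accepting look-alike hosts such as the third sample URL.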
Original Description
Let's dive into GPT Researcher to see what it can actually do and whether it is useful. GPT Researcher is an autonomous agent designed for comprehensive online research on a variety of tasks.
Want to start freelancing? Let me help: https://academy.datalumina.com/freelance
Want to learn real AI Engineering? Go here: https://academy.datalumina.com/accelerator
💼 Need help with a project?
Work with me: https://www.datalumina.com/
🔗 Links
https://github.com/assafelovic/gpt-researcher
https://app.tavily.com/
🕷️ Debugging
conda install -c conda-forge uvicorn
conda install -c conda-forge weasyprint
pip install --force-reinstall httpcore==0.15
Chrome driver: Apple System Settings → Privacy & Security → Allow
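Before starting the local server, it can help to confirm that the pieces touched by the fixes above are in place. The following is a small preflight sketch, not part of GPT Researcher: the function name `preflight` is made up, and it only checks that the `OPENAI_API_KEY` environment variable is set and that the `uvicorn`, `weasyprint`, and `httpcore` modules can be found in the active environment. It does not verify versions or the macOS Chrome driver permission.

```python
import importlib.util
import os
from typing import Dict


def preflight() -> Dict[str, bool]:
    """Report whether the API key and the troublesome packages are present."""
    checks: Dict[str, bool] = {
        "OPENAI_API_KEY set": bool(os.environ.get("OPENAI_API_KEY")),
    }
    # find_spec locates a top-level module without importing it.
    for module in ("uvicorn", "weasyprint", "httpcore"):
        checks[f"{module} importable"] = (
            importlib.util.find_spec(module) is not None
        )
    return checks


results = preflight()
for name, ok in results.items():
    print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Run it inside the activated conda environment; any line marked MISSING points at the matching install or export step.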
👋🏻 About Me
Hey there, my name is @daveebbelaar and I work as a freelance data scientist and run a company called Datalumina. You've stumbled upon my YouTube channel, where I give away all my secrets when it comes to working with data. If you want to learn more about what I do, then head over to https://www.datalumina.io/