I build an autonomous researcher via GPT | Langchain ⛓️ Tutorial
Key Takeaways
The video demonstrates building an autonomous researcher using GPT, Langchain, and other AI tools to perform research on any given topic and generate a report. It covers the use of Serp for searching relevant articles, large language models for summarization, and Streamlit for creating a user interface.
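The overall flow described in the video (search, pick the best articles, fetch their content, summarize, write the thread) can be sketched as a plain Python function. This is a minimal illustration, not the author's code: the name run_research and all of its parameters are hypothetical, and each step is passed in as a callable so the sketch runs without any API keys or network access.

```python
from typing import Callable, List


def run_research(
    topic: str,
    search: Callable[[str], List[str]],
    pick_best: Callable[[str, List[str]], List[str]],
    fetch: Callable[[str], str],
    summarize: Callable[[str], str],
    write_thread: Callable[[str, List[str]], str],
) -> str:
    """Wire the research steps together (hypothetical helper, not from the video)."""
    candidate_urls = search(topic)                    # step 1: search the web
    best_urls = pick_best(topic, candidate_urls)      # step 2: let a model choose the best articles
    pages = [fetch(url) for url in best_urls]         # step 3: load each article's text
    summaries = [summarize(page) for page in pages]   # step 4: summarize each article
    return write_thread(topic, summaries)             # step 5: turn the summaries into one thread


if __name__ == "__main__":
    # Stub implementations stand in for the search API, the LLM calls and the scraper.
    thread = run_research(
        "why did Nvidia stock go up",
        search=lambda topic: ["https://example.com/a", "https://example.com/b"],
        pick_best=lambda topic, urls: urls[:3],
        fetch=lambda url: f"text of {url}",
        summarize=lambda text: text[:40],
        write_thread=lambda topic, summaries: topic + "\n" + "\n".join(summaries),
    )
    print(thread)
```

Keeping each step behind a callable makes it easy to test the wiring with stubs first and swap in real search, scraping and model calls later.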
Full Transcript
Over the weekend I built this AI-powered autonomous researcher that can help you do research on any given topic, and I'm going to show you how you can do this step by step.

I receive emails from Early Signal, which is made by Greg Cameron, where he talks about the latest AI business ideas, and this is the one that really caught my eye: the autonomous researcher. The idea is simple. Imagine you use GPT to build an agent that can do research on any given topic and then generate high-quality content based on it. Michael has this idea about using GPT to write Twitter threads every other day about how famous people got rich, and bets you would be at 100K followers in just one year. When you think about it, this idea is really expandable. It's not just for Twitter threads; it can be for so many different purposes, for example market or competitor analysis. You can give GPT access to your competitor's Discord channel so that it can monitor all the feedback. This is a prime use case that GPT is so capable of, but there is no product out there yet.

That's why over the weekend I built this AI-powered autonomous researcher that can help you do research on any given topic, and I'm going to show you how you can build this step by step. This autonomous researcher should do two things. It first needs to get the URL links of the best articles for a certain topic, and then we will use a summarization chain to summarize all that information into a Twitter thread.

Initially I tried to use Flowise and Langflow to build the whole autonomous research agent, but unfortunately there are quite a few limitations on what I can do there. For example, I tried to use a conversational agent that has access to a web browser to do multiple searches on a given topic and return the URL links of the best articles. However, it is very hard to control what results it actually returns. Even though I gave a very specific instruction, "give me the URLs of the best articles that talk about how Jeff Bezos got rich", it never returned
me a proper link and just gave me a summary instead, or when it actually gave me a URL, the URL didn't actually exist. That is quite annoying, because I know for sure the agent had access to the link as part of its context, but unfortunately I couldn't figure out a way for it to return the URL links. That's why, instead of using Langflow or Flowise, I decided to use LangChain to build the whole autonomous agent and use Streamlit to quickly build a user interface for people to put in their query.

Let's think through how we're going to do this step by step. Firstly, we will use a service called Serp, which will help you search for relevant articles on the internet. Then we will pass that list of articles to a large language model and give it a prompt so that it can choose the best articles out of the list. Then we'll scrape the content from each article and use GPT to make a summary of each article. In the end, we're going to turn all those summaries into a Twitter thread. Let's get to it.

Firstly, we're going to use Serp to get a list of relevant articles from the internet. Serp is basically a service for you to do a Google search and return the list of results, so we're going to create an account and get an API key from Serp. First we're going to import a few libraries, and I'm going to explain how we're going to use those libraries. The next thing is to create a .env file. This is the place where we will store the credentials and API keys, so you just copy your API key from Serp here. We'll do this, which will load the environment, and we'll do this to get the API key into this variable. Next we're going to create a function to get a list of relevant articles from Serp. It will use Serp's API and pass on the query and API key. Let's try this with "how did Jeff Bezos get rich", and we get a list of different answers. Cool.

Next we're going to pass all that data to the large language model and make it choose the best articles that can help us extract information about the topic. To do that we're
going to use a few more libraries, and this is the second function, find best article URLs, where we first turn the search results from JSON into a string so we can pass it on to a large language model. We will create a large language model with GPT-3.5 Turbo. We will create a template: "You are a world-class journalist and researcher; you're extremely good at finding the most relevant articles for a certain topic." We'll pass on the response from Serp, and then: "Above is the list of search results for the query. Please choose the best three articles from the list; return only an array of URLs." We're going to create a prompt template from this with two input variables, and from there we will create an LLM chain so it can extract the URLs of the best articles. In the end, we're going to turn the response, which is probably a string, into an array.

OK, now let's try this. First I create test data here so it'll be easier for us to debug, and then let's make sure we add the OpenAI API key to the .env file and then load the OpenAI API key here at the top. Now let's run this. You can see we pass the search data into the large language model, and here you go: it does return a few different links. Let's see if those links are legit. Yeah, OK, so those are actually legit articles about why Nvidia's stock price is soaring.

The next step is to fetch the data from each of those websites and let the large language model do a summary of each of these articles. To do that, we're going to use one of the document loaders from LangChain, a URL loader, which will basically load HTML documents from a list of URLs. Firstly we will import that document loader, and then create a function, get content from URLs. We'll pass the URLs on to this function and this should get the content. Let's try this again. It returns the content of the whole website and puts it into an array.

The next thing we're going to do is summarize that content. The content can sometimes be quite big, so we want to
break this content down into small chunks and summarize each chunk. Here is a function called summarize, and we will import another library called text splitter, which will basically break a big amount of text down into small pieces. We first load the text splitter, and we're going to use CharacterTextSplitter. Separator means where it's going to separate the content, and chunk size is how big each chunk is. I chose 3,000 because the token limit of GPT-3.5 is around 4,000. We run the text splitter on the data that we got from the content.

After that we're going to create a large language model with GPT-3.5. The template here first passes in the text data, and then: "You are a world-class journalist, and you will try to summarize the text above in order to create a Twitter thread about a certain topic. Please follow all the rules below." This is the kind of format I found quite useful. Firstly, make sure the content is engaging and informative with good data. The content shouldn't be too long; it should be no more than three to five tweets. It should address the topic very well. It needs to be viral and get at least 1,000 likes. It needs to be written in a way that is easy to read and understand, and it should give the audience some actionable advice and insights. So this is the prompt I came up with; I'm sure you can fine-tune it and get a better one as well.

Again, we'll create a prompt template from this with two input variables and create a summarizer chain. Then I create an array of summaries, and for each chunk in the text array we're going to run this summarizer chain and add the result to the array. This should make a summary of each chunk that we split from the content we got from the internet. Let's run this. OK, so now you can see it starts running: it passes in a chunk of the content and then gives the instruction about summarizing, and once it finished
it moved to the next chunk, and you can see it continues doing so. For demo purposes I'm going to stop here, because I know it's going to continue. On the other hand, this might not be the most efficient way of doing the summarization. There is another option called the summarize chain, which does kind of the same thing, but I just implemented it from scratch; it should give very similar results.

The last thing we're going to do is feed all those summary chunks into a large language model and ask it to create a viral Twitter thread from all that context. This is the function we're going to use, called generate thread. We will pass in the array of summaries as well as the topic that we are researching. Firstly we're going to turn the array of summaries into a string, and then we'll do the same thing: get a large language model with GPT-3.5 and give it a new template. I pass in the whole summary string, and then: "You are a world-class journalist and Twitter influencer. The text above is some context about the topic we are researching. Please write a viral Twitter thread about this topic using the text above, following all the rules below." I didn't spend too much time on this prompt itself; I think you can really fine-tune it to make the results much, much better. In the end we just run this large language model, which should generate a Twitter thread. Let's run this: we have the summaries, and we're going to use generate thread and print the thread. All right, let's try this.

One thing I noticed is that the content we got from the HTML documents has a lot of unnecessary text, so if you want to optimize it, I think you can also try to do some filtering on the content. OK, great. You can see here that it feeds the summaries of all the chunks to the large language model and then asks it to create a Twitter thread, and this is the Twitter thread it generated: Nvidia stock price has been soaring recently
and there are a few reasons why. You can look at the detail, and it's pretty decent; it tries to include a lot of good data as well, like [predicted] growth of 50 in the next financial year. OK, great. So now we have everything working and it actually generates a Twitter thread.

The last thing we're going to do is add a user interface so that people can play with it, and as mentioned I'm going to use Streamlit to quickly create one. We will import streamlit; if you haven't installed it, you will need to pip install streamlit, or do this. Then we create a function. This is the user interface that we just created. We will use st, which is streamlit, and set page config to set up the title and icon of the page, and create a header for the page, which is "Autonomous researcher - Twitter threads". We will create an st.text_input, which basically creates a text input on the page asking for the topic of the Twitter thread, and we'll pass the result of the input to query. If the query exists, then we start displaying. We'll first display some content with st.write, "Generating Twitter thread for" the query, and then we'll use st.expander, which basically creates an accordion on the interface. We're going to create five of them to display the results that we got from each step. And that's pretty much it.

All we need to do now is run the app. We will use streamlit run app.py. Let's say I want to create a Twitter thread about why Nvidia stock went up. Once I put it in here and press enter, it should start running, as you can see in the top right corner. If we go back to the terminal you can see it has started running, and we just need to wait for it to finish; then it will start showing the results on the interface. All right, you can see this page starts displaying the results: what search results we got, which three best articles it thinks can help us understand why Nvidia stock went up, then the data we extracted, and the summary we got from all
that content. In the end it feeds this to the large language model, and it generates a pretty decent Twitter thread. The thread has a good hook, and it gives good reasoning about why the stock went up, with good data to back it up. So here is your autonomous researcher, and you can imagine I can start using this to research all types of different topics and write 10 or even 20 Twitter threads per day.

On the other hand, I also want to share a platform that I found pretty powerful, which allows you to build these autonomous researcher use cases very easily with a no-code UI. I tried to rebuild the whole autonomous researcher that I built with LangChain in Relevance AI, and it only took me about 10 minutes, because it provides a lot of different building blocks that are similar to LangChain, and it provides a UI that is much more flexible than Flowise. In here I can build exactly the same functions. First I make an API call to Serp, which gets the URLs of articles relevant to my topic, and then I use GPT to pick the best three articles and return their URLs. They provide a function to extract website text with Browserless, so I can pass in a URL and get the content out. Then I combine the array together and split it into different chunks of text, so that I can pass them on to the large language model to summarize each chunk. In the end I combine the summaries of every single chunk into one string and pass it on to the final prompt that writes the Twitter thread. What's cool about it is that it's very easy to debug, because they provide a preview of the data they've got for each step. Also, once I finish, I have this website that I can directly use and share with others to do exactly the same thing that I built with Streamlit. So, highly recommended. I put the link to the autonomous researcher prototype I just built in the description so you can try it out yourself.

I'm pretty excited to see what other types of autonomous researchers people start building,
especially when you start giving them access to more and more data sources. As always, comment below with any questions you have. See you next time.
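The splitting, per-chunk summarizing and URL-array parsing steps from the transcript can be illustrated without any external library. The sketch below is an assumption-laden stand-in, not the code from the video: split_text, summarize_chunks and parse_url_list are hypothetical names, the splitter is a simple greedy character-based one (chunk_size counts characters, not tokens), and the summarizer is passed in as a function so a real model call can be swapped in later.

```python
import json
from typing import Callable, List


def split_text(text: str, separator: str = "\n", chunk_size: int = 3000) -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def summarize_chunks(chunks: List[str], summarize: Callable[[str], str]) -> List[str]:
    """Run the supplied summarizer on every chunk and collect the results in order."""
    return [summarize(chunk) for chunk in chunks]


def parse_url_list(response: str) -> List[str]:
    """Parse a model reply that should be a JSON array of URLs; return [] if it is not."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [item for item in parsed if isinstance(item, str) and item.startswith("http")]


if __name__ == "__main__":
    article = "\n".join(f"paragraph {i}: " + "x" * 40 for i in range(10))
    chunks = split_text(article, chunk_size=120)
    # A truncating lambda stands in for the LLM summarizer chain.
    summaries = summarize_chunks(chunks, lambda chunk: chunk[:20] + "...")
    print(len(chunks), summaries[0])
    print(parse_url_list('["https://example.com/a", "https://example.com/b"]'))
```

Returning an empty list from parse_url_list when the reply is not a valid array is one possible design choice; it lets the caller decide whether to retry the prompt or stop.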
Original Description
Imagine an autonomous researcher that can do research on any given topic for you and generate an amazing report 24/7. I built it over the weekend, trying out LangChain, Langflow & Flowise.
Join my community: https://www.skool.com/ai-builder-club/about
Follow me on twitter: https://twitter.com/jasonzhou1993
Join my AI email list: https://www.ai-jason.com/
My discord: https://discord.gg/eZXprSaCDE
🔗 Links
- Autonomous researcher demo: https://jayzeedesign-autonomous-researcher-app-g1pll8.streamlit.app/
- Github link: https://github.com/JayZeeDesign/autonomous-researcher
- Langchain: https://python.langchain.com/en/latest/index.html
- Langflow: https://github.com/logspace-ai/langflow
- Flowise: https://flowiseai.com/
- Relevance AI: https://relevanceai.com/
- Early signal: https://earlysignal.ai/
⏱️ Timestamps
0:00 Use case: Autonomous researcher
1:05 Langflow & flowise limitations
2:15 Tutorial overview
2:40 Step 1: Search internet
3:35 Step 2: Scrape data
5:08 Step 3: Split text
5:49 Step 4: Summarise each chunk
8:05 Step 5: Write twitter thread
9:45 Step 6: Build UI with Streamlit
12:00 No code tool alternative: Relevance AI
👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
#langchain #autogpt #ai #nocode #tutorial #stepbystep #langflow #flowise #gpt #researchergpt #relevanceai #twitterbot #twitter #twitterthreads