PDF Summary with LLMs in Python - LangChain Tutorial

NeuralNine · Beginner · 🧠 Large Language Models · 1y ago

Key Takeaways

This video demonstrates how to use LangChain with OpenAI and Anthropic models to summarize PDF documents in Python by splitting each PDF into chunks, summarizing the chunks, and merging the results with a map-reduce chain. Local models through Ollama are mentioned briefly but not coded.

Full Transcript

What is going on guys, welcome back. In this video today we're going to learn how to summarize PDF documents using large language models and LangChain in Python. So let us get right into it. [Music]

All right, so we're going to learn how to summarize PDF documents in Python by using large language models and LangChain. For this I have two documents in this directory. The first one is the "Attention Is All You Need" paper, which is quite popular; a lot of you guys will already know about it. It's the Transformer paper, basically. The second one is a random paper that I got from arXiv today. I think it was published today, or at least quite recently. I called it new.pdf. I'm not even sure what it is about; it's "a pre-trained graph-based model for adaptive sequencing of educational documents". So that's just some paper that was released quite recently. I wanted to use something that is not already in the knowledge base of the language models.

We're going to summarize the essence of these papers using large language models. We're going to do that with GPT and we're going to do that with Claude, so we're going to use the OpenAI models and also the Anthropic models. I want to show you how to do that, and for this we're going to need a bunch of packages on our system. So let's zoom in here a little bit so you can see this better.

We're going to start by installing the following packages. First of all python-dotenv; we want to use this because I'm going to load the API keys from a .env file, which I already have in this directory. We're also going to use langchain, langchain-community, langchain-openai, and langchain-anthropic. So just run this and make sure everything is installed. Actually, community is now downloaded here, so I think there was an update. Hopefully this doesn't destroy the code that I've prepared, but these are the
packages that we're going to need.

Now what you also need is the API key. So how do we get an API key? You just go to the OpenAI API; here you can create a new secret key: you go to Settings, you go to API Keys, create a new secret key, copy it. For Anthropic it's similar: you go to the Console, you go to Settings, you go to API Keys, you create a key, you copy it. That's the basic thing. And how you get this into your code is you create a file called .env, like this. I'm going to now create a different file, let's call it .env2. In this file you want to create an OPENAI_API_KEY variable and set it to your API key, whatever it is, and you want to do the same thing for ANTHROPIC_API_KEY, again some stuff. Then you want to save this, and then we can get started with our code.

Our code is actually not going to be too comprehensive. We're going to open up a main.py file. I'm going to start by saying import os, because we want to get environment variables. From dotenv I want to import load_dotenv, and then we want to import the LangChain-related stuff. So from langchain.chains.summarize we want to import load_summarize_chain. This is basically a pipeline that will summarize our text; I'm going to explain in a second what we're going to do here exactly. Then from langchain_community.document_loaders I want to import PyPDFLoader. This is going to allow us to load the PDF document and to split it into chunks. And then from langchain_openai I want to import ChatOpenAI.

So this is how it's done right now. Oftentimes things are changed in LangChain, so maybe when you're watching this a year or two from now the syntax might have changed. Oftentimes what you can do, though, is run the command openai migrate. I'm not sure if it works with LangChain, but oftentimes it does work with OpenAI-related code, so maybe you have some success with doing that.

But what we want to do now is load the API key, so we're going to say load_dotenv(). This is going to load the keys from the .env file into our environment, and this is automatically going to be recognized by ChatOpenAI, so I don't think we need to specify it.

We're going to define a function now, summarize_pdf, and this function is going to just get a file path. If we want, you can also add a custom prompt, so let's just add it here as something that can be done. I'm going to leave it empty by default, just so we have some more control over the process. What we want to do in this function is load the PDF document, so we want to say PyPDFLoader and pass the file path. Then we want to load the PDF document as chunks; we want to split it into chunks. So we say here the documents, since the chunks are also documents, are equal to loader.load_and_split(). This basically takes the PDF document and splits it up into manageable pieces. What we do now is summarize these manageable pieces and then combine them into a complete, final summary.

So our large language model is going to be ChatOpenAI. The temperature is going to be set to zero, so we don't want to make it crazy or creative. For the model name you can choose whatever model you like; you just need to be aware that you pay for this, and also that the newer or more advanced models are oftentimes slower. So if I do something like GPT-4o, this works, but it will take more time than something like GPT-3.5 Turbo. Just for the sake of speed I'm going to use GPT-3.5; 4o is probably going to give you better results, so consider using that if you have time, but for the video I want to use 3.5 Turbo.

Now I want to do this summarization chain, so I'm going to say chain is equal to load_summarize_chain, and I'm going to pass the large language model, and then I'm going to say chain_type equals map_reduce. So the basic
idea here is we're mapping the summarization, so we're summarizing the individual chunks that we have, the individual documents, and then we're reducing them together: we're merging these summaries into a final summary. That's the high-level explanation of what's going on here. The resulting summary is just going to be the chain invoked on the documents. Actually, I don't know why I had this custom prompt here; let's just get rid of it. So I'm going to return the summary in this case, and then we're just going to run this in a main section. We want to summarize our "Attention Is All You Need" document, so attention.pdf (not PyPDFLoader, .pdf, there you go). Then I want to print that the summary is the following, and then just summary. To be precise, we're going to get a JSON object, and from this JSON object we want to have output_text; that is going to be the raw text of the response.

So if everything was done correctly we can now run this, python3 main.py, and it's hopefully going to give us a summary. While we're doing that, we can adjust the code to also work with Claude, so with Anthropic. For this we're going to jump into the same directory that we were coding in, and we're going to copy main.py to main_claude.py.

Now we do have a result here, so let's take a look at that. It says the paper by this guy introduces the Transformer model, which utilizes attention mechanisms for sequence modeling and transduction tasks, outperforming traditional models in machine translation, and so on and so forth. So we have the gist of the paper, a broad summary of what this paper is about, summarized into like two, three, four sentences, something like this. Now let's see if we can also get such a summary for the other one, new.pdf. Let's run this, and while we're doing that, let's rewrite the whole thing now for Claude.
For this all we need to do is change the model object, obviously. So in main_claude.py, instead of importing ChatOpenAI from langchain_openai, I want to import ChatAnthropic from langchain_anthropic, and then I just want to replace ChatOpenAI with ChatAnthropic. Temperature zero; the model name of course has to be changed as well; for example, a fast model is 3.5 Haiku latest. The rest, if I'm not mistaken, stays the same. So let's just save this and run it: python3 main_claude.py.

Let's also see now what we have here. "This study introduces a data-efficient framework for learning path personalization in MOOCs using a pre-trained recommender system with reinforcement learning. It formulates the problem as a Markov decision process", and so on and so forth. So it gives us again a high-level overview of the paper.

Let's see what Claude does. I think there's probably a problem with the max chunk size or with the max context window; I'm not sure about this, but we're going to see. I think this takes a little bit longer than the GPT models, but in general this is how you do that.

You can do the same thing also with local models. You can do it with Llama if you have Ollama running. Basically you just have to say from langchain... was it langchain_ollama, or was it langchain_community Ollama? I'm not sure. Let me just see. Actually, I need to create a Python file to be able to see that, so let's jump into this directory again. Oh, it starts downloading stuff, interesting. So let's see if that is going to give us a result now.

Okay, let's first take a look at this. This is now a more lengthy summary. We have "Here's a concise summary of the Transformer paper: the Transformer is a novel neural network architecture for sequence transduction", and so on, and then the key features include that the model achieves state-of-the-art performance. Actually, to be honest, this
is a much better summary than what we got from GPT, so I want to run this again, but also for the other file. It doesn't surprise me that Claude outperforms GPT here; this is usually the case nowadays, at least in my opinion. It wasn't like that in the beginning.

But let's see if I can figure it out. I'm not going to code this here, but let's just say py; I think it's from langchain... [Music] Yeah, we do have langchain_ollama, llms, import... do we have some autocompletion here? OllamaLLM, probably something like this. Just look it up; I'm sure you can find an easy way to do the same thing with your local large language models.

But this is very useful because it gives you a quick start when you're looking for papers. Also, of course, you can automate this: you can just put a loop around it. For example, if you like Claude more, you use that summarize_pdf function. You just have a directory full of 20 files or 30 files or 100 papers, and you say for file in os.listdir, and then all these PDF documents, and then you just summarize them. You get the summaries, you save them somewhere, and then you have very quick descriptions of the papers. You don't have to open them, you don't have to read the abstract, you don't have to read the conclusion; you can just look at this small summary and see if it's relevant to your project or not. I think that's a very useful thing.

So let's see what we get here now as a response: "a novel approach to personalizing learning paths in online education", and then we get again the research focus, the methodological innovations, the distinctions, and also the results. I think Claude does a very good job at summarization, and this is how you can easily summarize PDF documents in Python using large language models.

So that's it for today's video. I hope you enjoyed it and hope you learned something. If so, let me know by hitting a like button and leaving a comment in the comment section down below. And of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye.
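The loop over a directory of papers is described near the end of the transcript but never typed out. A minimal sketch of it might look like the following. The function name summarize_directory, the "summaries" output folder, and the one-text-file-per-paper layout are assumptions made for illustration; the summarize argument stands in for whichever summarize_pdf function you built.

```python
import os
from typing import Callable, Dict


def summarize_directory(
    pdf_dir: str,
    summarize: Callable[[str], str],
    out_dir: str = "summaries",  # hypothetical output folder name
) -> Dict[str, str]:
    """Summarize every PDF in pdf_dir and save one .txt file per paper."""
    os.makedirs(out_dir, exist_ok=True)

    summaries: Dict[str, str] = {}
    for name in sorted(os.listdir(pdf_dir)):
        # Skip anything in the directory that is not a PDF.
        if not name.lower().endswith(".pdf"):
            continue

        # `summarize` takes a file path and returns the summary text.
        text = summarize(os.path.join(pdf_dir, name))

        # Save the summary next to the others, e.g. attention.pdf -> attention.txt.
        stem = os.path.splitext(name)[0]
        with open(os.path.join(out_dir, stem + ".txt"), "w", encoding="utf-8") as f:
            f.write(text)

        summaries[name] = text
    return summaries
```

Passing the summarizer in as an argument keeps the loop independent of the model provider, so the same code works for the GPT and the Claude version of the script. Keep in mind that every file triggers paid API calls.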

Original Description

In this video, we learn how to summarize PDFs easily using LLMs and LangChain in Python.

📚 Programming Books & Merch 📚
🐍 The Python Bible Book: https://www.neuralnine.com/books/
💻 The Algorithm Bible Book: https://www.neuralnine.com/books/
👕 Programming Merch: https://www.neuralnine.com/shop

💼 Services 💼
💻 Freelancing & Tutoring: https://www.neuralnine.com/services

🌐 Social Media & Contact 🌐
📱 Website: https://www.neuralnine.com/
📷 Instagram: https://www.instagram.com/neuralnine
🐦 Twitter: https://twitter.com/neuralnine
🤵 LinkedIn: https://www.linkedin.com/company/neuralnine/
📁 GitHub: https://github.com/NeuralNine
🎙 Discord: https://discord.gg/JU4xr8U3dm

Timestamps:
(0:00) Intro
(0:18) Installation & Setup
(3:00) PDF Summary with GPT
(9:27) PDF Summary with Claude
(13:46) Outro

This video teaches how to use LangChain with OpenAI and Anthropic models to summarize PDF documents in Python, with a brief mention of local Llama models through Ollama. It walks step by step through defining a summarization function, splitting a PDF into chunks, summarizing each chunk, and combining the partial summaries into a final summary. It also suggests looping over a directory of PDFs to automate the process.
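To make the split, summarize, and combine idea concrete, here is a small pure-Python sketch of the map-reduce pattern. It is a simplified illustration, not LangChain's implementation: split_into_chunks and map_reduce_summarize are hypothetical names, the chunker cuts on a fixed character count, and the summarize argument is a stand-in for a call to a language model.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int = 1000) -> List[str]:
    """Cut text into pieces of at most chunk_size characters (naive splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk on its own, then summarize the joined summaries."""
    if not chunks:
        return ""

    # Map step: one independent summary per chunk.
    partial_summaries = [summarize(chunk) for chunk in chunks]

    # Reduce step: merge the partial summaries into one final summary.
    return summarize("\n".join(partial_summaries))
```

The appeal of this design is that no single model call has to fit the whole document into its context window. The cost is one model call per chunk plus one for the final merge.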

Key Takeaways
  1. Load API key from environment
  2. Define a function to summarize PDF documents
  3. Split PDF into chunks and summarize each chunk
  4. Combine summaries into a final summary
  5. Create a summarization chain with map-reduce
  6. Run python3 main.py to generate a summary of a paper
  7. Copy main.py to a second script (for example main_claude.py) for the Anthropic version
  8. In that copy, replace ChatOpenAI with ChatAnthropic and change the model name
  9. Import LangChain models from different providers
💡 The video highlights the importance of using large language models for PDF summarization and demonstrates how to use LangChain with different models to achieve this goal.
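Put together, the steps above could look roughly like the sketch below. It uses the imports named in the transcript (load_summarize_chain, PyPDFLoader, ChatOpenAI, ChatAnthropic, load_dotenv) as they existed in LangChain releases around the time of the video; newer releases may have moved or renamed them. The build_llm helper, the provider argument, and the DEFAULT_MODELS table are assumptions added for illustration, and the model names are examples only. PyPDFLoader also needs the pypdf package installed.

```python
# Example model names only; check each provider's current model list.
DEFAULT_MODELS = {
    "openai": "gpt-3.5-turbo",
    "anthropic": "claude-3-5-haiku-latest",
}


def build_llm(provider: str = "openai"):
    """Return a chat model for the given provider (hypothetical helper)."""
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider!r}")

    # Imports are local so this file loads even if only one provider
    # package is installed. API keys are read from the environment.
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(temperature=0, model=DEFAULT_MODELS[provider])

    from langchain_anthropic import ChatAnthropic
    return ChatAnthropic(temperature=0, model=DEFAULT_MODELS[provider])


def summarize_pdf(file_path: str, provider: str = "openai") -> str:
    """Load a PDF, split it into chunks, and run a map-reduce summary."""
    from langchain.chains.summarize import load_summarize_chain
    from langchain_community.document_loaders import PyPDFLoader

    # Each chunk comes back as its own Document object.
    docs = PyPDFLoader(file_path).load_and_split()

    chain = load_summarize_chain(build_llm(provider), chain_type="map_reduce")

    # The chain returns a dict; the summary text is under "output_text".
    result = chain.invoke({"input_documents": docs})
    return result["output_text"]


def main() -> None:
    from dotenv import load_dotenv

    # Reads OPENAI_API_KEY / ANTHROPIC_API_KEY from a local .env file.
    load_dotenv()
    print("Summary:")
    print(summarize_pdf("attention.pdf", provider="openai"))
```

Calling main() runs the GPT version on attention.pdf; passing provider="anthropic" to summarize_pdf switches to Claude without a second copy of the script.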
