Recursively summarize text of any length with GPT-3

David Shapiro · Beginner ·🧠 Large Language Models ·4y ago

Key Takeaways

The video demonstrates how to use GPT-3 to summarize text of any length. A Python script, authenticated with an OpenAI API key, uses the textwrap module to break the input document into chunks of about 4,000 characters and summarizes each chunk with GPT-3, cutting the text to roughly a quarter of its original length. Running the script again on its own output repeats the process until the desired level of brevity is reached.
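The chunking step can be sketched with the standard-library textwrap module. This is a minimal illustration, not the script from the video: the function name split_into_chunks and the sample text are assumptions, and only textwrap.wrap itself and the 4,000-character width come from the transcript.

```python
import textwrap


def split_into_chunks(text: str, width: int = 4000) -> list[str]:
    """Split text into pieces of at most `width` characters, breaking on whitespace.

    `split_into_chunks` is a hypothetical helper name used for illustration.
    """
    # By default textwrap.wrap replaces newlines and tabs with single spaces,
    # so paragraph breaks in the input are not preserved inside a chunk.
    return textwrap.wrap(text, width=width)


if __name__ == "__main__":
    # Made-up sample text standing in for the contents of input.txt.
    sample = "Alice was beginning to get very tired of sitting by her sister. " * 300
    chunks = split_into_chunks(sample)
    print(len(chunks), "chunks; longest is", max(len(c) for c in chunks), "characters")
```

Because the split happens on word boundaries, chunks come out "more or less the same size" rather than exactly equal, and the last chunk is usually shorter.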

Full Transcript

Hey everybody, David Shapiro here with a quick video. Well, I think it's going to be quick. I've had a few requests for something that's pretty similar. (Also, let me make sure the sound is good. Okay, cool, you can hear me.) One person asked for something to summarize documentation; another person asked about summarizing notes of some type, basically creating executive summaries. This is already a solved problem, but there are enough people asking how to do this that I figured, why not make a video on it?

So we're going to do a recursive summarizer: public, add a README, and license MIT. Basically all we're going to do is create a loop. There'll be an input document, and then we'll break that down into chunks using the module textwrap, and from there we'll just summarize each chunk and put it together. And what you can do is you can recurse: you assemble all those chunks and you can recursively summarize it again and again until you end up with, you know, something that's unrecognizable, basically.

Okay, so git clone the summarizer. Let's open up my C drive, recursive summarizer, there we go. Let's open another one, add my .gitignore and my OpenAI API key, just start with some boilerplate stuff. And then in AutoMuse I did use recursive summaries. Hey, look at that. I did not use textwrap in this one; I think it was in book to chunks. Yeah, okay, so import textwrap. What this does is you give it a block of a string and it'll break it into chunks of strings that are more or less the same size.

So let's go to recursive summarizer, and this will probably be like one thing, so we'll just do recursively_summarize.py. And then let's start with a book. Let's see, what's the shortest one we have here? Alice in Wonderland, cool, we'll start with that. We'll go back to here, so we'll copy that. We'll just call this input, so that whatever you do, you'll have input.txt and then output.txt.

So this is a technique that I've used. I had a contract, an operating agreement for a company, that I had to read, and it was an 80-page document. I was like, I don't want to read 80 pages. It was like 60,000 words, 80 pages. So I used this technique and I summarized it down to like 15,000 words, so I made it a quarter as long. Obviously I'm not going to show you people on the public this private legal contract, but I can show you the same principle and we'll go from there.

Okay, so we've got input. Recursively summarize text, so on and so forth. I'll copy my open_file function, just because it's super useful, and then also save_file, because also useful. Okay, so, whoops, if __name__ == '__main__'. This just says this is our main function. What do we want to do? Who's bugging me? Okay, sorry, muted my phone. Lost my train of thought. Right, we're going to open a file. Let me make sure that I do it right. Book to chunks, so chunks equals, yes, all right. So here's basically what you do. Let's see, alltext equals open_file, and we'll just do input.txt, so whatever you want, it'll be this name. You could make this into a command-line argument thing. I personally don't like doing that kind of thing, but you're welcome to make this a command-line-based tool if you want. Oh yeah, we're going to need to do this as well: our OpenAI key.

Okay, so alltext equals that, and then chunks equals, let me make sure I do this right, textwrap, there we go. So we could just call textwrap and then .wrap, and then we put this here. We're going to do a little bit longer chunks: we're going to do 4,000-character chunks, because we're just doing one summary each. And then another thing that we're going to need to do is, let's see, result equals list. So we're going to have a list of strings as the final result. And then, let's see, for chunk in chunks, we will then summarize that.

So I need to grab my GPT-3 completion function and put that up here. And again, I recycle code all the time: you get a function that works, you just copy-paste it ad infinitum. So then we'll do import os, import, no, from time import time, sleep, because those are two things that I need for that to work.

Oh, and then we need a prompt. So we go over here, let's just go to, here, selection. All right, that's about 4,000 characters. So how do we want to summarize this? We'll start with "Write a concise summary of the following", and then we do "CONCISE SUMMARY", and we'll leave this on 0.7 so that it can be creative. Okay, so it says:

"In this passage, Roger Chillingworth and Reverend Dimmesdale discuss the secrecy of some sinners. Dimmesdale argues that some men keep their secrets because they hope to redeem themselves, while Chillingworth suggests that they are simply afraid of being found out. The conversation is interrupted by the sound of Pearl's laughter, and they watch as she plays in the cemetery."

Okay, that seems good to me. Let's see: "Before Roger Chillingworth can answer, they heard the clear, wild laughter of a young child's voice." Yeah, I'm taking a handful of these. So I had this idea, because okay, this is a good concise summary, but let's see: "Write a concise summary of the following. Be sure to preserve important details." So then we'll add "CONCISE SUMMARY WITH DETAILS". Let's see how different this is. Let's copy this so that we can save it. So it went from 4,000 characters to 376, so that's a factor of more than 10 in terms of reduction. But if we say let's keep some details, let's see how that is:

"Hester Prynne and her daughter Pearl are walking through the burial ground when Pearl starts skipping and dancing around irreverently. Hester doesn't stop her, but merely tells her to behave more decorously. Pearl then starts arranging burrs along the lines of the scarlet letter on Hester's bosom."

Is that really what happened? That seems like it's entirely different. I don't know that any of that actually happened. Where did it get this burr? Oh, okay. Oh, interesting. So it kind of summarized the details in the final bit. Hmm, okay. I don't necessarily like that summary, because look how different these are. The first one was "In this passage, Roger Chillingworth", et cetera, et cetera, but we miss out on the details of the burrs, right?

So the wording that I used another time was using "a moderate summary of the following". Moderate summary means, okay, compress it, but not too much. So we'll just say "MODERATE SUMMARY". Interesting. So it's kind of ignoring the beginning in both of these, because even this moderate summary is more like the second one: "Hester Prynne and her daughter Pearl are walking through a graveyard." But then you see this one. It's almost like we need to have both of these. What I'm trying to do here is get something that feels like a good summary.

So let's say "Write a detailed summary", "DETAILED SUMMARY". That looks a little bit better. Okay, so this captures both. If we say detailed summary, it's about twice as long, but it looks like it got all of the details that we want. The selection that we did is 838 characters, so that's still more than a factor of four, because we went from 4,000 characters to less than a thousand. So it's a quarter as long. I like that. I think we're going to stick with this as our prompt.

Okay, so we're going to do this. Here, I'll just copy all this in. "Write a detailed summary of the following", summary, and then we kill all that and we say this is our new prompt: prompt.txt. Okay, so then what we do is for each chunk. Here, let me close some of this excessive stuff. Okay, so for chunk in chunks we're going to say prompt equals open_file, prompt.txt, dot replace, summary, sorry, with chunk. So basically what that'll do is each of these 4,000-character chunks will get put in here, and then we'll send it up to GPT-3 to summarize. And I hope I don't run out of tokens; we'll see. All right, pardon me, I went for a really long bike ride earlier, so I'm still rehydrating.

Okay, GPT-3 completion. Token limit 1,000, yeah, that's fine. All right, so then summary equals gpt3_completion of prompt. That'll give us our summary. We'll print out the summary just so that we can watch it going, and then we'll do result.append(summary). That should be good.

And then once it's all done, let's see, how is it that you join a list? Let's see if I can remember Python. l equals [1, 1, 2, 3], and then you do, I think, join l. What I'm trying to do is join it all into a string. Let's see, "python join list of strings into single string". Join list of strings, that's what I did. "Expected string instance", and found... oh, that's what I did wrong. Okay, so I did do it right, I just had the wrong data type. So then we just do ' '.join(l). Okay, that'll add a space. Cool, that's what I wanted.

All right, so then we do save_file, and the content will be, let's see, ' '.join(result), and then it'll be output.txt. Actually, here, let's do double newline. I think that'll be better, because then there will be a vertical space between each section, so we can see where the boundaries of the summarization happened. That'll make it a little bit easier to see, I think. Yeah, okay, so that's good.

Then let's do another thing. I sometimes do this: import re, and then text equals re.sub, and I'll do \s+, whitespace plus. So this is like if it adds in too much whitespace. Oh, I already closed it. If there's too much vertical whitespace or too many newlines, this will compress the output into a single line. So re.sub, this is regex sub, which is substitute. We substitute anything that is more than one whitespace with just a single normal space, and whitespace is vertical newlines, tabs, anything like that. And then this is what we're acting on. So that'll make it nice and compressed and pretty.

I think that's it. Let's see, how long is this going to be? The input is 171 kilobytes and it's about four kilobytes each, so how many chunks? 171 kilobytes divided by four kilobytes, so that'll be about 42 sections. That shouldn't be too bad, 42 instances. And really what I should do is save it as we go, just so I can show you. And then I need to add a gpt3_logs folder, so that's where I have this function save it out to; you see right here, gpt3_logs.

Let's go ahead and run it. Heck with it, let's see how it does. cd recursive summarizer, python recursively_summarize. "openai is not defined". What do you mean? I've got to import openai. Always forget something. import openai, and away it goes. So I can probably make these chunks a little bit longer, like 5,000. There we go. "She escapes by climbing a tree." Excellent. "Alice falls down a rabbit hole and finds herself in a long, dark tunnel." Yeah, okay, this is great.

All right, so this is running. It looks like it's doing just fine. I'm going to go ahead and pause the video so you don't have to watch this run through 40 iterations or whatever, but it looks like it's doing pretty well. So let's pause it, and then we'll be back in just a second.

Okay, and we're back. It didn't take too terribly long, but we're done. It was 42 chunks total and I predicted 42.75, so spot on. When it's encoded as UTF-8, it's about one to one in terms of characters, or a thousand characters is roughly one kilobyte, put it that way.

Okay, so all that being said, here's the output. You see where we've got the double newline, so you can kind of see each section. All told, the length is 45,000 characters. This is Alice in Wonderland, and the input was 174,000 characters. So we go from 174 down to 45. Let's see, 45 divided by, what did I just say, 174. Okay, so that's right, that's almost exactly a quarter. So it's a quarter the length.

You could do this with anything. Like I said, I've gotten questions about, can you do this with academic texts? Yes, you can do this with academic text, legal contracts, works of fiction, whatever you want, and it will summarize it pretty concisely. Once you get to the end, you see it's basically just summarizing Gutenberg, so on and so forth. But up until that point it's nice and concise:

"Alice falls asleep by a river and has a curious dream in which she's put on trial for stealing the queen's tarts. The evidence against her is entirely circumstantial, but the jury finds her guilty and she is sentenced to death. However, before the sentence can be carried out, Alice wakes up and realizes it was all just a dream. Alice is sitting on the riverbank with her sister and she notices a white rabbit running by. She follows the rabbit down a hole, finds herself in Wonderland. She has a series of adventures..."

Looks like it's repeating the end. Interesting. So there you have it, though. That's pretty much all there is to it. I guess we'll just do a git status, git add, git commit -am, done and done, and git push.

So yeah, feel free to use this. Because I already hear people asking about Word documents or PDFs and all that: what I've done, with Python or PowerShell or whatever, is you just save those as .txt files and that works just fine. Basically it'll just remove all the formatting, because GPT-3 doesn't understand the XML background of a Microsoft Word document or how to read a PDF file. It only reads plain text. But even then you'll still see that it'll do a pretty good job.

You could change the prompt, change it back to concise, and it'll get even shorter; you'll get a factor of like 10 to 1. But as I showed you at the beginning of this video, you're at risk of losing important details if you say concise summary.

And then what you could do, say for instance you wanted to run this again, you could modify the script to run it again, so that you just treat the output as the next input, and then you could make it even shorter. I'm not going to worry about that right now, because literally all you would do if you want to try this is just copy the output to the input, whoops, and then just run it again. Or you add another loop. Again, I'm not going to worry about that; you can play with that if you want to. But yeah, there you have it. I think I'll call it a day. Thanks for watching.
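The per-chunk loop described in the transcript (fill a prompt template, get a summary, collapse whitespace, join the results with a double newline) might look roughly like the sketch below. It is an illustration under stated assumptions rather than the author's script: the "<<CHUNK>>" placeholder, the exact template layout, and the summarize_chunks name are invented here, and the model call is passed in as a plain callable named complete so that no particular OpenAI client signature is assumed.

```python
import re
from typing import Callable

# Assumed template layout; the "<<CHUNK>>" placeholder name is illustrative.
PROMPT_TEMPLATE = (
    "Write a detailed summary of the following:\n\n"
    "<<CHUNK>>\n\n"
    "DETAILED SUMMARY:"
)


def summarize_chunks(chunks: list[str], complete: Callable[[str], str]) -> str:
    """Summarize each chunk with `complete` and join the summaries.

    `complete` is a hypothetical stand-in for a GPT-3 completion function:
    it takes a prompt string and returns the model's text.
    """
    results = []
    for chunk in chunks:
        # Drop the chunk into the prompt template.
        prompt = PROMPT_TEMPLATE.replace("<<CHUNK>>", chunk)
        summary = complete(prompt)
        # Collapse every run of whitespace (newlines, tabs, repeated spaces)
        # into one space so each summary sits on a single line.
        summary = re.sub(r"\s+", " ", summary).strip()
        results.append(summary)
    # A blank line between summaries keeps the chunk boundaries visible.
    return "\n\n".join(results)


if __name__ == "__main__":
    # Fake completion function so the sketch runs without an API key.
    def fake_complete(prompt: str) -> str:
        return "  (summary of a %d-character prompt)\n" % len(prompt)

    print(summarize_chunks(["first chunk", "second chunk"], fake_complete))
```

Passing the completion function as an argument is a design choice made for this sketch: it keeps the loop testable offline and leaves the choice of model, temperature, and token limit to the caller.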

Original Description

The Kickstarter for my Post-Labor Economics book is live! https://www.kickstarter.com/projects/daveshap/labor-zero

Playlist

Uploads from David Shapiro · David Shapiro · 31 of 60


This video teaches how to use GPT-3 for recursive summarization of text of any length, allowing users to create concise summaries of large documents. The process involves breaking down input documents into chunks, summarizing each chunk, and recursively summarizing the output until a desired level of brevity is achieved. By following the steps outlined in the video, users can create executable scripts for text summarization and modify them to change the level of conciseness.

Key Takeaways
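  1. Import the necessary libraries (textwrap, re, os, openai)
  2. Open the input file (input.txt)
  3. Break the input document into chunks of about 4,000 characters using textwrap
  4. Define a prompt for GPT-3; a "detailed summary" prompt kept more of the important details than a "concise summary" prompt
  5. Send each chunk to GPT-3 for summarization
  6. Print each summary as it is produced
  7. Join the summaries with a blank line between them and save the result to a file (output.txt)
  8. Rerun the script on its own output to summarize recursively until the desired level of brevity is reached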
💡 The video demonstrates how to use GPT-3 for recursive summarization of text of any length, allowing users to create concise summaries of large documents and modify the scripts to change the level of conciseness.
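The video stops after a single pass and leaves the recursive step ("treat the output as the next input", or "add another loop") to the viewer. One possible shape for that outer loop is sketched below. The names summarize_until, one_pass, target_chars, and max_rounds are all hypothetical, and one_pass stands for whatever function performs a full chunk-and-summarize pass over a text.

```python
from typing import Callable


def summarize_until(
    text: str,
    one_pass: Callable[[str], str],
    target_chars: int,
    max_rounds: int = 5,
) -> str:
    """Feed each pass's output back in as the next input until it is short enough.

    `one_pass` is a hypothetical callable that runs one full
    chunk-and-summarize pass and returns the shorter text.
    """
    for _ in range(max_rounds):
        if len(text) <= target_chars:
            break  # already at or below the desired length
        shorter = one_pass(text)
        if len(shorter) >= len(text):
            break  # the pass no longer compresses, so stop rather than loop
        text = shorter
    return text


if __name__ == "__main__":
    # Stand-in pass that keeps the first quarter of the text, mimicking the
    # roughly four-to-one reduction reported in the video.
    def quarter(t: str) -> str:
        return t[: max(1, len(t) // 4)]

    final = summarize_until("x" * 174_000, quarter, target_chars=15_000)
    print(len(final))
```

The max_rounds cap and the no-shrink check are safeguards added for this sketch; as the transcript notes, each additional pass risks losing more detail, so the target length is worth choosing conservatively.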
