Recursively summarize text of any length with GPT-3

David Shapiro · Beginner · 🧠 Large Language Models · 4y ago

Skills: LLM Foundations 90% · Prompt Craft 80% · Prompting Basics 70% · Fine-tuning LLMs 60% · Advanced Prompting 50%

Key Takeaways

The video demonstrates recursive summarization of text of any length with GPT-3. A Python script reads an input document, splits it into chunks with the textwrap module, and sends each chunk to the OpenAI API (which requires an API key) for summarization. Running the combined summaries through the same process again shortens the text further, until it is as brief as desired.
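
As a rough illustration of the chunking step, the sketch below uses the standard-library `textwrap.wrap` to split a long string into pieces of at most 4,000 characters, the width used in the video. The helper name `chunk_text` and the filler sample text are assumptions made for illustration; they are not code from the video.

```python
import textwrap
from typing import List


def chunk_text(text: str, width: int = 4000) -> List[str]:
    """Split text into pieces of at most `width` characters, breaking on whitespace."""
    return textwrap.wrap(text, width=width)


# Filler input (an assumption for the demo): 2,500 short words.
sample = "word " * 2500
chunks = chunk_text(sample, width=4000)

# Report how many chunks were produced and how long each one is.
print(len(chunks), [len(c) for c in chunks])
```

By default `textwrap.wrap` replaces newlines with spaces, so paragraph breaks in the input are not preserved inside a chunk. That is usually acceptable for summarization, but it is worth knowing if the structure of the source matters.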

Full Transcript

Hey everybody, David Shapiro here with a quick video. Well, I think it's going to be quick. I've had a few requests for something pretty similar. (Also, let me make sure the sound is good. Okay, cool, you can hear me.) One person asked for something to summarize documentation; another person asked about summarizing notes of some type, basically creating executive summaries. This is already a solved problem, but there are enough people asking how to do this that I figured, why not make a video on it.

So we're going to do a recursive summarizer: public, add a README, and license MIT. Basically all we're going to do is create a loop. There'll be an input document, we'll break that down into chunks using the textwrap module, and from there we'll summarize each chunk and put it all together. And what you can do is recurse: you assemble all those chunk summaries and you can recursively summarize them again and again until you end up with something that's unrecognizable, basically.

Okay, so git clone the summarizer. Let's open up my C drive, recursive summarizer, there we go. Let's open another one, add my .gitignore and my OpenAI API key, just to start with some boilerplate stuff. In AutoMuse I did use recursive summaries. Hey, look at that. I did not use textwrap in this one; I think it was in book to chunks. Yeah, okay: import textwrap. What this does is you give it a block of text as a string, and it breaks it into chunks of strings that are more or less the same size.

So let's go to recursive summarizer. This will probably be just one file, so we'll call it recursively_summarize.py. Let's start with a book. What's the shortest one we have here? Alice in Wonderland, cool, we'll start with that. We'll copy that and just call it input, so that whatever you do, you'll have input.txt and then output.txt.

This is a technique that I've used before. I had an operating agreement for a company that I had to read, and it was an 80-page document. I was like, I don't want to read 80 pages; it was about 60,000 words. So I used this technique and summarized it down to about 15,000 words, so I made it a quarter as long. Obviously I'm not going to show a private legal contract in public, but I can show you the same principle, and we'll go from there.

Okay, so we've got the input. Recursively summarize text, so on and so forth. I'll copy my open_file function just because it's super useful [Music], and then also save_file, because that's also useful. Then `if __name__ == '__main__'`, which just says this is our main function. What do we want to do? (Who's bugging me? Okay, sorry, muted my phone. Lost my train of thought.) Right, we're going to open a file. Let me make sure that I do it right: book to chunks, so chunks equals... yes, all right. Here's basically what you do: alltext equals open_file, and we'll just do input.txt, so whatever you want to summarize will have this name. You could make this a command-line argument. I personally don't like doing that kind of thing, but you're welcome to make this a command-line tool if you want. Oh yeah, we're going to need our OpenAI key as well.

Okay, so alltext equals that, and then chunks equals... let me make sure I do this right... textwrap, there we go. We call textwrap.wrap and pass the text in. We're going to do somewhat longer chunks, 4,000 characters, because we're just doing one summary each. Another thing we need is result equals list, so we're going to have a list of strings as the final result. [Music] And then, for chunk in chunks, we will summarize that chunk.

So I need to grab my GPT-3 completion function and put that up here. Again, I recycle code all the time: you get a function that works, you just copy and paste it ad infinitum. Then we'll do import os and from time import time, sleep, because those are two things that I need for that to work.

Oh, and then we need a prompt. So we go over here and grab a selection; all right, that's about 4,000 characters. So how do we want to summarize this? We'll start with "Write a concise summary of the following", and then "CONCISE SUMMARY", and we'll leave this on 0.7 so that it can be creative.

Okay, so it says: "In this passage, Roger Chillingworth and Reverend Dimmesdale discuss the secrecy of some sinners. Dimmesdale argues that some men keep their secrets because they hope to redeem themselves, while Chillingworth suggests that they are simply afraid of being found out. The conversation is interrupted by the sound of Pearl's laughter, and they watch as she plays in the cemetery." Okay, that seems good to me. Let's see: "Before Roger Chillingworth can answer, they heard the clear, wild laughter of a young child's voice." Yeah. I'm taking a handful of these.

So I had this idea, because, okay, this is a good concise summary, but let's try: "Write a concise summary of the following. Be sure to preserve important details." Then we'll add "CONCISE SUMMARY WITH DETAILS". Let's see how different this is. Let's copy the first one so that we can save it. It went from 4,000 characters to 376, so that's a reduction by a factor of more than 10. But if we say let's keep some details, let's see how that is: "Hester Prynne and her daughter Pearl are walking through the burial ground when Pearl starts skipping and dancing around irreverently. Hester doesn't stop her, but merely tells her to behave more decorously. Pearl then starts arranging burrs along the lines of the scarlet letter on Hester's bosom."

Is that really what happened? That seems entirely different. I don't know that any of that actually happened. Where did it get this burr? Oh, okay. Oh, interesting. So it summarized the details in the final bit. Hmm, okay. I don't necessarily like that summary, because look how different these are. The first one was "In this passage, Roger Chillingworth..." et cetera, but there we miss out on the details of the burrs.

The wording that I used another time was "Write a moderate summary of the following". A moderate summary means: compress it, but not too much. So we'll just say "MODERATE SUMMARY". Interesting: it's kind of ignoring the beginning in both of these, because even this moderate summary is more like the second one: "Hester Prynne and her daughter Pearl are walking through a graveyard..." It's almost like we need to have both of these. What I'm trying to do here is get something that feels like a good summary. [Music]

So let's say "Write a detailed summary", then "DETAILED SUMMARY". That looks a little bit better. Okay, so this captures both. If we say detailed summary, it's about twice as long, but it looks like it got all of the details that we want. The output we got is 838 characters, so that's still a reduction by more than a factor of four, because we went from 4,000 characters to less than a thousand. So it's about a quarter as long. I like that; I think we're going to stick with this as our prompt.

Okay, so I'll just copy all this in: "Write a detailed summary of the following", with a summary placeholder. Then we kill all that and save this as our new prompt, prompt.txt. Then, for each chunk... here, let me close some of this excess stuff. Okay, so for chunk in chunks, we're going to say prompt equals open_file of prompt.txt, dot replace, the summary placeholder with chunk. Basically what that'll do is each of these 4,000-character chunks will get put in here, and then we'll send it up to GPT-3 to summarize. I hope I don't run out of tokens; we'll see. (Pardon me, I went for a really long bike ride earlier, so I'm still rehydrating.)

Okay, GPT-3 completion [Music], token limit 1,000; yeah, that's fine. So then summary equals gpt3_completion of prompt. That'll give us our summary. We'll print out the summary just so that we can watch it going, and then we'll do result.append(summary). That should be good.

Once it's all done... let's see, how is it that you join a list? Let's see if I can remember Python. l equals one, one, two, three, and then I think it's join l. What I'm trying to do is join it all into a string. Let's see: "python join list of strings into single string". That's what I did. "Expected str instance"... oh, that's what I did wrong. I did do it right, I just had the wrong data type. Okay, so we just do ' '.join(l), and that joins with a space. Cool, that's what I wanted.

So then we do save_file, and the content will be ' '.join(result), and the file will be output.txt. Actually, let's do a double newline. I think that'll be better, because then there will be a vertical space between each section, so we can see where the boundaries of the summarization happened. That'll make it a little easier to see, I think.

Then let's do another thing; I sometimes do this. import re, and then text equals re.sub with '\s+', whitespace plus. This is for when it adds in too much whitespace. If there's too much vertical whitespace or too many newlines, this will compress the output into a single line. re.sub is regex sub, which is substitute, so we substitute any run of more than one whitespace character with just a single normal space. Whitespace is vertical newlines, tabs, anything like that. And then this is what we're acting on. That'll make it nice and compressed and pretty. I think that's it.

Let's see how long this is going to take. The input is 171 kilobytes and each chunk is about four kilobytes, so 171 divided by 4 is about 42 sections. That shouldn't be too bad, 42 instances. Really what I should do is save it as we go, just so I can show you. And then I need to add a gpt3_logs folder, because that's where I have this function save its output; you see right here, gpt3_logs.

Let's go ahead and run it, heck with it, and see how it does. cd recursive summarizer, python recursively_summarize... "openai is not defined". What do you mean? I've got to import openai. I always forget something. import openai, and away it goes. I can probably make these chunks a little longer, like 5,000. There we go: "She escapes by climbing a tree." Excellent. "Alice falls down a rabbit hole and finds herself in a long, dark tunnel." Yeah, okay, this is great.

All right, so this is running and it looks like it's doing just fine. I'm going to pause the video so you don't have to watch this run through 40 iterations or whatever, but it looks like it's doing pretty well. We'll be back in just a second.

Okay, and we're back. It didn't take too terribly long. It was 42 chunks total, and I predicted 42.75, so spot on. When it's encoded as UTF-8, it's about one to one in terms of characters to bytes; or, put it this way, a thousand characters is roughly one kilobyte.

So, all that being said, here's the output. You see where we've got the double newline, so you can see each section. All told, the length is 45,000 characters. This is Alice in Wonderland, and the input was 174,000 characters, so we go from 174 down to 45. 45 divided by 174: okay, that's almost exactly a quarter. So it's a quarter of the length.

You could do this with anything. Like I said, I've gotten questions like, can you do this with academic texts? Yes, you can do this with academic texts, legal contracts, works of fiction, whatever you want, and it will summarize it pretty concisely. Once you get to the end, you see it's basically just summarizing, you know, Gutenberg, so on and so forth, but up until that point it's nice and concise: "Alice falls asleep by a river and has a curious dream in which she's put on trial for stealing the Queen's tarts. The evidence against her is entirely circumstantial, but the jury finds her guilty and she is sentenced to death. However, before the sentence can be carried out, Alice wakes up and realizes it was all just a dream." "Alice is sitting on the riverbank with her sister, and she notices a white rabbit running by. She follows the rabbit down a hole and finds herself in Wonderland. She has a series of adventures..." Looks like it's repeating at the end. Interesting.

So there you have it; that's pretty much all there is to it. I guess we'll just do git status, git add, git commit -am "done", and done, and git push. So feel free to use this.

I already hear people asking about Word documents or PDFs and all that. What I've done, with Python or PowerShell or whatever, is just save those as .txt files, and that works just fine. Basically it'll just remove all the formatting, because GPT-3 doesn't understand the XML behind a Microsoft Word document or how to read a PDF file; it only reads plain text. But even then, you'll see it does a pretty good job.

You could change the prompt: change "detailed" back to "concise", and it'll get even shorter; you'll get a factor of about 10 to 1. But as I showed you at the beginning of this video, you're at risk of losing important details if you say "concise summary".

And then, say you wanted to run this again: you could modify the script to run again, so that you treat the output as the next input, and you could make it even shorter. I'm not going to worry about that right now, because literally all you would do, if you want to try this, is copy the output to the input (whoops) and run it again, or add another loop. Again, I'm not going to worry about that; you can play with that if you want to. But yeah, there you have it. I think I'll call it a day. Thanks for watching.
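
The per-chunk loop described in the transcript can be sketched roughly as follows. This is not the script from the video: the `<<SUMMARY>>` placeholder marker, the function name `summarize_document`, and the stand-in `fake_complete` are all assumptions for illustration. In a real run, `complete` would be your own function that sends the prompt to the OpenAI API (the video mentions a setting of 0.7 and a 1,000-token limit) and returns the completion text.

```python
import textwrap
from typing import Callable

# Prompt wording taken from the video; the <<SUMMARY>> marker is an assumed placeholder.
PROMPT_TEMPLATE = (
    "Write a detailed summary of the following:\n\n"
    "<<SUMMARY>>\n\n"
    "DETAILED SUMMARY:"
)


def summarize_document(text: str, complete: Callable[[str], str], width: int = 4000) -> str:
    """Summarize each chunk of `text` with `complete` and join the results."""
    results = []
    for chunk in textwrap.wrap(text, width=width):
        # Drop the chunk into the prompt template, then ask the model for a summary.
        prompt = PROMPT_TEMPLATE.replace("<<SUMMARY>>", chunk)
        summary = complete(prompt).strip()
        results.append(summary)
    # A blank line between summaries keeps the chunk boundaries visible.
    return "\n\n".join(results)


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for the API call: returns the chunk's first ten words."""
    chunk = prompt.split("\n\n")[1]
    return " ".join(chunk.split()[:10])


document = "Alice follows the white rabbit down the hole. " * 600
output = summarize_document(document, fake_complete)
print(len(document), "->", len(output))
```

Passing the completion function in as an argument keeps the loop testable without network access, and makes it easy to swap in a different model or provider later.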

Original Description

The Kickstarter for my Post-Labor Economics book is live! https://www.kickstarter.com/projects/daveshap/labor-zero

This video teaches how to use GPT-3 to recursively summarize text of any length, producing shorter versions of large documents. The process breaks the input document into chunks, summarizes each chunk, and can feed the combined output back in as the next input until the desired level of brevity is reached. Following the steps in the video yields a runnable Python script, and changing the prompt wording (for example, "concise" instead of "detailed") changes how aggressively the text is compressed.
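
The video leaves the repeated pass as an exercise ("copy the output to the input and run it again, or add another loop"). One possible way to write that loop is sketched below. The names `summarize_until`, `target_chars`, `max_passes`, and the stand-in `quarter_pass` are hypothetical; `one_pass` would be whatever function performs a full chunk-and-summarize pass over the text.

```python
from typing import Callable


def summarize_until(
    text: str,
    one_pass: Callable[[str], str],
    target_chars: int,
    max_passes: int = 5,
) -> str:
    """Re-run `one_pass` on its own output until the text is short enough."""
    current = text
    for _ in range(max_passes):
        if len(current) <= target_chars:
            break  # already as short as requested
        shorter = one_pass(current)
        if len(shorter) >= len(current):
            break  # the pass made no progress, so stop instead of looping
        current = shorter
    return current


def quarter_pass(text: str) -> str:
    """Hypothetical stand-in for a summarization pass: keeps a quarter of the words."""
    words = text.split()
    return " ".join(words[: max(1, len(words) // 4)])


long_text = "word " * 4000  # roughly 20,000 characters
result = summarize_until(long_text, quarter_pass, target_chars=2000)
print(len(long_text), "->", len(result))
```

The `max_passes` cap and the no-progress check are defensive choices: a real model is not guaranteed to shorten its input on every pass, and each pass costs API calls.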

Key Takeaways

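Import the necessary libraries (textwrap, re, os, time, openai)
Open the input file (input.txt)
Break the input document into chunks of about 4,000 characters using textwrap
Define a prompt for GPT-3; "detailed summary" gave roughly 4:1 compression, while "concise summary" gave about 10:1 but lost details
Send each chunk to GPT-3 for summarization
Print each summary as it is produced
Join the summaries and save the final result to a file (output.txt)
Run the output through the script again until the desired level of brevity is reached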
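
The file handling and whitespace cleanup steps might look something like the sketch below. The names `open_file` and `save_file` come from the video, but their bodies are not shown there, so these implementations are assumptions; `normalize_whitespace` and the sample summaries are hypothetical. The regular expression `\s+` replaces any run of spaces, tabs, or newlines with a single space.

```python
import re
import tempfile
from pathlib import Path


def open_file(path) -> str:
    """Read a whole text file as UTF-8."""
    return Path(path).read_text(encoding="utf-8")


def save_file(path, content: str) -> None:
    """Write text to a file as UTF-8, replacing any existing content."""
    Path(path).write_text(content, encoding="utf-8")


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


# Sample model outputs with stray newlines and tabs (made up for the demo).
summaries = ["Alice falls\n\ndown a   rabbit hole.", "\tShe meets\nthe Queen. "]
cleaned = [normalize_whitespace(s) for s in summaries]

# Write to a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.txt"
    save_file(out_path, "\n\n".join(cleaned))
    round_trip = open_file(out_path)

print(round_trip)
```

Cleaning each summary before joining, rather than cleaning the joined text, keeps the blank line between sections intact so the chunk boundaries stay visible in the output file.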

