Building a Thought Summarization App with Whisper and GPT3
Key Takeaways
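The video demonstrates how to build a thought summarization app with Whisper and GPT-3. It uses Python with the PyAudio and wave packages to record microphone audio to a WAV file, the Whisper package to transcribe it, an OpenAI API key to access GPT-3 (text-davinci-003) for the summary, and Streamlit for the app interface. The app records a short clip to a file path and then transcribes and summarizes that file; an attempt to add file upload fails in the video and is deferred to a later iteration.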
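As a rough illustration of the recording step, the sketch below computes how many chunk-sized reads cover a given number of seconds (rate divided by chunk, times seconds, as described in the video) and writes the collected frames with the standard-library wave module. It is a minimal sketch, not the video's code: the microphone is replaced by silent bytes so it runs without PyAudio, and the 16-bit sample width, the parameter values, and the file name `thought_demo.wav` are assumptions.

```python
import os
import tempfile
import wave

RATE = 44100        # samples per second
CHUNK = 1024        # samples per read
CHANNELS = 1
SAMPLE_WIDTH = 2    # bytes per sample (16-bit audio, an assumption)
THOUGHT_DELAY = 5   # seconds to record


def reads_needed(rate, chunk, seconds):
    """How many chunk-sized reads cover `seconds` of audio (rounded down)."""
    return int(rate / chunk * seconds)


def write_wav(path, frames, channels, sample_width, rate):
    """Join the recorded chunks and write them as one WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


n_reads = reads_needed(RATE, CHUNK, THOUGHT_DELAY)

# Stand-in for the microphone: each "read" returns CHUNK silent samples.
# With PyAudio this would be a read from the open input stream instead.
frames = [b"\x00" * (CHUNK * SAMPLE_WIDTH * CHANNELS) for _ in range(n_reads)]

wav_path = os.path.join(tempfile.gettempdir(), "thought_demo.wav")  # hypothetical name
write_wav(wav_path, frames, CHANNELS, SAMPLE_WIDTH, RATE)
print(n_reads, "reads written to", wav_path)
```

Because the read count is rounded down, the file comes out slightly shorter than the requested duration (215 reads of 1024 samples at 44100 Hz is just under 5 seconds).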
Full Transcript
I like taking notes, but writing can be slow. For example, when you're brainstorming an idea, you want to be able to register your thoughts as they appear without having to stop to write them down. That's why a lot of writers record their thoughts when they brainstorm: they want to avoid that delay that could potentially kill the idea, a delay that doesn't take into account the free-flow dynamics of thinking itself. But even when you record your thoughts, you still have to go back to that audio file and listen to it, so you potentially have to go through hours and hours of useless words and rambling thoughts just to find that one idea you are looking for.

Since writing is part of my daily routine, I decided to take some time and think about how I could fix this issue of thought delay. My idea is simple: I'm going to write an app that allows me to automatically transcribe and summarize my thoughts as captured in audio files, so that I can review them later, go straight to the main ideas, and avoid having to parse through potentially hours and hours of useless information.

To write this app, we're going to need the Python programming language and the PyAudio and wave packages to handle the recordings from the microphone in our computer. We're going to need Whisper from OpenAI, which is a state-of-the-art speech-to-text model that's going to handle the transcription of our recordings. We're going to use GPT-3, the large language model also released by OpenAI, which will allow us to do text summarization on the transcriptions of those recordings. So let's get started.

To start things off, let's import the packages. If you want the setup and installation for this project, check out my Medium article on this topic; the link is in the description, and the link to the GitHub repository with all the source code for this project is also in the description. So to start things off, we're going to need a few packages. We're going to import the Whisper
package for speech-to-text; the OpenAI package, which will allow us to access the GPT-3 model for text summarization; the PyAudio package to handle recording; the wave package, which is what actually handles writing the audio file (PyAudio is for the microphone stuff); and then os for doing file management in Python. So I already loaded this cell. Perfect.

Now we're going to set up our API key from OpenAI. If you don't have an API key, you can just go to OpenAI and get your API key by signing up. My API key, for security reasons, is exported as an environment variable on my computer so that it's not available in the script when I put this on GitHub, etc., and yours should be as well.

With this, now we're going to set up our Whisper model and load it. I already loaded this model, and this is what's going to be doing the transcription of our audio file.

Now we're going to set up a few variables for the PyAudio part, for recording with the microphone. We're going to need a few things, like the channels, the format, the rate (which is the number of samples collected per second in the audio; I'm using a fairly high audio quality), the chunk, and the path for the temporary audio file. I'm also going to need the number of iterations required to collect, let's say, five seconds of audio. And now I guess that is all, and we can start writing our app.

There's one thing missing here. Perfect. Okay, so I'm just going to copy these lines. What we need to do is instantiate the PyAudio class, and now we're going to start the stream. I'm just going to copy because I already have this done on my side, so I'm gradually copying it here so that it's easier to understand. So I'm setting up the stream for recording the audio. Then I create a list called frames. I do a loop over the
range from zero all the way up to the rate divided by the chunk, multiplied by the number of seconds, which represents how many iterations you need for X seconds of audio. In this loop, each time I read a chunk of audio and append that to my frames list. Finally, when it's done doing this... so let's do that for now. For now, this is going to read five seconds of audio. Let's see if this works.

Hello, my name is Lucas and I'm recording my thoughts about writing.

Great, it stopped. I should run this, which terminates the recording. Actually, that should go on top as well. One second, we're going to have this up there. I think that looks better, yeah.

And now we're going to do the part of writing down our audio file. For that we're going to have to use the wave package. So we're going to say wf is equal to wave.open, and then we're going to write it to the temporary audio file path that we set up in the beginning. Then we're going to set up a few things; I'm just going to copy these, which are the channels, sample width, frame rate, etc. I'm going to replace this with the print. Perfect. And this is the temporary audio file path, and now we are done. So let's see. Perfect, so that actually worked perfectly.

Now, before we go into the transcription part, which is right here, let's record something a little bit more elaborate so that you guys can get the idea of what a thought summarization app would be. So I'm going to do a 10-second recording of me thinking about an idea for an article, let's say this article: thought summarization, why are thoughts important? So let's run this again, and now I'm going to elaborate on why recording and summarizing your thoughts is important.

Recording and summarizing your thoughts is extremely important because it allows you to register your ideas as they appear, and that's important...

Maybe 10 seconds is too quick, so let's do 20 seconds. I
think 20 seconds is a little bit better. Okay, so we ran this, and now let's do that again.

Recording and summarizing your thoughts is important because it allows you to have a timeline of your ideas and the evolution of your systems for thinking about things in general, and that's relevant because in the future you might want to know what you thought about stuff.

Okay, all right, that wasn't great, but it is what it is. So I'm going to write the audio file and check that the audio file is doing well. Perfect, it's working just right.

What we can do now is use the model we instantiated in the beginning, the Whisper model, and call the transcribe function to transcribe this audio. After we've done that, we're going to access the output of that transcription with the text key inside this result object, which I'm pretty sure is a dictionary, and then we're going to print that to the terminal. Let's see the transcription. Perfect: "Recording and summarizing your thoughts is important because it allows you to have a timeline of your ideas and the evolution of your systems for thinking about things. That's relevant because in the future you might want to know what you thought about this stuff." Yeah, it's not exactly a super elaborate idea, but it was just for illustrating the point of how you can do that.

Okay, perfect. Finally, what we're going to do is set up a prompt for our text summarization part. Now, if you don't know what a prompt is, look up prompt engineering. With ChatGPT and models like that, there's a lot of attention being geared towards prompt engineering right now because it's super interesting, and the idea basically is how you talk to a large language model to get the best possible output. So we're going to say, look, I want you to summarize this text, and then I'm going to feed in the string of my transcription, which is right here. So I'm going to say this, and then I'm going to
set up my access to the OpenAI API. So I just say, you know, openai, and then I access the text-davinci-003 model, which I'm pretty sure is the best model available. It's not ChatGPT, sorry, it's GPT-3. So for those of you who think GPT-3 and ChatGPT are exactly the same: they're not exactly the same, because this model is actually not going to be as great. But ChatGPT has an API coming soon, from what I've heard, so wait for that.

And now we're going to get the response. Okay, so now I sent the prompt, and I can access the actual text summarizing that initial recording here. All right, so in this case the summary is kind of exactly the same as my recording. I'm not a big fan of that, but it's probably because, since it was a 20-second recording, it didn't have enough depth to actually have something to summarize.

So let's try it again. This time I'm going to give it 30 seconds. I'm going to change the name of this thing: it's not a number, it's thought delay. That's what I like calling this thing, because it's how long you're going to record yourself for. In the end that's what this variable is doing: it's the delay in seconds, how long your recording lasts. I think I just gave it a complicated name.

Now I'm going to run this again, and let's see, what should I think about? Actually, here's a good idea: what I'm going to do is read something from Ralph Waldo Emerson. That's even better; let's do something improvised. Hopefully I can quickly read this in 30 seconds. So I'm reading a book called Self-Reliance and Other Essays by Ralph Waldo Emerson, and there's a passage that I really like, so why not record me saying that passage and then get a summary from GPT-3? I think that's pretty fun. Let's give it 35 seconds so that I have enough time to say all the stuff that's in the paragraph.
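The prompt-then-summarize step described so far can be sketched roughly as follows. This is a minimal sketch under stated assumptions rather than the video's code: `build_prompt`, `summarize`, and `fake_complete` are hypothetical names, and `complete` stands in for the call to the text-davinci-003 model so that the example runs offline without an API key.

```python
from typing import Callable


def build_prompt(transcription: str, instruction: str = "Summarize this text") -> str:
    """Put the instruction in front of the transcribed text."""
    return f"{instruction}:\n\n{transcription.strip()}"


def summarize(transcription: str, complete: Callable[[str], str]) -> str:
    """Send the prompt through `complete` and tidy up the returned text."""
    return complete(build_prompt(transcription)).strip()


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for the model: echoes the first sentence."""
    body = prompt.split("\n\n", 1)[1]
    return "  " + body.split(".")[0] + ".  "


text = "Recording your thoughts gives you a timeline of your ideas. That matters later."
summary = summarize(text, fake_complete)
print(summary)
```

Keeping the model call behind a plain callable means the instruction can be changed (for example to "Summarize this passage by Ralph Waldo Emerson") or the model swapped without touching the rest of the flow.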
So I'm reading page 58 from this book, Self-Reliance and Other Essays.

By doing his work he makes the need felt which he can supply, and creates the taste by which he is enjoyed. By doing his own work he unfolds himself. It is the vice of our public speaking that it has not abandonment. Somewhere, not only every orator but every man should let out all the length of all the reins; should find or make a frank and hearty expression of what force and meaning is in him. The common experience is that the man fits himself as well as he can to the customary details of that work or trade he falls into, and tends it as a dog turns a spit.

All right, I couldn't read all the stuff because 35 seconds wasn't enough, but I think it had enough meat that we can get a decent summary out of it. So let's now again write down our audio file and get our transcription. This time it's going to be a bit bigger because we're talking about 35 seconds, but look at that, super quick. And this is pretty good: "makes the need felt which he can supply, creates the taste by which he is enjoyed, by doing his own work he unfolds himself, it is the vice of our public..."

So I'm going to say "summarize this"; let's change the prompt, just because I have this passage by Ralph Waldo Emerson. Now I access the API, I ask for a summary, and I then print that summary to the terminal. And look at that, the summary from GPT-3 was: "Ralph Waldo Emerson encourages everyone to embrace their work and express themselves fully. He believes that when people do their own work and let out all the reins, they're able to unfold themselves and make their needs felt." All right, so GPT-3 does make a lot of use of the same type of vocabulary as the original text. "He argues there is a vice to not have abandonment in public speaking and that everyone should find a way to express their force and meaning." But it's a pretty good summary, I'd say. It is a bit long, but I think these are the kind of things that we have to tweak in the process, right? We would just
start with something like 35 seconds, then do a minute, and two minutes, and then see what is the right amount of time that you should record yourself thinking about something so that it's worth it to get a summary from that.

But besides doing this, what I would like to do as well is give the same text to ChatGPT. Since we're here, why not? So I'm going to give ChatGPT the same output of the transcription, just to see what ChatGPT comes up with, and I'm going to say the same thing I said to the API: "Summarize this passage by Ralph Waldo Emerson." Let's see what we get from ChatGPT. And it says: "Emerson argues that a person should do their work to the best of their abilities, as this allows them to express themselves and create a need for their work to be enjoyed. He believes that people should be honest and express their full potential rather than being limited by society and societal norms. This is true not just for public speaking but for everyone in their chosen trade or work."

That's a much better summary, and it's interesting that it makes use of words and vocabulary that weren't necessarily present in the original text, which I really like. And yeah, it pretty much captures the essence of that passage. So I would say that both work, they're both meaningful, and ChatGPT has an API coming soon, so that's going to be great.

So let's just wrap everything up into an app. I've already done that, so instead of writing it down, I'm just going to show you the app already written, and we're going to run it, and perhaps I'll read the second passage or I'll just feed it that same temporary WAV file. Okay, so this is the app, and the app has all the stuff that we had in the notebook, just transformed and refactored for Streamlit. To showcase that, I'm going to run this. I'm going to come here, I'm going to
say streamlit run followed by the thought summarization app's .py file. Hopefully there aren't any big mistakes in this thing, but let's see. "Thought summarization app": I'm not exactly super creative, but okay.

So I have the number inputs. This might not be the best way to get numbers into this thing, but whatever. I'll set the rate here; all the numbers that we were setting before in the notebook cell we're now setting up in Streamlit. And now the chunk and the thought delay; I think I'm going to put something like 45 seconds. Perfect. I have two buttons, one for recording the audio and one for transcribing and summarizing the audio. The way it is right now, it doesn't load a file; I don't say "transcribe and summarize this file", I just give it a path, and it probably should do better. On the next iteration of this app I'll change this so that I have more control over things like "record" and "transcribe this audio". I don't know why I didn't put this thing here, but okay. So let's say temporary audio file path, it's going to write... yeah, perfect.

Okay, so let's work through the parts of the app. Just like in the notebook, we have the same dependencies, we set up our API key, and we set up a title for the app. I could change this. If I change something here, like "Thought summarization app by Lucas", and I save and go to the Streamlit app, it's going to ask whether I want to rerun this. I hit "Always rerun", and I can get changes into that thing really quickly. So now if I change it back, it will automatically put that into Streamlit, because Streamlit has this caching system that's pretty cool. Perfect.

Now I'm loading the model just like I did before, and then it says, okay, "Whisper model loaded". Before, I was loading from a file path, but this is what I'm going to do: I'm
going to set up the audio file directly from the Streamlit file uploader, because why not? So here's what we're going to do: yeah, the temporary file path... but when I hit transcribe I'm going to say "if audio file is not None", and then you can do the button and then the summarization bit, which I'm doing all at the same time. Okay, perfect. So now I changed this app to be like this. Check it out. I'm just going to give the audio file to this thing. Let's see if that works.

Okay, so I made one change. Before, I was working from an audio file path, and live on YouTube (not really live, but whatever) I changed that to the file uploader, because I want to be able to browse the files, find the WAV file, and then give it to transcribe. So I'm going to record the audio, and the second thing I want to do as well: when I record the audio, I want to save it under a name. So the temporary audio file path, instead of being this, is going to be an st.sidebar.text_input. Let's see if that works. Yeah, that looks like it works. And then I'm going to label it "audio file name". Perfect. I can put this here, and I'm going to record an audio of X amount of seconds. Let's say I'm going to read the same passage by Emerson; maybe I'll try to read the entire paragraph, and I'll see if my transcription works. Okay, so let's see. Okay, perfect.

By doing his work he makes the need felt which he can supply, and creates the taste by which he is enjoyed. By doing his own work he unfolds himself. It is the vice of our public speaking that it has not abandonment. Somewhere, not only every orator but every man should let out all the length of all the reins; should find or make a frank and hearty expression of what force and meaning is in him. The common experience is that the man fits himself as well as he can to the customary details of that work
or trade he falls into, and tends it as a dog turns a spit. Then is he a part of the machine he moves; the man is lost. Until he can manage to communicate himself to others in his full stature and proportion, he does not yet find his vocation.

Okay, so I finished reading. I read really slowly, I'm very sorry about that. But okay, now we have the file written, output.wav, that looks good. I can now browse for that file, and let's see if I can find it. Thought summarization app... it should be written... output.wav, oh yeah, it's right here. And now I load it. Okay, and now I'm going to transcribe and summarize that audio. Hopefully this will work. Nope, because I got an uploaded file object. Fair enough.

Okay, so that didn't work. Let's see if I just pass in the audio file; I'm not sure that Streamlit allows this, but let's see. So I'm going to say transcribe and summarize based on the file that I input. Now I get a "failed to load audio" error. Anyway, that's fine. So we'll go back to the version without the uploaded file, and then I'll look into how to integrate that; maybe that will be something I do in the future. Now I can just go back to this, and this is just the temporary audio file path. Yeah, perfect.

So now we have the output.wav file recorded, so I can just hit transcribe and summarize and see what happens. Perfect, it's transcribing the audio. So this is the original text, and now the final summary. There we go, perfect: "...manage to do his work, express yourself fully to be successful. To be successful, you should be able to communicate yourself to others. You should not be lost in the machine, but instead find a way to stand out and be noticed." Perfect, that's our summary for that little chunk of a thought; actually, this was a passage from a book.

So let's review just the app itself, so that this little mistake in the middle
didn't throw you off. Just as before, we import the packages, we set up the API key, set up a title for the app, and load the Whisper model, and we write "Whisper model loaded". We set up a bunch of number inputs, like channels, format, rate, and chunk, all of it for recording stuff with PyAudio. Then we have a couple of buttons: one for recording an audio and one for transcribing and summarizing. The transcription and summarization are done on top of a file name that should be placed here. So when we record the audio, it will get the file path that's in this text input, and the same thing when we transcribe and summarize: it's going to search for the file that's described in this text input, and then it will show you the original text and then the summary.

In the future I want to extend this a little bit. I was thinking about how to use maybe embeddings and search, so that if you had a database of little thought summaries, you could potentially do in-context search over those texts. That could be interesting. And I think this could be a cool way to complement stuff like brainstorming. Maybe if you had this on your phone, when you record your thoughts and ideas, you could just run these models on your audio files and get the gist, the main core things in each audio. I think that could be really interesting.

Also, there's something to say about this idea of the audio file not being very long, just a few minutes. If it's too long, maybe what you get out of a recording is not going to be as rich as it could be, because if it's a two-hour recording, there are a bunch of ideas there, right? How is the model going to know which idea you want? I mean, sure, you can maybe cut that audio into 30-second pieces, but when you do that (and that's a problem in text as well) you could be cutting
something that's meaningful. So I like this idea of little chunks of audio files. And that's it. Thanks for watching. If you liked this video, don't forget to like and subscribe, and see you next time. Cheers.
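The idea of cutting a long recording into fixed-length pieces, mentioned near the end of the transcript, can be sketched with the standard-library wave module. This is a minimal sketch, not something shown in the video: `split_wav`, the file names, and the 3-second piece length in the demo are assumptions chosen to keep the example small. As the transcript warns, a fixed cut like this can land in the middle of a thought.

```python
import os
import tempfile
import wave
from typing import List


def split_wav(path: str, out_dir: str, seconds: int = 30) -> List[str]:
    """Split a WAV file into consecutive pieces of at most `seconds` each."""
    pieces = []
    with wave.open(path, "rb") as src:
        channels = src.getnchannels()
        sample_width = src.getsampwidth()
        rate = src.getframerate()
        frames_per_piece = rate * seconds
        index = 0
        while True:
            data = src.readframes(frames_per_piece)
            if not data:  # nothing left to read
                break
            piece_path = os.path.join(out_dir, f"piece_{index:03d}.wav")
            with wave.open(piece_path, "wb") as dst:
                dst.setnchannels(channels)
                dst.setsampwidth(sample_width)
                dst.setframerate(rate)
                dst.writeframes(data)
            pieces.append(piece_path)
            index += 1
    return pieces


# Demo on 7 seconds of synthetic silence (8 kHz, mono, 16-bit).
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "long_thought.wav")  # hypothetical file name
with wave.open(source, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(b"\x00\x00" * 8000 * 7)

pieces = split_wav(source, work_dir, seconds=3)
print(len(pieces), "pieces")
```

Each piece could then be transcribed and summarized on its own; the last piece is simply shorter when the recording length is not a multiple of the piece length.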
Original Description
In this video, I'll show you how to build a simple thought summarization app that lets you record (or upload) a short audio clip of an idea or thought; the app then transcribes and summarizes that recording using Whisper, GPT-3, and Streamlit.
- Subscribe!: https://www.youtube.com/channel/UCu8WF59Scx9f3H1N_FgZUwQ
- Join Medium: https://lucas-soares.medium.com/membership
- Tiktok: https://www.tiktok.com/@enkrateialucca?lang=en
- Twitter: https://twitter.com/LucasEnkrateia
- LinkedIn: https://www.linkedin.com/in/lucas-soares-969044167/
Productivity Products:
- Kindle Oasis: https://amzn.to/3IUtaOh
- Seagate Portable 2TB External Hard Drive HDD: https://amzn.to/3QSZ8wd
- Sony WH-1000XM5 Wireless with Noise Cancelling: https://amzn.to/3HfJvM8
(Affiliate links to support this channel :) )
Music from Epidemic Sound: www.epidemicsound.com
Stock footage from Pixabay: pixabay.com