OpenAI’s Whisper is AMAZING!
Key Takeaways
The video demonstrates the use of OpenAI's Whisper model for speech recognition, transcription, and translation, using a Python notebook with a simple interface to record and transcribe audio.
Full Transcript
Seriously? Is that enough to pull this off? You just saw the latest hot open-source model released by OpenAI transcribing this guy. It's called Whisper, and it's really, really good. Well, in reality, what you just saw didn't happen in real time; that was just me and this video. But I did put together some code so you can see how good this model is and try it for yourself. For context, Whisper is a speech recognition model that you can use for transcription and translation. "South Florida is one of the most beautiful places in the continental United States." I'm gonna run that text through the model, but we can make it even more interesting: we'll see if this model can transcribe the audio and translate it into English. So let's bring the computer and get started.
All right, before we get started, I want you to assume that the average human has a life expectancy of 80 years and sleeps 8 hours every single day. That gives us a total of 467,000 hours where we are awake. In contrast, OpenAI used 680,000 hours of data to train Whisper. That is around 45% more listening time than we get in our entire lifetime, so no wonder Whisper is really, really good. Now, the model doesn't specialize in any particular task, but OpenAI claims that it makes around 50 percent fewer errors across many different sample datasets. That's just nuts. Final thing I'll say before I shut up and take a look at the code: Whisper is not only English, which is huge. About a third of the dataset that OpenAI used to train Whisper is non-English, so you can use the model to transcribe from a bunch of different languages.
All right, let's look at the code, which is surprisingly short. By the way, you'll find a link to this notebook in the description below. So first, I'm going to install Whisper directly from their GitHub repo, and Gradio. The reason I'm using Gradio is to create a very simple interface where we can record the audio directly from my computer and transcribe it and translate it. That interface is based on a notebook that Hugging Face created and published online. I took it, simplified it a little bit, added a couple more things, and that's what you get here.
For the model itself, OpenAI offers several options, and I copied the table from their GitHub repo. So here you have the list of different models that you can load on your computer, depending on how much memory you want to use, how fast you need the results to be, or whether or not you need multilingual support. Personally, I'm using the medium model, but I found that the base model, which is way smaller, works very, very well as well.
To start things off, I loaded that model, and then I created a couple of functions, and I want you to notice how simple these functions are. First, there is a transcribe function that's going to receive a file, and that's the audio file, that's the recording, and then we're gonna call the transcribe function of the Whisper model. Notice, however, that I'm passing a list of options, and one of those options is the task that I want to do within my function. In this particular case, the task is transcribing the audio. The translate function is very simple as well, and it's almost a match of the previous function, except here the task is going to be translate. Now remember, right now the Whisper model only supports translating into English, so you can start with any language and translate into English.
Finally, this is the Gradio interface. Very, very simple interface. You can see it here: it's got a couple of buttons, one to transcribe my audio and one to translate my audio, and then you get an area where we are going to display the text, the result of that transcription or translation. The code is very straightforward. You get the component that's going to capture the audio here, and here you have a couple of buttons, the transcribe and the translate button. Notice how on these two lines I'm connecting the click event to the two functions that I created before. Therefore, when you click on the transcribe button, we're gonna call the transcribe function, we're gonna pass the audio, and we're gonna receive the result and display it in the text box that we added to the interface. Very simple stuff.
So this is everything we need for our example. I added a final cell to my notebook where I'm calling the transcribe and translate functions directly. This is useful if you wanted, for example, to record your audio from your phone and then send it to your computer as a file. You can upload the file and then access the functions directly, passing in the file name. So if you want to use that, you have it there.
Let's give this a try. "South Florida is one of the most beautiful places in the continental United States." Right, so that's my audio: "South Florida is one of the most beautiful places in the continental United States." Sounds good. Let's click transcribe here. And that was perfect. That was fast. That was beautiful. All right, let's try something else. Let's do it now in Spanish. Okay, that sounds good. Let's transcribe it. That was perfect. That was very good. And now let's translate. It's gonna take the same audio and translate it into English: "The progress of science in the last decade is incredible." That was amazing.
Okay, so here you have it. This morning I just saw people putting together samples where they are transcribing and translating YouTube videos, or music to get out the lyrics. It's really, really cool. The community is coming together and they're starting to build super cool things with this. It's open source, so you can use it right away. You really need to give it a try. Remember, the link to this notebook is in the description below. So how about it? Go nuts, build something cool, and I'll see you in the next one. No, no, wait, wait, wait, wait. You made it all the way here, so please like the video below, subscribe to my channel, and now, for real, I'll see you in the next one.
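The two functions and the interface described in the transcript could look roughly like the sketch below. It is not the notebook's actual code. It assumes the `openai-whisper` package (installable with `pip install git+https://github.com/openai/whisper.git`) and `gradio` are available, and the helper names `load_model`, `transcribe`, `translate`, and `build_demo` are chosen here for illustration. Both libraries are imported lazily so the two small functions can be read and exercised on their own.

```python
def load_model(name="base"):
    """Load a Whisper checkpoint such as "base" or "medium"."""
    import whisper  # lazy import: requires the openai-whisper package
    return whisper.load_model(name)


def transcribe(model, audio_path):
    """Return the speech in the audio file as text in its original language."""
    result = model.transcribe(audio_path, task="transcribe")
    return result["text"].strip()


def translate(model, audio_path):
    """Return the speech in the audio file translated into English."""
    result = model.transcribe(audio_path, task="translate")
    return result["text"].strip()


def build_demo(model):
    """Build a page with an audio input, two buttons, and a result box."""
    import gradio as gr  # lazy import: requires the gradio package

    with gr.Blocks() as demo:
        # type="filepath" hands the recording to the callbacks as a file path.
        # Assumption: how microphone capture is enabled depends on the Gradio
        # version, so no source argument is set here.
        audio = gr.Audio(type="filepath")
        with gr.Row():
            transcribe_button = gr.Button("Transcribe")
            translate_button = gr.Button("Translate")
        output = gr.Textbox(label="Result")

        # Connect each button's click event to one of the functions above.
        transcribe_button.click(
            fn=lambda path: transcribe(model, path), inputs=audio, outputs=output
        )
        translate_button.click(
            fn=lambda path: translate(model, path), inputs=audio, outputs=output
        )
    return demo
```

In a notebook, `build_demo(load_model("base")).launch()` would start the interface. To skip the interface and work from an uploaded file, call `transcribe(model, "recording.mp3")` or `translate(model, "recording.mp3")` directly, where `recording.mp3` is a hypothetical file name.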
Original Description
I ran OpenAI’s Whisper model in a notebook and used it to transcribe and translate my voice.
Link to the notebook: https://colab.research.google.com/drive/1mTxa7I5jJ9dx-rHTmyl_9FhUMHh2q6PL?usp=sharing
🔔 Subscribe for more stories: https://www.youtube.com/@underfitted?sub_confirmation=1
📚 My 3 favorite Machine Learning books:
• Deep Learning With Python, Second Edition — https://amzn.to/3xA3bVI
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow — https://amzn.to/3BOX3LP
• Machine Learning with PyTorch and Scikit-Learn — https://amzn.to/3f7dAC8
Twitter: https://twitter.com/svpino
Disclaimer: Some of the links included in this description are affiliate links where I'll earn a small commission if you purchase something. There's no cost to you.