OpenAI’s Whisper is AMAZING!

Underfitted · Beginner · 🧬 Deep Learning · 3y ago

Key Takeaways

The video demonstrates the use of OpenAI's Whisper model for speech recognition, transcription, and translation, using a Python notebook with a simple interface to record and transcribe audio.

Full Transcript

Seriously? Is that enough to pull this off? You just saw the latest hot open-source model released by OpenAI transcribing this guy. It's called Whisper, and it's really, really good. Well, in reality, what you just saw didn't happen in real time; that was just me and this video. But I did put together some code so you can see how good this model is and try it for yourself.

For context, Whisper is a speech recognition model that you can use for transcription and translation. "South Florida is one of the most beautiful places in the continental United States." I'm gonna run that text through the model, but we can make it even more interesting: we'll see if this model can transcribe the audio and translate it into English. So let's bring in the computer and get started.

All right, before we get started, I want you to assume that the average human has a life expectancy of 80 years and sleeps 8 hours every single day. That gives us a total of about 467,000 hours where we are awake. In contrast, OpenAI used 680,000 hours of data to train Whisper. That is around 45% more listening time than we get in our entire lifetime, so no wonder Whisper is really, really good.

Now, the model doesn't specialize in any particular task, but OpenAI claims that it makes around 50 percent fewer errors across many different sample datasets. That's just nuts. Final thing I'll say before I shut up and take a look at the code: Whisper is not only English, which is huge. About a third of the dataset that OpenAI used to train Whisper is non-English, so you can use the model to transcribe from a bunch of different languages.

All right, let's look at the code, which is surprisingly short. By the way, you'll find a link to this notebook in the description below. So first, I'm going to install Whisper directly from their GitHub repo, and Gradio. The reason I'm using Gradio is to create a very simple interface where we can record the audio directly from my computer, transcribe it, and translate it. That interface is based on a notebook that Hugging Face created and published online. I took it, simplified it a little bit, added a couple more things, and that's what you get here.

For the model itself, OpenAI offers several options, and I copied the table from their GitHub repo. So here you have the list of different models that you can load on your computer, depending on how much memory you want to use, how fast you need the results to be, or whether or not you need multilingual support. Personally, I'm using the medium model, but I found that the base model, which is way smaller, works very, very well as well.

To start things off, I loaded that model, and then I created a couple of functions, and I want you to notice how simple these functions are. First, there is a transcribe function that's going to receive a file. That's the audio file, the recording. Then we're gonna call the transcribe function of the Whisper model. Notice, however, that I'm passing a list of options, and one of those options is the task that I want to do within my function. In this particular case, the task is transcribing the audio. The translate function is very simple as well, and it's almost a match of the previous function, except here the task is going to be translate. Now remember, right now the Whisper model only supports translating into English, so you can start with any language and translate into English.

Finally, this is the Gradio interface. Very, very simple interface; you can see it here. It's got a couple of buttons, one to transcribe my audio and one to translate my audio, and then you get an area where we are going to display the text, the result of that transcription or translation. The code is very straightforward. You get the component that's going to capture the audio here. Here you have a couple of buttons, the transcribe and the translate button. And notice how, on these two lines, I'm connecting the click event to the two functions that I created before. Therefore, when you click on the transcribe button, we're gonna call the transcribe function, we're gonna pass the audio, and we're gonna receive the result and display it in the text box that we added to the interface. Very simple stuff.

So this is everything we need for our example. I added a final cell to my notebook where I'm calling the transcribe and translate functions directly. This is useful if you want, for example, to record your audio from your phone and then send it to your computer as a file. You can upload the file and then access the functions directly, passing in the file name. So if you want to use that, you have it there.

Let's give this a try. "South Florida is one of the most beautiful places in the continental United States." Right, so that's my audio: "South Florida is one of the most beautiful places in the continental United States." Sounds good. Let's click transcribe here. And that was perfect. That was fast. That was beautiful.

All right, let's try something else. Let's do it now in Spanish. OK, that sounds good. Let's transcribe it. That was perfect. That was very good. And now let's translate. It's gonna take the same audio, translated into English: "The progress of science in the last decade is incredible." That was amazing.

OK, so here you have it. This morning I just saw people putting together samples where they are transcribing and translating YouTube videos, and music to get out the lyrics. It's really, really cool. The community is coming together; they're starting to build super cool things with this. It's open source, so you can use it right away. You really need to give it a try. Remember, the link to this notebook is in the description below. So how about it? Go nuts, build something cool, and I'll see you in the next one.

No, no, wait, wait, wait, wait. You made it all the way here, so please like the video below, subscribe to my channel, and now for real, I'll see you in the next one.
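
The two functions described in the transcript could look roughly like the sketch below. It is not the notebook's actual code: the helper names (`load_whisper`, `run_task`) are hypothetical, and it assumes the `openai-whisper` package, where `whisper.load_model(...)` returns a model whose `transcribe(...)` method accepts a `task` option and returns a dictionary with a `"text"` entry. The import is kept inside the loader so the rest of the code can be read and exercised without the package installed.

```python
from typing import Any


def load_whisper(size: str = "base") -> Any:
    """Load a Whisper checkpoint by name (e.g. "base" or "medium").

    Assumes the package was installed beforehand, for example with:
        pip install git+https://github.com/openai/whisper.git
    """
    import whisper  # imported lazily; only needed when a real model is loaded

    return whisper.load_model(size)


def run_task(model: Any, audio_path: str, task: str) -> str:
    """Run one task ("transcribe" or "translate") on an audio file."""
    # The task is passed as an option to the model's own transcribe method.
    result = model.transcribe(audio_path, task=task)
    return result["text"].strip()


def transcribe(model: Any, audio_path: str) -> str:
    """Return the spoken text in the language of the recording."""
    return run_task(model, audio_path, "transcribe")


def translate(model: Any, audio_path: str) -> str:
    """Return the spoken text translated into English."""
    return run_task(model, audio_path, "translate")


# Example usage (hypothetical file name, requires the whisper package):
#   model = load_whisper("base")
#   print(transcribe(model, "recording.wav"))
#   print(translate(model, "recording.wav"))
```

Passing the model in as an argument, instead of relying on a global as a notebook typically would, is only a design choice made here so the two functions can be tried with any object that exposes a compatible `transcribe` method.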

Original Description

I ran OpenAI’s Whisper model in a notebook and used it to transcribe and translate my voice.

Link to the notebook: https://colab.research.google.com/drive/1mTxa7I5jJ9dx-rHTmyl_9FhUMHh2q6PL?usp=sharing

🔔 Subscribe for more stories: https://www.youtube.com/@underfitted?sub_confirmation=1

📚 My 3 favorite Machine Learning books:
• Deep Learning With Python, Second Edition — https://amzn.to/3xA3bVI
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow — https://amzn.to/3BOX3LP
• Machine Learning with PyTorch and Scikit-Learn — https://amzn.to/3f7dAC8

Twitter: https://twitter.com/svpino

Disclaimer: Some of the links included in this description are affiliate links where I'll earn a small commission if you purchase something. There's no cost to you.

Playlist

Uploads from Underfitted · Underfitted · 12 of 60

1 Test-Time Augmentation In Machine Learning.
2 Don't Replace Missing Values In Your Dataset.
3 Introduction to Adversarial Validation In Machine Learning.
4 Introduction To Autoencoders In Machine Learning.
5 Active Learning. The Secret of Training Models Without Labels.
6 Early Stopping. The Most Popular Regularization Technique In Machine Learning.
7 The Confusion Matrix in Machine Learning
8 3 Tips to Build a Career in Machine Learning (Unconventional Advice)
9 I can predict cars CRASHING. And it's 99% accurate!
10 A Critical Skill People Learn Too LATE: Learning Curves In Machine Learning.
11 The BEST Machine Learning Interview Strategy.
12 OpenAI’s Whisper is AMAZING!
13 5 Lessons You’re NOT Taught in School
14 TensorFlow On Apple Silicon. Step-by-Step Instructions
15 Generating Images From Text. Stable Diffusion, Explained
16 The Wrong Batch Size Will Ruin Your Model
17 8 Mistakes Holding Your Career Back | Machine Learning
18 AI Just Solved a 53-Year-Old Problem! | AlphaTensor, Explained
19 Bias and Variance, Simplified
20 Should You Stop Splitting Your Data Like This?
21 The Function That Changed Everything
22 This Model Caused A Nuclear Disaster
23 Will Your Code Write Itself?
24 The Simplest Encoding You’ve Never Heard Of
25 Superhuman AI Cracked An Impossible Game! | DeepNash, Explained
26 Can you become a Data Scientist without a Ph.D?
27 How to 10x your productivity with ChatGPT?
28 Cheating the Prisoner's Dilemma
29 We integrated OpenAI's Whisper with Spot
30 The Machine Learning School program
31 We integrated ChatGPT with our robots
32 Solving complex tasks using a Large Language Model (LLM)
33 5 problems when using a Large Language Model
34 We just discovered faster sorting algorithms!
35 The 3 most important updates to OpenAI's API.
36 People are divided! Does GPT-4 understand what it says?
37 How much should you charge hourly as a Machine Learning freelancer?
38 Building a RAG application from scratch using Python, LangChain, and the OpenAI API
39 Building a RAG application using open-source models (Asking questions from a PDF using Llama2)
40 How to evaluate an LLM-powered RAG application automatically.
41 Step by step no-code RAG application using Langflow.
42 I built a simple game using Langchain. Here is a step by step tutorial.
43 I used the first AI Software Engineer for a week. This is happening.
44 I deployed a recommendation model. Testing Models In Production using Interleaving Experiments.
45 How to run PyTorch, TensorFlow, and JAX on your Mac (Apple Silicon)
46 How to train a model to generate image embeddings from scratch
47 Building an AI assistant that listens and sees the world (Step by step tutorial)
48 Why are vector databases so FAST?
49 A Machine Learning roadmap (the one I recommend to my students)
50 How to build a real-time AI assistant (with voice and vision)
51 An introduction to Mojo (for Python developers)
52 How does Lexical Scoping in Mojo 🔥 works (under 3 minutes)
53 Building a CI workflow for those who hate it (using GitHub Actions)
54 How to run Python Code in Mojo 🔥
55 AI will not take your job. Here is what I think will happen instead.
56 How to fine-tune a model using LoRA (step by step)
57 Late initialization in Mojo🔥 (Python doesn't support this)
58 The $1,000,000 problem AI can't solve
59 A gentle introduction to RAG (using open-source models)
60 Automating feedback using ChatGPT and Zapier

The video teaches how to use OpenAI's Whisper model for speech recognition, transcription, and translation, and provides a simple Python notebook interface to record and transcribe audio. The model is shown to be highly accurate and can be used for a variety of languages.

Key Takeaways
  1. Install Whisper from GitHub
  2. Load the Whisper model
  3. Create a transcribe function
  4. Create a translate function
  5. Record audio using the notebook interface
  6. Transcribe audio using the transcribe function
  7. Translate audio using the translate function
💡 The Whisper model is highly accurate and can be used for a variety of languages, making it a powerful tool for speech recognition and translation.
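
The recording interface in steps 5 to 7 can be sketched along the following lines. This is an illustration under assumptions, not the notebook's code: `make_handler` and `build_demo` are hypothetical names, and the layout assumes Gradio's `Blocks`, `Audio`, `Button`, and `Textbox` components with `Button.click` wiring. The argument that enables microphone capture on `gr.Audio` has changed between Gradio releases, so it is left at its default here and should be checked against the installed version.

```python
from typing import Any, Callable, Optional


def make_handler(model: Any, task: str) -> Callable[[Optional[str]], str]:
    """Build a click handler that runs `task` on the recorded audio file."""

    def handler(audio_path: Optional[str]) -> str:
        if not audio_path:  # nothing recorded or uploaded yet
            return ""
        # Assumes a Whisper-style model: transcribe(path, task=...) -> {"text": ...}
        return model.transcribe(audio_path, task=task)["text"].strip()

    return handler


def build_demo(model: Any) -> Any:
    """Assemble a two-button interface: one to transcribe, one to translate."""
    import gradio as gr  # imported lazily; install with: pip install gradio

    with gr.Blocks() as demo:
        # type="filepath" hands the handler a path to the audio file on disk.
        audio = gr.Audio(type="filepath")
        with gr.Row():
            transcribe_button = gr.Button("Transcribe")
            translate_button = gr.Button("Translate")
        output = gr.Textbox(label="Result")

        # Connect each button's click event to its handler; both write
        # their result into the same text box.
        transcribe_button.click(
            make_handler(model, "transcribe"), inputs=audio, outputs=output
        )
        translate_button.click(
            make_handler(model, "translate"), inputs=audio, outputs=output
        )
    return demo


# Example usage (requires the whisper and gradio packages):
#   import whisper
#   build_demo(whisper.load_model("base")).launch()
```

Keeping the handlers separate from the interface means the same functions can also be called directly with a file name, which matches the final notebook cell mentioned in the video.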
