Code your own YouTube AI assistant in Python

Data Professor · Beginner · 🧠 Large Language Models · 1y ago

Key Takeaways

This video demonstrates how to build a Python workflow to extract knowledge from YouTube videos using AssemblyAI's LeMUR and Anthropic's Claude 3.5 Sonnet large language model.

Full Transcript

In this video, we're going to build a Python workflow to answer questions from YouTube videos automatically. Why is this useful, you might ask? Well, firstly, you'll be able to save time by finding key information from the video. Number two, you'll be able to learn efficiently by quickly grasping or extracting the main points from the video's content. Thirdly, you'll be able to boost your productivity by automating the process of content research, which typically takes days or weeks to perform. And so you'll see that this project condenses cutting-edge AI into a few lines of Python code. So without further ado, let's dive in.

All right, so here is the question answering of a YouTube video using AssemblyAI's LeMUR model, and you can follow along in this Jupyter notebook; the link to it will be provided in the video description. We're going to use AssemblyAI for processing and analyzing the audio data, and the documentation will typically be consulted during the building of this Python workflow.

The schematic of what we're building today can be summarized in this illustration. Essentially, we're going to take a YouTube video, where we provide the URL of the video, and then it's going to download the audio file; we're doing that using the yt-dlp Python library. Once we have the audio file, we're going to read it in using AssemblyAI, which will then convert the audio file into a text transcript. The transcript will then be used as an input to the large language model, and this is packaged as the LeMUR model by AssemblyAI. Then we're going to take in the question prompt as an input and generate the output, which is the answer to the question being asked. Under the hood, we're using Claude 3.5 Sonnet.

So I think we're ready to begin. Firstly, you want to sign up for an account. As I have already signed up, I'll be able to access the API key, so I just click on
copy API key right here, and then in the Colab notebook you'll be able to put all of your API keys in the secrets management here. If you click on it, it will be expanded, and you're going to see that I have all of my API keys conveniently accessible here in Colab. So I'm going to activate the API key for AssemblyAI, and the instructions for using the API key are described here.

So let's begin. Let's install the prerequisite libraries. Here we're going to install yt-dlp, which allows you to download the YouTube audio file, and we're also going to install assemblyai. You might notice that in prior videos I have already made tutorials on using AssemblyAI for transcribing audio files; before, it was through API access to the AssemblyAI platform, but for this tutorial we're doing that using the Python library from AssemblyAI. Here we're going to load the API key into aai.settings.api_key, and that will allow us to access the model.

Next, we're going to import the yt_dlp module and define a custom function that will allow us to download an MP3 audio file. So let's do that. As input, we're going to put in the URL of the YouTube video. Here, let me show you, is a YouTube video of Steve Jobs' Stanford commencement address in 2005; the video is 15 minutes long. So we're going to put the URL in, and then we're going to run this custom function, which will download the audio, and we're saving it locally here. Let's have a look. After a short moment, it's downloading. All right, it's finished. Let's have a look here in the directory: this is the audio file, it's an MP3 file, 20.7 megabytes.

Let's proceed to extracting the video title. So that's the video title. Firstly, we're going to generate the video title text, and then we're going to use that as an input, and then you'll be able to hear the audio that was downloaded directly in Colab. So please note that this is for educational
purposes only. [Music] Okay, and so you're going to see that it works.

Now we're going to proceed to processing and analyzing the audio. In order to perform the question answering of the YouTube video, first we're going to transcribe the audio file, meaning that we're going to take the MP3 audio file here and convert it into text format, which is the transcript. We're going to do that using the transcribe method from the AssemblyAI Python library: we're saving it as the transcriber variable, and then we're using that together with the transcribe method in order to generate the transcript. And then this transcript, along with the prompt (let's run it first), so the transcript, along with the prompt, will then be used as input to the LeMUR model. Let's have a look at what the transcript looks like here. It's probably an object. Yep, so it's an object.

The prompt that we're going to use, which is the question prompt, is "What are the five key messages that Steve Jobs wanted to convey in the speech?" So these two will be used as inputs. Here we're going to use the lemur.task method on the transcript object that we created a few moments ago, and we're going to use the prompt question and also the transcript as the input. Let's run it, and in a few seconds it should be able to generate the result, and the result will be the answer. There are other parameters that you could also try out, like the max output size, which relates to the length of the output response, and you could also play around with the temperature, which allows the large language model to be more creative in generating the response.

So let's have a look at the result. It's spitting out this: after the output, you'll be able to see the number of input tokens that have been used and the number of output tokens used, 275, and the input is 2,956, which is for the 15-minute video. And now we're going to print the response, so it's result.
response, and so these are the recommendations that Steve Jobs has given in his video. The five key messages are: connect the dots; love what you do; learn from setbacks; live each day as if it were your last; follow your heart and intuition. And then he closed the video by saying "Stay hungry, stay foolish." And yeah, that's a pretty good summary of the video, and you can see that in only a few seconds you'll be able to get a grasp of the contents of the video. So imagine that you have, let's say, more than one video, 10 videos, 100 videos, that you're going to use as a starting point for your research. You could essentially harness this very simple workflow to help you out with your research: you could compile hundreds of videos and then consolidate all of the lessons learned into a single corpus of text.

This code cell will allow you to more or less format the response. Let's have a look at the output again: I'm just going to copy this, and then we're going to print it below, and it should wrap the words. There you go. You don't have to scroll left or right; the entire text will be conveniently word wrapped instead of being on the same line, where you have to scroll left and right.

And let's say that you want to delete the generated response from the AssemblyAI server. You could do that by using the purge_request_data method: just run it, and you'll be able to delete it from the server.

Let's have a look at other models that you could try out. Currently, you're going to see that it has basic, Claude 2, Claude 3.5, Claude 3, and the one that we've used is 3.5 Sonnet, and there's also Mistral 7B as well. So you're going to see here that in only a few lines of code you could generate the response output, which is the key messages from the Steve Jobs video.

Let's say that we want to have another prompt. We're going to write a short blog of 500 words, and we're going to use that as the input, and let's have a
look at the output tokens: 659, so it's much more than the previous one. Let's have a look at the blog. All right, so it's more or less expanding on the key messages. That's the title of the blog, that's the introductory paragraph, and then here are some of the paragraphs on connecting the dots and loving what you do, and it also goes on to summarize the key messages here, along with a concluding paragraph. So this is pretty cool.

All of the references that you can use if you have any questions are provided here: this is the link to the LeMUR model, here's the specific page on asking questions about your audio data, and there's also the one on processing audio files. And if you like this type of video, please check out the Data Professor YouTube channel; click here to go to the Data Professor YouTube channel.

As you can see, in only a few lines of code and with a very simple workflow, you'll be able to go from a YouTube video URL to audio file to transcript, and then to the generated response by providing a simple question prompt. This is the beginning; you could think of it as starter code for you to build something much more complicated. So let me know in the comment section down below how you're going to build out your very own workflow. This Jupyter notebook is provided in the video description.

Thanks for watching until the end of the video. If you watched this far, please drop a fire emoji so that we know that you're the real one. And as always, remember to hit that subscribe button, turn on notifications, and also share with your friends. And as always, the best way to learn data science or AI is to do data science or AI.
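
The download step described in the transcript (a custom function built on yt-dlp that saves an MP3 locally) might look roughly like the sketch below. The notebook's actual function is not shown in the transcript, so the names `build_ydl_opts` and `download_audio`, the `audio` output folder, and the 192 kbps quality setting are assumptions made for illustration. The MP3 conversion relies on ffmpeg being installed.

```python
from pathlib import Path


def build_ydl_opts(out_dir: str = "audio") -> dict:
    """Return yt-dlp options that keep only the audio track and convert it to MP3."""
    return {
        "format": "bestaudio/best",  # prefer an audio-only stream
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),  # name the file after the video ID
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",  # needs ffmpeg on the system
                "preferredcodec": "mp3",
                "preferredquality": "192",  # assumed bitrate, not taken from the video
            }
        ],
        "quiet": True,
    }


def download_audio(url: str, out_dir: str = "audio") -> tuple[str, str]:
    """Download the audio of one video and return (video title, path to the MP3)."""
    import yt_dlp  # third-party: pip install yt-dlp

    with yt_dlp.YoutubeDL(build_ydl_opts(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
    # After post-processing, the file carries the .mp3 extension.
    return info["title"], str(Path(out_dir) / f"{info['id']}.mp3")
```

Calling `download_audio(url)` with a video URL would return the title and the local MP3 path, which covers both the download and the title-extraction steps mentioned above.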

Original Description

In this video, we're building a Python workflow that helps you extract knowledge from any YouTube video. In a nutshell, the general workflow includes:
1. Extracting and downloading the audio from a YouTube video
2. Transcribing the audio into text form
3. Answer questions about the video using LeMUR from AssemblyAI, where under the hood Anthropic's Claude 3.5 Sonnet is used as the large language model.

AssemblyAI has generously provided API credits for the tutorial and has agreed to provide $50 free credits to viewers of this video:
🔑 Get your AssemblyAI API key https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=dataprofessor_aug24
📖 AssemblyAI Docs: https://www.assemblyai.com/docs/?utm_source=youtube&utm_medium=influencer&utm_campaign=dataprofessor_aug24
🐙 Code https://github.com/dataprofessor/assemblyai/
✨ Read Blog https://dataprofessor.beehiiv.com/p/i-coded-a-youtube-ai-assistant-that-boosted-my-productivity
----------------------------
Support my work:
👪 Join as Channel Member: https://www.youtube.com/channel/UCV8e2g4IWQqK71bbzGDEI4Q/join
✉️ Newsletter http://newsletter.dataprofessor.org
📖 Join Medium to Read my Blogs https://data-professor.medium.com/membership
☕ Buy me a coffee https://www.buymeacoffee.com/dataprofessor

Recommended Resources
📚 Books https://kit.co/dataprofessor
😎 Taro (Tech Career Mentorship) https://www.jointaro.com/r/dataprofessor/
📜 Google Data Analytics Professional Certificate https://click.linksynergy.com/deeplink?id=PNeWWakF7rI&mid=40328&murl=https%3A%2F%2Fwww.coursera.org%2Fprofessional-certificates%2Fgoogle-data-analytics
🤔 Interview Query https://www.interviewquery.com/?ref=dataprofessor
🖥️ Stock photos, graphics and videos used on this channel https://1.envato.market/c/2346717/628379/4662

Subscribe:
🌟 Coding Professor https://www.youtube.com/channel/UCJzlfIoF8nmWqJIv_iWQVRw?sub_confirmation=1
🌟 Data Professor https://www.youtube.com/dataprofessor?sub_confirmation=1

Disclaimer: Recomm

This video teaches you how to build a Python workflow to extract knowledge from YouTube videos using AssemblyAI's LeMUR and Anthropic's Claude 3.5 Sonnet large language model. You will learn how to extract and download audio from YouTube videos, transcribe the audio into text form, and answer questions about the video using a large language model.
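
As a rough illustration of the transcription and question-answering steps, the sketch below uses the calls named in the video (`aai.settings.api_key`, a `Transcriber` with its `transcribe` method, and `lemur.task` on the transcript) as they existed in the assemblyai Python SDK around the time of the video; newer SDK versions may differ. The wrapper name `ask_video`, the `transcriber` parameter, and the `ASSEMBLYAI_API_KEY` environment variable are assumptions for illustration (the video stores the key in Colab secrets instead).

```python
import os


def ask_video(audio_path, prompt, transcriber=None, **task_kwargs):
    """Transcribe one audio file and put a single question to LeMUR about it.

    `transcriber` is a hypothetical injection point (any object with a
    .transcribe(path) method), so the function can be tried without network
    access. Extra keyword arguments such as max_output_size or temperature
    are passed through to transcript.lemur.task().
    """
    if transcriber is None:
        import assemblyai as aai  # third-party: pip install assemblyai

        # Assumption: the API key is stored in an environment variable of this name.
        aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
        transcriber = aai.Transcriber()
        # The model the video says is used under the hood.
        task_kwargs.setdefault("final_model", aai.LemurModel.claude3_5_sonnet)

    transcript = transcriber.transcribe(audio_path)  # audio -> text transcript
    result = transcript.lemur.task(prompt, **task_kwargs)  # question + transcript -> answer
    return result.response
```

With a real API key, `ask_video("talk.mp3", "What are the five key messages that Steve Jobs wanted to convey in the speech?")` would correspond to the first question asked in the video.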

Key Takeaways
  1. Extract and download audio from a YouTube video
  2. Transcribe the audio into text form using AssemblyAI's Python library
  3. Use LeMUR to answer questions about the video
  4. Integrate the workflow into a Python script
💡 LeMUR applies a large language model (here, Claude 3.5 Sonnet) to the transcript, so the key points of a 15-minute video can be extracted in a few seconds with a few lines of Python.
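
One way the steps could be combined into a script is sketched below, following the transcript's idea of running the same prompt over many videos and consolidating the answers into a single corpus of text. The function `build_corpus` and its `download` and `ask` parameters are hypothetical: they stand in for whatever download and question-answering helpers you write, and are passed in as arguments so the loop can be tried with stand-ins. Word wrapping uses the standard-library `textwrap` module, which may or may not be what the notebook uses.

```python
import textwrap
from typing import Callable, Iterable


def build_corpus(
    urls: Iterable[str],
    download: Callable[[str], tuple[str, str]],
    ask: Callable[[str, str], str],
    prompt: str,
    width: int = 80,
) -> str:
    """Run download -> ask for each URL and join the wrapped answers into one text."""
    sections = []
    for url in urls:
        title, audio_path = download(url)  # hypothetical helper: URL -> (title, MP3 path)
        answer = ask(audio_path, prompt)   # hypothetical helper: (MP3 path, prompt) -> answer
        # Wrap each line separately so numbered lists in the answer keep their line breaks.
        wrapped = "\n".join(
            textwrap.fill(line, width=width) for line in answer.splitlines()
        )
        sections.append(f"{title}\n{wrapped}")
    return "\n\n".join(sections)
```

Keeping the two helpers as parameters means the same loop works whether the answers come from LeMUR or from another model.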
