Coding an AI Voice Bot from Scratch: Real-Time Conversation with Python
Key Takeaways
This video demonstrates how to build a real-time AI voice assistant using Python, leveraging AssemblyAI for speech-to-text transcription, OpenAI for text response generation, and 11 Labs for audio generation. The project utilizes various tools such as Port Audio, Brew, and pip, and requires API keys for AssemblyAI, OpenAI, and 11 Labs.
Full Transcript
in this video I'll show you how to build an AI voice bot in Python it will be able to understand realtime audio input and at the same time generate real-time audio responses here's the scenario where our AI voice bot is working at a dental clinic thank you for calling Vancouver dental clinic my name is Sandy how may I assist you hi Sandy my name is Smitha and I would like to book an appointment with the denst tomorrow hello Smitha I can definitely help you with that let me check our schedule for availability could you please tell me your preferred time for the appointment tomorrow of course uh I would like to meet the dentist tomorrow at 12:00 noon great choice Smitha I have an opening at noon tomorrow with the dentist shall I go ahead and book that appointment for you that would be perfect thanks Sandy you're welcome Smitha I have successfully booked your appointment with the dentist for tomorrow at noon please make sure to bring your insurance information there are four steps involved in building our AI voice bot first off is installing all of the necessary python libraries so that is assembly AI open Ai and 11 Labs assembly AI is going to be used for accurate realtime speech to text transcription that means that whatever you're saying in real time is going to be transcribed and once that is transcribed that transcript is passed to open AI where we'll be generating text responses of how a dental assistant would respond once we get that text response from openi we'll pass that text response to 11 Labs where audio is going to be generated and that is exactly how our AI voice bot will work first off let's start by installing all of the necessary python libraries I've went ahead and created a voice virtual environment in my project folder once I've activated that now I'm going to install all the necessary libraries so I'm going to start off with Port audio so Brew install Port audio and also we are going to do pip install assembly AI extras and then we're going to do pip install 11 labs followed by Brew install MP and lastly we're also going to install open aai all of these commands and the code for this project will be in the description box below so do check out the GitHub link first let's start off by importing the python libraries that we've just downloaded so let's start off with assembly AI uh um after that we're going to import 11 Labs specifically we're importing the generate function and stream function and then also let's import open AI once we've done that let's create a class called AI assistant next we're going to initialize this class most importantly we need the API keys for all three of these services that we're using to get an assembly AI API key click on the link in the description box below once you've created API keys for all three of these Services you can declare them here let's do assembly AI do settings dot API key equals to API key and this is where you can enter your assembly AI API key once you've done that let's also Define the open AI API key once you have all three API Keys defined let's go ahead and create an empty transcriber object after we've done this let's also create a list containing the full transcripts of everything that we're saying and also what the AI assistant is saying as well as well so let's do self. full transcript before the conversation starts we want full transcript to only include a single thing which is the prompt that we want to give to openi so let's start writing that prompt we'll first have to define the role as system and and once we've done that we also have to Define content and this is where we write our prompt so let's write you are a receptionist at a dental clinic be resourceful and efficient so that's all our prompt will contain and that is what our full transcript list will contain this full transcript list is actually really important because every time that we communicate to open AI API we will be sending a full transcript of whatever has been said by you and also by the voicebot so it's really important that you follow this specific format next we can move on to step number two which is real-time transcription with assembly AI the first thing you want to do is create a method called start transcription in start transcription we will now create a transcriber object and store it into the transcriber uh variable that we've just created so self. transcriber object equals to assembly I do realtime transcriber we'll also set the sample rate to 16,000 and we want to define a few different methods lastly we want to define something called and aeran Silent threshold and you want to set this to a th000 this defines the time in which the program will actually wait before determining that you have ended a sentence when you're talking in real time so what this code does is it connects your microphone and streams data to assembly AI API next up we'll Define a method called stop transcription what this method does is it closes the transcriber and it sets It To None again next we need to Define these four methods on data on error on open and on close these four methods Define how the realtime transcriber works so let's head on over to assem assembly a documentation in order to do so so inside of assembly a documentation for realtime streaming we're want to look at this first code example what we want to do is copy this four functions right here which we need for our code so go on over and copy this once you've done that let's head on back to vs code and paste this there when you're pasting the code ensure that it's aligned once you've done that we have to make a few changes to the code that we've just pasted first off let's change the parameters to import self in each of these methods once you've done that what I want to do is actually comment out this code in on open because I don't want to actually print out anything in terminal besides the transcripts and instead I'm just going to write return I'm also going to be doing the same thing for the methods called on error and on close we also want to make some changes to the on data method so the on data method is really important because we actually get to Define what we want to do with the real-time transcript which is coming in from assembly ai's API so in the second if statement right here is where we actually receive the final real-time transcript that means that whenever you finish saying a sentence that entire sentence is actually being printed out or sent to you right here instead instead of printing it out what I want to do is send that over to a new method called generate AI response which we will be defining and the parameter for this will be the transcript we are now at step three where we're going to write code to pass the realtime transcript to open ai's API we'll start off by writing a function called generate AI response here the parameters will be self and transcript the very first thing that we're going to do in this method is called the stop transcription method the reason why we're doing that is because we want to pause the realtime transcription stream while we are passing and communicating with openi API so let's do self. stop transcription after which we want to now add our real-time transcript to our full transcript list next we also want to print out our real-time transcript which the user has just said now we're ready to pass this transcript directly to openi API for the model we're going to be making use of GPT 3.5 Turbo and for messages we are going to be passing the full transcript after which let's define a parameter called AI response AI response is equals to response do choices what this line of code does is it retrieves the response from open eyes API and stores it into AI response and at this point what we can do is we can go ahead and generate audio so that's exactly what we'll do we'll do self do generate audio and this is a method that we now have to go ahead and create and we're going to pass AI response as a parameter at this point once we have generated audio we can go ahead and restart the real-time transcription so you can continue having that conversation so what we want to do is now call the start transcription function at this point we're at the last and final step where we'll be generating audio with 11 laps so we're going to create a method called generate audio and the parameters will be self and text this text right here is actually the response from open ai's API and the first thing that we want to do is add that into full transcript next we also want to print out this text saying that it is actually from the AI assistant next we have to write the code to send a request to 11 Labs API and we'll be making use of the generate function that we imported at the beginning of this for voice I'm going to be selecting Rachel but there's a bunch of different voices available on 11 Labs which you can feel free to browse and select the ones that you want and I'm also setting the stream parameter to through and I'm going to call the stream function and pass this audio stream so this is the end of the generate audio method next we're actually going to define the start and end of our project we'll start off by defining the initial greeting that our AI voicebot has to say so we'll say thank you for calling Vancouver Dental Clinic my name is Sandy how may I assist you so this is the initial greeting which our AI voice bot will read out to us before starting our real-time transcription passing it to open Ai and then generating more audio now let us initialize the class AI assistant and the first thing that we want to do is call the generate audio method and pass greeting inside after which we also want to call the start transcription function at this point you can hit save and start running this project now I can go into terminal and run our python file thank you for calling Vancouver dental clinic my name is Sandy how may I assist you hi Sandy I'm Smitha and I'd like to book an appointment with Dr Lee tomorrow hello Smitha I'm happy to help you with that let me check Dr Lee's availability for tomorrow could you please tell me your preferred time for the appointment uh I would like to book it at 3 p.m. tomorrow I'm sorry but Dr Lee is fully booked tomorrow afternoon however we do have availability at 10: a.m. or 1 p.m. would any of these times work for you Smitha yes uh 1 p.m. actually works for me Sandy great I've successfully scheduled your appointment with Dr Lee for tomorrow at 1 p.m. can I have your phone number to confirm the appointment Smitha yes my phone number is 1 2 3 4 5 6 7 8 9 thank you Smitha I have your phone number as 1 123456789 you will receive a confirmation call or text shortly if you have any other questions or need further assistance feel free to let me know thank you for choosing Vancouver dental clinic check out this next video to learn how to transcribe a live phone call in Python using assembly Ai and twio
Original Description
🔑 Get your AssemblyAI API key here: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_smit_17
Learn how to build a real-time AI voice assistant using Python that can handle incoming calls, transcribe speech, generate intelligent responses, and provide a human-like conversational experience. Perfect for call centers, customer support, and virtual receptionist applications.
In this coding tutorial, you'll integrate multiple cutting-edge technologies, including:
1. Assemblyai Speech-to-Text API for accurate real-time transcription.
2. OpenAI's powerful language models for natural language processing (NLP) and response generation.
3. ElevenLabs' AI voice synthesis to convert text responses into natural-sounding audio.
Step-by-step, you'll create a Python application that seamlessly combines these APIs, enabling your AI assistant to listen to incoming audio, comprehend the speech, formulate contextual responses, and communicate back with synthesized voice in real-time.
Github code: https://github.com/smithakolan/AssemblyAI-AI-Voice-Bot/
Timestamps:
00:00 - Intro & Demo of application
01:10 - Outline of application
01:58 - Step 1: download python libraries
06:21 - Step 1: Streaming Speech-to-Text with AssemblyAI
12:11 - Step 3: OpenAI Chat completion
15:32 - Step 4: Generate Human-like audio with Elevenlabs
18:48 - Running our AI Call Assistant
#AIVoiceAssistant #RealTimeSpeechRecognition #NaturalLanguageProcessing #AIVoiceSynthesis #PythonTutorial #CallCenterAutomation #VoiceBot #StreamingSpeechtoText
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#MachineLearning #DeepLearning
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from AssemblyAI · AssemblyAI · 0 of 60
← Previous
Next →
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Python Speech Recognition in 5 Minutes
AssemblyAI
Python Click Part 1 of 4
AssemblyAI
Python Click Part 2 of 4
AssemblyAI
Python Click Part 3 of 4
AssemblyAI
Python Click Part 4 of 4
AssemblyAI
Deep learning in 5 minutes | What is deep learning?
AssemblyAI
How to make a web app that transcribes YouTube videos with Streamlit | Part 1
AssemblyAI
How to make a web app that transcribes YouTube videos with Streamlit | Part 2
AssemblyAI
Batch normalization | What it is and how to implement it
AssemblyAI
Real-time Speech Recognition in 15 minutes with AssemblyAI
AssemblyAI
Regularization in a Neural Network | Dealing with overfitting
AssemblyAI
Add speech recognition to your Streamlit apps in 5 minutes
AssemblyAI
Transformers for beginners | What are they and how do they work
AssemblyAI
Automatic Chapter Detection With AssemblyAI | Python Tutorial
AssemblyAI
Deep Learning Series Part 1 - What is Deep Learning?
AssemblyAI
Deep Learning Series part 2 - Why is it called “Deep Learning”?
AssemblyAI
Activation Functions In Neural Networks Explained | Deep Learning Tutorial
AssemblyAI
Deep Learning Series part 3 - Deep Learning vs. Machine Learning
AssemblyAI
Deep Learning Series part 4 - Why is Deep Learning better for NLP?
AssemblyAI
Intro to Batch Normalization Part 1
AssemblyAI
Intro to Batch Normalization Part 2
AssemblyAI
Intro to Batch Normalization Part 3 - What is Normalization?
AssemblyAI
Intro to Batch Normalization Part 4
AssemblyAI
Intro to Batch Normalization Part 5
AssemblyAI
Sentiment Analysis for Earnings Calls with AssemblyAI
AssemblyAI
Summarizing my favorite podcasts with Python
AssemblyAI
Introduction to Regularization
AssemblyAI
How/Why Regularization in Neural Networks?
AssemblyAI
Getting Started With Torchaudio | PyTorch Tutorial
AssemblyAI
Types of Regularization
AssemblyAI
Tuning Alpha in L1 and L2 Regularization
AssemblyAI
Dropout Regularization
AssemblyAI
What is GPT-3 and how does it work? | A Quick Review
AssemblyAI
Backpropagation For Neural Networks Explained | Deep Learning Tutorial
AssemblyAI
Jupyter Notebooks Tutorial | How to use them & tips and tricks!
AssemblyAI
Best Free Speech-To-Text APIs and Open Source Libraries
AssemblyAI
Regularization - Early stopping
AssemblyAI
Regularization - Data Augmentation
AssemblyAI
Bias and Variance for Machine Learning | Deep Learning
AssemblyAI
Recurrent Neural Networks (RNNs) Explained - Deep Learning
AssemblyAI
What is BERT and how does it work? | A Quick Review
AssemblyAI
Introduction to Transformers
AssemblyAI
Transformers | What is attention?
AssemblyAI
Transformers | how attention relates to Transformers
AssemblyAI
Transformers | Basics of Transformers
AssemblyAI
Supervised Machine Learning Explained For Beginners
AssemblyAI
Transformers | Basics of Transformers Encoders
AssemblyAI
Transformers | Basics of Transformers I/O
AssemblyAI
How to evaluate ML models | Evaluation metrics for machine learning
AssemblyAI
Unsupervised Machine Learning Explained For Beginners
AssemblyAI
Weight Initialization for Deep Feedforward Neural Networks
AssemblyAI
Q-Learning Explained - Reinforcement Learning Tutorial
AssemblyAI
Should You Use PyTorch or TensorFlow in 2022?
AssemblyAI
What is Layer Normalization? | Deep Learning Fundamentals
AssemblyAI
I created a Python App to study FASTER
AssemblyAI
How to create your FIRST NEURAL NETWORK with TensorFlow!
AssemblyAI
Neural Networks Summary: All hyperparameters
AssemblyAI
Getting Started with OpenAI API and GPT-3 | Beginner Python Tutorial
AssemblyAI
Convert Speech-To-Text In Python in 60 seconds!
AssemblyAI
Gradient Clipping for Neural Networks | Deep Learning Fundamentals
AssemblyAI
More on: LLM Foundations
View skill →Related Reads
📰
📰
📰
📰
Sonnet 5 vs GLM-5.2 vs everyone: how to pick the cheapest LLM API in 2026
Dev.to AI
JSON-Schema masks can block needed tool calls
Dev.to AI
The Invisible Cage: What the Evolution from Claude Sonnet 4.6
Medium · AI
The Best Vector Database in 2026: Qdrant vs Pinecone vs Weaviate vs Milvus vs pgvector
Dev.to · Darshit Radadiya
Chapters (7)
Intro & Demo of application
1:10
Outline of application
1:58
Step 1: download python libraries
6:21
Step 1: Streaming Speech-to-Text with AssemblyAI
12:11
Step 3: OpenAI Chat completion
15:32
Step 4: Generate Human-like audio with Elevenlabs
18:48
Running our AI Call Assistant
🎓
Tutor Explanation
DeepCamp AI