Coding an AI Voice Bot from Scratch: Real-Time Conversation with Python

AssemblyAI · Intermediate ·🧠 Large Language Models ·2y ago

Key Takeaways

This video demonstrates how to build a real-time AI voice assistant using Python, leveraging AssemblyAI for speech-to-text transcription, OpenAI for text response generation, and 11 Labs for audio generation. The project utilizes various tools such as Port Audio, Brew, and pip, and requires API keys for AssemblyAI, OpenAI, and 11 Labs.

Full Transcript

in this video I'll show you how to build an AI voice bot in Python it will be able to understand realtime audio input and at the same time generate real-time audio responses here's the scenario where our AI voice bot is working at a dental clinic thank you for calling Vancouver dental clinic my name is Sandy how may I assist you hi Sandy my name is Smitha and I would like to book an appointment with the denst tomorrow hello Smitha I can definitely help you with that let me check our schedule for availability could you please tell me your preferred time for the appointment tomorrow of course uh I would like to meet the dentist tomorrow at 12:00 noon great choice Smitha I have an opening at noon tomorrow with the dentist shall I go ahead and book that appointment for you that would be perfect thanks Sandy you're welcome Smitha I have successfully booked your appointment with the dentist for tomorrow at noon please make sure to bring your insurance information there are four steps involved in building our AI voice bot first off is installing all of the necessary python libraries so that is assembly AI open Ai and 11 Labs assembly AI is going to be used for accurate realtime speech to text transcription that means that whatever you're saying in real time is going to be transcribed and once that is transcribed that transcript is passed to open AI where we'll be generating text responses of how a dental assistant would respond once we get that text response from openi we'll pass that text response to 11 Labs where audio is going to be generated and that is exactly how our AI voice bot will work first off let's start by installing all of the necessary python libraries I've went ahead and created a voice virtual environment in my project folder once I've activated that now I'm going to install all the necessary libraries so I'm going to start off with Port audio so Brew install Port audio and also we are going to do pip install assembly AI extras and then we're going to do pip install 11 labs followed by Brew install MP and lastly we're also going to install open aai all of these commands and the code for this project will be in the description box below so do check out the GitHub link first let's start off by importing the python libraries that we've just downloaded so let's start off with assembly AI uh um after that we're going to import 11 Labs specifically we're importing the generate function and stream function and then also let's import open AI once we've done that let's create a class called AI assistant next we're going to initialize this class most importantly we need the API keys for all three of these services that we're using to get an assembly AI API key click on the link in the description box below once you've created API keys for all three of these Services you can declare them here let's do assembly AI do settings dot API key equals to API key and this is where you can enter your assembly AI API key once you've done that let's also Define the open AI API key once you have all three API Keys defined let's go ahead and create an empty transcriber object after we've done this let's also create a list containing the full transcripts of everything that we're saying and also what the AI assistant is saying as well as well so let's do self. full transcript before the conversation starts we want full transcript to only include a single thing which is the prompt that we want to give to openi so let's start writing that prompt we'll first have to define the role as system and and once we've done that we also have to Define content and this is where we write our prompt so let's write you are a receptionist at a dental clinic be resourceful and efficient so that's all our prompt will contain and that is what our full transcript list will contain this full transcript list is actually really important because every time that we communicate to open AI API we will be sending a full transcript of whatever has been said by you and also by the voicebot so it's really important that you follow this specific format next we can move on to step number two which is real-time transcription with assembly AI the first thing you want to do is create a method called start transcription in start transcription we will now create a transcriber object and store it into the transcriber uh variable that we've just created so self. transcriber object equals to assembly I do realtime transcriber we'll also set the sample rate to 16,000 and we want to define a few different methods lastly we want to define something called and aeran Silent threshold and you want to set this to a th000 this defines the time in which the program will actually wait before determining that you have ended a sentence when you're talking in real time so what this code does is it connects your microphone and streams data to assembly AI API next up we'll Define a method called stop transcription what this method does is it closes the transcriber and it sets It To None again next we need to Define these four methods on data on error on open and on close these four methods Define how the realtime transcriber works so let's head on over to assem assembly a documentation in order to do so so inside of assembly a documentation for realtime streaming we're want to look at this first code example what we want to do is copy this four functions right here which we need for our code so go on over and copy this once you've done that let's head on back to vs code and paste this there when you're pasting the code ensure that it's aligned once you've done that we have to make a few changes to the code that we've just pasted first off let's change the parameters to import self in each of these methods once you've done that what I want to do is actually comment out this code in on open because I don't want to actually print out anything in terminal besides the transcripts and instead I'm just going to write return I'm also going to be doing the same thing for the methods called on error and on close we also want to make some changes to the on data method so the on data method is really important because we actually get to Define what we want to do with the real-time transcript which is coming in from assembly ai's API so in the second if statement right here is where we actually receive the final real-time transcript that means that whenever you finish saying a sentence that entire sentence is actually being printed out or sent to you right here instead instead of printing it out what I want to do is send that over to a new method called generate AI response which we will be defining and the parameter for this will be the transcript we are now at step three where we're going to write code to pass the realtime transcript to open ai's API we'll start off by writing a function called generate AI response here the parameters will be self and transcript the very first thing that we're going to do in this method is called the stop transcription method the reason why we're doing that is because we want to pause the realtime transcription stream while we are passing and communicating with openi API so let's do self. stop transcription after which we want to now add our real-time transcript to our full transcript list next we also want to print out our real-time transcript which the user has just said now we're ready to pass this transcript directly to openi API for the model we're going to be making use of GPT 3.5 Turbo and for messages we are going to be passing the full transcript after which let's define a parameter called AI response AI response is equals to response do choices what this line of code does is it retrieves the response from open eyes API and stores it into AI response and at this point what we can do is we can go ahead and generate audio so that's exactly what we'll do we'll do self do generate audio and this is a method that we now have to go ahead and create and we're going to pass AI response as a parameter at this point once we have generated audio we can go ahead and restart the real-time transcription so you can continue having that conversation so what we want to do is now call the start transcription function at this point we're at the last and final step where we'll be generating audio with 11 laps so we're going to create a method called generate audio and the parameters will be self and text this text right here is actually the response from open ai's API and the first thing that we want to do is add that into full transcript next we also want to print out this text saying that it is actually from the AI assistant next we have to write the code to send a request to 11 Labs API and we'll be making use of the generate function that we imported at the beginning of this for voice I'm going to be selecting Rachel but there's a bunch of different voices available on 11 Labs which you can feel free to browse and select the ones that you want and I'm also setting the stream parameter to through and I'm going to call the stream function and pass this audio stream so this is the end of the generate audio method next we're actually going to define the start and end of our project we'll start off by defining the initial greeting that our AI voicebot has to say so we'll say thank you for calling Vancouver Dental Clinic my name is Sandy how may I assist you so this is the initial greeting which our AI voice bot will read out to us before starting our real-time transcription passing it to open Ai and then generating more audio now let us initialize the class AI assistant and the first thing that we want to do is call the generate audio method and pass greeting inside after which we also want to call the start transcription function at this point you can hit save and start running this project now I can go into terminal and run our python file thank you for calling Vancouver dental clinic my name is Sandy how may I assist you hi Sandy I'm Smitha and I'd like to book an appointment with Dr Lee tomorrow hello Smitha I'm happy to help you with that let me check Dr Lee's availability for tomorrow could you please tell me your preferred time for the appointment uh I would like to book it at 3 p.m. tomorrow I'm sorry but Dr Lee is fully booked tomorrow afternoon however we do have availability at 10: a.m. or 1 p.m. would any of these times work for you Smitha yes uh 1 p.m. actually works for me Sandy great I've successfully scheduled your appointment with Dr Lee for tomorrow at 1 p.m. can I have your phone number to confirm the appointment Smitha yes my phone number is 1 2 3 4 5 6 7 8 9 thank you Smitha I have your phone number as 1 123456789 you will receive a confirmation call or text shortly if you have any other questions or need further assistance feel free to let me know thank you for choosing Vancouver dental clinic check out this next video to learn how to transcribe a live phone call in Python using assembly Ai and twio

Original Description

🔑 Get your AssemblyAI API key here: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_smit_17 Learn how to build a real-time AI voice assistant using Python that can handle incoming calls, transcribe speech, generate intelligent responses, and provide a human-like conversational experience. Perfect for call centers, customer support, and virtual receptionist applications. In this coding tutorial, you'll integrate multiple cutting-edge technologies, including: 1. Assemblyai Speech-to-Text API for accurate real-time transcription. 2. OpenAI's powerful language models for natural language processing (NLP) and response generation. 3. ElevenLabs' AI voice synthesis to convert text responses into natural-sounding audio. Step-by-step, you'll create a Python application that seamlessly combines these APIs, enabling your AI assistant to listen to incoming audio, comprehend the speech, formulate contextual responses, and communicate back with synthesized voice in real-time. Github code: https://github.com/smithakolan/AssemblyAI-AI-Voice-Bot/ Timestamps: 00:00 - Intro & Demo of application 01:10 - Outline of application 01:58 - Step 1: download python libraries 06:21 - Step 1: Streaming Speech-to-Text with AssemblyAI 12:11 - Step 3: OpenAI Chat completion 15:32 - Step 4: Generate Human-like audio with Elevenlabs 18:48 - Running our AI Call Assistant #AIVoiceAssistant #RealTimeSpeechRecognition #NaturalLanguageProcessing #AIVoiceSynthesis #PythonTutorial #CallCenterAutomation #VoiceBot #StreamingSpeechtoText ▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬ 🖥️ Website: https://www.assemblyai.com 🐦 Twitter: https://twitter.com/AssemblyAI 🦾 Discord: https://discord.gg/Cd8MyVJAXd ▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1 🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ #MachineLearning #DeepLearning
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from AssemblyAI · AssemblyAI · 0 of 60

← Previous Next →
1 Python Speech Recognition in 5 Minutes
Python Speech Recognition in 5 Minutes
AssemblyAI
2 Python Click Part 1 of 4
Python Click Part 1 of 4
AssemblyAI
3 Python Click Part 2 of 4
Python Click Part 2 of 4
AssemblyAI
4 Python Click Part 3 of 4
Python Click Part 3 of 4
AssemblyAI
5 Python Click Part 4 of 4
Python Click Part 4 of 4
AssemblyAI
6 Deep learning in 5 minutes | What is deep learning?
Deep learning in 5 minutes | What is deep learning?
AssemblyAI
7 How to make a web app that transcribes YouTube videos with Streamlit | Part 1
How to make a web app that transcribes YouTube videos with Streamlit | Part 1
AssemblyAI
8 How to make a web app that transcribes YouTube videos with Streamlit | Part 2
How to make a web app that transcribes YouTube videos with Streamlit | Part 2
AssemblyAI
9 Batch normalization | What it is and how to implement it
Batch normalization | What it is and how to implement it
AssemblyAI
10 Real-time Speech Recognition in 15 minutes with AssemblyAI
Real-time Speech Recognition in 15 minutes with AssemblyAI
AssemblyAI
11 Regularization in a Neural Network | Dealing with overfitting
Regularization in a Neural Network | Dealing with overfitting
AssemblyAI
12 Add speech recognition to your Streamlit apps in 5 minutes
Add speech recognition to your Streamlit apps in 5 minutes
AssemblyAI
13 Transformers for beginners | What are they and how do they work
Transformers for beginners | What are they and how do they work
AssemblyAI
14 Automatic Chapter Detection With AssemblyAI | Python Tutorial
Automatic Chapter Detection With AssemblyAI | Python Tutorial
AssemblyAI
15 Deep Learning Series Part 1 - What is Deep Learning?
Deep Learning Series Part 1 - What is Deep Learning?
AssemblyAI
16 Deep Learning Series part 2 - Why is it called “Deep Learning”?
Deep Learning Series part 2 - Why is it called “Deep Learning”?
AssemblyAI
17 Activation Functions In Neural Networks Explained | Deep Learning Tutorial
Activation Functions In Neural Networks Explained | Deep Learning Tutorial
AssemblyAI
18 Deep Learning Series part 3 - Deep Learning vs. Machine Learning
Deep Learning Series part 3 - Deep Learning vs. Machine Learning
AssemblyAI
19 Deep Learning Series part 4 - Why is Deep Learning better for NLP?
Deep Learning Series part 4 - Why is Deep Learning better for NLP?
AssemblyAI
20 Intro to Batch Normalization Part 1
Intro to Batch Normalization Part 1
AssemblyAI
21 Intro to Batch Normalization Part 2
Intro to Batch Normalization Part 2
AssemblyAI
22 Intro to Batch Normalization Part 3 - What is Normalization?
Intro to Batch Normalization Part 3 - What is Normalization?
AssemblyAI
23 Intro to Batch Normalization Part 4
Intro to Batch Normalization Part 4
AssemblyAI
24 Intro to Batch Normalization Part 5
Intro to Batch Normalization Part 5
AssemblyAI
25 Sentiment Analysis for Earnings Calls with AssemblyAI
Sentiment Analysis for Earnings Calls with AssemblyAI
AssemblyAI
26 Summarizing my favorite podcasts with Python
Summarizing my favorite podcasts with Python
AssemblyAI
27 Introduction to Regularization
Introduction to Regularization
AssemblyAI
28 How/Why Regularization in Neural Networks?
How/Why Regularization in Neural Networks?
AssemblyAI
29 Getting Started With Torchaudio | PyTorch Tutorial
Getting Started With Torchaudio | PyTorch Tutorial
AssemblyAI
30 Types of Regularization
Types of Regularization
AssemblyAI
31 Tuning Alpha in L1 and L2 Regularization
Tuning Alpha in L1 and L2 Regularization
AssemblyAI
32 Dropout Regularization
Dropout Regularization
AssemblyAI
33 What is GPT-3 and how does it work? | A Quick Review
What is GPT-3 and how does it work? | A Quick Review
AssemblyAI
34 Backpropagation For Neural Networks Explained | Deep Learning Tutorial
Backpropagation For Neural Networks Explained | Deep Learning Tutorial
AssemblyAI
35 Jupyter Notebooks Tutorial | How to use them & tips and tricks!
Jupyter Notebooks Tutorial | How to use them & tips and tricks!
AssemblyAI
36 Best Free Speech-To-Text APIs and Open Source Libraries
Best Free Speech-To-Text APIs and Open Source Libraries
AssemblyAI
37 Regularization - Early stopping
Regularization - Early stopping
AssemblyAI
38 Regularization - Data Augmentation
Regularization - Data Augmentation
AssemblyAI
39 Bias and Variance for Machine Learning | Deep Learning
Bias and Variance for Machine Learning | Deep Learning
AssemblyAI
40 Recurrent Neural Networks (RNNs) Explained - Deep Learning
Recurrent Neural Networks (RNNs) Explained - Deep Learning
AssemblyAI
41 What is BERT and how does it work? | A Quick Review
What is BERT and how does it work? | A Quick Review
AssemblyAI
42 Introduction to Transformers
Introduction to Transformers
AssemblyAI
43 Transformers | What is attention?
Transformers | What is attention?
AssemblyAI
44 Transformers | how attention relates to Transformers
Transformers | how attention relates to Transformers
AssemblyAI
45 Transformers | Basics of Transformers
Transformers | Basics of Transformers
AssemblyAI
46 Supervised Machine Learning Explained For Beginners
Supervised Machine Learning Explained For Beginners
AssemblyAI
47 Transformers | Basics of Transformers Encoders
Transformers | Basics of Transformers Encoders
AssemblyAI
48 Transformers | Basics of Transformers I/O
Transformers | Basics of Transformers I/O
AssemblyAI
49 How to evaluate ML models | Evaluation metrics for machine learning
How to evaluate ML models | Evaluation metrics for machine learning
AssemblyAI
50 Unsupervised Machine Learning Explained For Beginners
Unsupervised Machine Learning Explained For Beginners
AssemblyAI
51 Weight Initialization for Deep Feedforward Neural Networks
Weight Initialization for Deep Feedforward Neural Networks
AssemblyAI
52 Q-Learning Explained - Reinforcement Learning Tutorial
Q-Learning Explained - Reinforcement Learning Tutorial
AssemblyAI
53 Should You Use PyTorch or TensorFlow in 2022?
Should You Use PyTorch or TensorFlow in 2022?
AssemblyAI
54 What is Layer Normalization? | Deep Learning Fundamentals
What is Layer Normalization? | Deep Learning Fundamentals
AssemblyAI
55 I created a Python App to study FASTER
I created a Python App to study FASTER
AssemblyAI
56 How to create your FIRST NEURAL NETWORK with TensorFlow!
How to create your FIRST NEURAL NETWORK with TensorFlow!
AssemblyAI
57 Neural Networks Summary: All hyperparameters
Neural Networks Summary: All hyperparameters
AssemblyAI
58 Getting Started with OpenAI API and GPT-3 | Beginner Python Tutorial
Getting Started with OpenAI API and GPT-3 | Beginner Python Tutorial
AssemblyAI
59 Convert Speech-To-Text In Python in 60 seconds!
Convert Speech-To-Text In Python in 60 seconds!
AssemblyAI
60 Gradient Clipping for Neural Networks | Deep Learning Fundamentals
Gradient Clipping for Neural Networks | Deep Learning Fundamentals
AssemblyAI

This video teaches you how to build a real-time AI voice assistant using Python, covering topics such as speech-to-text transcription, text response generation, and audio generation. By following the steps outlined in the video, you can create a conversational AI system that can handle incoming calls and provide human-like responses.

Key Takeaways
  1. Install necessary Python libraries
  2. Create a voice virtual environment
  3. Activate the virtual environment
  4. Install Port Audio
  5. Install AssemblyAI with extras
  6. Create a method called start transcription
  7. Define a silent threshold
  8. Copy and modify four functions from AssemblyAI documentation
  9. Stop transcription
  10. Define methods for on data, on error, on open, and on close
💡 The key to building a successful real-time AI voice assistant is to integrate multiple AI APIs and fine-tune the models for specific tasks, such as speech-to-text transcription and text response generation.

Related Reads

📰
Sonnet 5 vs GLM-5.2 vs everyone: how to pick the cheapest LLM API in 2026
Learn how to choose the cheapest LLM API in 2026 by comparing pricing models of Sonnet 5 and GLM-5.2
Dev.to AI
📰
JSON-Schema masks can block needed tool calls
Learn how JSON-Schema masks can block needed tool calls in LLM agents and how to sidestep the problem with a two-pass inference hack
Dev.to AI
📰
The Invisible Cage: What the Evolution from Claude Sonnet 4.6
Explore the evolution of AI models like Claude Sonnet 4.6 and its implications on AI development
Medium · AI
📰
The Best Vector Database in 2026: Qdrant vs Pinecone vs Weaviate vs Milvus vs pgvector
Learn how to choose the best vector database for your RAG system in 2026, comparing Qdrant, Pinecone, Weaviate, Milvus, and pgvector
Dev.to · Darshit Radadiya

Chapters (7)

Intro & Demo of application
1:10 Outline of application
1:58 Step 1: download python libraries
6:21 Step 1: Streaming Speech-to-Text with AssemblyAI
12:11 Step 3: OpenAI Chat completion
15:32 Step 4: Generate Human-like audio with Elevenlabs
18:48 Running our AI Call Assistant
Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)
Watch →