How to Scale Your AI Agent | Crawl, Walk, Run
Key Takeaways
The video demonstrates Voiceflow's Crawl, Walk, Run concept for scaling AI agents, using tools such as GPTs, custom GPTs, Microsoft Copilot, Zapier, and Voiceflow, and techniques like natural language understanding (NLU) and large language models (LLMs) to improve accuracy and provide personalized experiences.
Full Transcript
You want an AI agent that can handle really complex queries across all of your different channels, like this, and not just in one place but across your app, email, website, and phone. But to get this, you're going to need to know where to start. We've created a map to help people start and scale their AI agents when building in Voiceflow. The most powerful assistants can pull information from your databases, combine that with other context, and use different AI models for different tasks. But if you try to do this all at the beginning, you're going to fail. This map tells you how to get started and has three phases that we're going to walk through: crawl, walk, and run. Within each of these phases, we're going to walk through three core concepts that are going to be important: understanding, deciding, and responding. These three concepts are going to help you start simple in each phase and then gradually level up your assistant so you can build something complex.

Before we dive in, YouTube says that 75% of people who watch this aren't subscribed, so if you want to know how to build the best AI agents in the world, go ahead and hit that subscribe button below.

So if you're starting out on your AI agent journey, our recommendation is to start with one use case and one function, launch it into production, and then start to slowly increase the complexity over time. Start in the crawl phase and then work your way up to the walk phase. And if you're already in the walk phase, you're ahead of 95% of people who are building AI agents today. On our YouTube channel you can find tutorials mapped to crawl, walk, and run, so that you can learn alongside us as we co-create the future of AI agents together.

Let's now give you a sense of what crawl, walk, and run actually look like. The crawl phase is what you get out of the box. You can do this with tools like GPTs, custom GPTs, or Microsoft Copilot, and you're basically just loading in information and then asking the bot questions back
and forth, so it's kind of like creating an AI loop. For 70% of use cases, this is going to be enough. You can go ahead and pair some simple actions with tools like Zapier, but a lot of you won't need to move past this point, and that's okay. You can think about this like building a website with Squarespace: it'll get you up and running quickly and it'll handle most of your use cases. But if you want to do something more advanced, more complex, more personalized, this isn't going to cut it, and that's where you need to move on to a tool like Voiceflow that specializes in the walk phase. We're going to dive into that next.

This is where we get into advanced agent building. We're not letting an LLM run the conversation; instead, we're using LLMs strategically in the conversation. We're pulling information about the user, then layering on our business rules to determine where they should go, and then, based on that, using one or multiple large language models that are best suited to the task, given who that user is and what they want to accomplish. You can think about this like Webflow or Photoshop: it takes a bit more time, but the control you have here allows you to create something that's incredibly customized, personalized, and powerful, and that actually helps your end user achieve their goal. This is where a tool like Voiceflow comes into play. It may take longer to master, but the end result is going to be much more bespoke and customized to what you're looking for.

So now for running. This is the future of AI agents. It's everything we talked about in the last phase, but it's all proactive: it's constantly learning, dynamically changing, updating its information based on what users are asking it and on what's happening in your product, and proactively reaching out or taking actions on behalf of customers. This is where a lot of our own experiments focus, and where you'll see videos of Nico spinning up new code repos to try out new experiments or new proofs of concept, or using Voiceflow
with newer technology. And the cool thing here is that whenever we learn, we actually take that and bring it back into the product to make it easier for you to build the future of these AI agents alongside us as we learn what that looks like.

So now that we've got an understanding of these three phases, we're going to map out the core components of deciding, understanding, and responding, so you can understand where your assistant is today and what steps you can use to start moving it up to the next level.

There are three concepts that are going to be really important as you build your AI agent. The first one is understanding. This is how your assistant understands and actually captures intent or information about what a user wants to do. For example, if you say "My name is Daniel," you want the assistant to be smart enough to recognize that Daniel is the name and to ignore the "my name is" part. Now, the intent is what someone means when they say things. If I say "I'm looking for new pants," that means that I want to purchase new products, and the product I specifically want to purchase is pants. So within there, "I want to purchase" is the intent (the user wants to buy something) and pants are the object, or the entity, that you want to be able to capture from that. These are straightforward concepts, but you'll find that as your assistant grows in complexity or in volume, you're going to want to start optimizing this, and then a percent improvement in accuracy is actually going to have meaningful results for your business. Now, this concept doesn't necessarily change too much between the stages, but it's really important to understand, as it's the foundation of any good assistant. For this concept we're going to be using two technologies: natural language understanding, or NLU, which is an older form of AI, and large language models, which is the AI that you're familiar with today. We'll link some of the research from our machine learning team below so you
can learn more about how these two technologies play together and how to get the best accuracy by actually using both of them in conjunction.

The second one is deciding. Now, this is the most important concept to understand: it's what separates stage one, crawling, from stage two, walking. This is what Voiceflow specializes in. It's where you start to combine the business rules and logic that are unique to your business with your AI agent. For example, if an important customer or maybe a really strong lead is interacting with our AI agent, we actually want to understand that and then treat them differently than we would any other person, just like a real human would if they were interacting with you in a sales call or in a support recommendation. This may mean that we use different sets of knowledge, different AI models, or even different API calls to give someone access to different services that we can provide. In this section, as part of decide, we're going to be using variables, logic, and large language models to determine where best to route a user so that we can answer their queries effectively.

The third one is responding. In the crawl phase, we actually just hand this over entirely to the large language model. We give it a guiding prompt and we say, "Hey, you figure out, with the history of the user's conversation and what they're saying now, how best to respond to them." But in the walk phase, we're being a lot more intentional with this. We're combining customer information with specific prompts, specific knowledge base sources, and specific AI models to craft a response that is really specific to this type of user, based on what their intent is and what they're hoping to achieve from this conversation. What's important to note here is that different AI models are actually better at different kinds of responses, and the type of prompt that you craft and the type of context and information that you pass into the AI model about the user will give wildly
different responses. So this is especially useful if you're dealing with really important customers or handling highly sensitive matters. To get the highest-quality responses, you'll need to pick the best option for all of these and then iterate on them to make sure that you're providing the best response to the user. So in this section we're going to use vector databases for RAG, which is retrieval-augmented generation (we have an explainer on how this works in the top corner; in Voiceflow this is just called the knowledge base), large language models, variables, and prompt engineering.

Now let's put these pieces all together. In the crawl phase, your system looks like this. For understanding, it's primarily LLM-driven: you're passing a conversation over to a large language model and it is determining how best to respond. The easiest way to do this is to use out-of-the-box builders. You can either use ChatGPT itself, build a custom GPT, or use one of the many simple chatbot builders that have arisen.

The next one is deciding. This is also LLM-driven. In GPT they use something called functions to do this. This basically lets you describe, in general terms, that when a user says a certain thing or wants a certain thing, you're going to kick off this workflow. A lot of tools like Zapier are great for this or are building this in. But here you're still providing some general instructions to the LLM and just letting it do it. Again, this is perfect for really simple examples and simple use cases, like capturing the email of a lead, and you can use many of the other tools that are out there to do this.

And finally, responding. Again, this is entirely LLM-driven. It may be informed by some of the documents that you're uploading, but there's not much context being passed about the user here. It's just taking what they've said to it before, what they're saying to it right now, and what it can find from the information that you've
uploaded to be able to answer the question. In Voiceflow, this is super easy and takes about 5 minutes to set up. There are plenty of other tools out there that do this as well and are also great to use, like Microsoft Copilot or custom GPTs. But what you want to do is start here, launch to production, and then start to add on complexity and get into the walk phase, where Voiceflow is really valuable. So consider this to be just a starting point, where you want to get something up and out quickly and then start using the data to add on complexity and move from the crawl phase to the walk phase.

Speaking of which, in the walk phase your assistant looks like this. In the first stage, understanding, this is driven by a large language model, natural language understanding, or often a combination of both. In the walk phase you prioritize a high degree of accuracy. If a lot of people are using your assistant, a 5% difference in accuracy in understanding is going to make a huge impact, so you're going to spend a lot of time providing positive and negative examples and tweaking the prompts around your model so that it can understand your context to a high degree of accuracy. We'll link to a video of what that looks like in Voiceflow here or below, but as you work with this over time, you'll start seeing your accuracy go from about 90% to 95%, 98%, or even higher.

The next one is deciding. This is heavily influenced by your business logic. You're pulling information from the user or customer using your AI agent, and then, based on that information, you're routing them down really specific flows in your assistant to provide highly personalized experiences for them, based on their properties or the values that are associated with that customer. Your design has logic throughout it that checks user traits, like their past purchases, their email domain, or their plan, and uses that to provide highly personalized experiences for them. A really simple
example is in our own agent, Tico. Right at the beginning of the conversation, we're able to pull information about the user, check their plan type, and check how often they're using Voiceflow and what they're using in Voiceflow, so we can provide really specific experiences. This dictates what options are available to them, what models we use, what knowledge we use, and everything else to answer their question. For example, in the Discord bot I showed you at the beginning, when a user asks a question we're actually pulling the roles from Discord to understand if they are a developer or not, and then, based on that, the answer we provide uses different sets of knowledge to give them either a highly technical, formatted answer or a non-technical answer that helps them answer their question in a way that the user can actually understand.

The third one is responding. Your agent isn't just one large AI model; it's actually a series of smaller, more specialized AI models that are utilized based on what the user needs. The responses you're providing to a user use specific AI models that are best suited to the task, with specialized prompts to answer the questions with a high degree of accuracy. Rather than just sending a question to the knowledge base and getting a response, you may actually be utilizing the knowledge base as a vector database, where you send your question and retrieve the relevant pieces of information. Then you may pass that through additional filters to ensure that things like hallucinations don't exist within the information that you have, and then incorporate that into the prompt, so that the final answer you give to your user is very accurate and you can be sure that it doesn't contain any false information. Check out one of Pete's videos here for an example of how he works with larger customers to handle things like hallucinations by doing what we just described. You can see that
there's way more optimization here to specialize for really complex answers in sensitive industries like finance or healthcare, so that you can be sure that the answers you're giving the user are incredibly accurate and don't contain any false information. This is really important for any large business that's actually providing an AI answer to its users. The ceiling for this is incredibly high, and for Voiceflow this is exactly where we focus.

Now, it's really easy to get lost here. My recommendation is to start small: focus on one use case and one function, launch that into production, and then see how users are interacting with your assistant to slowly improve it over time. In our own assistant, Tico, we started just with in-app support. When we launched to production, we spent time looking through user transcripts to see how users were actually interacting with our assistant, and then, based on that information, started to improve it and increase the complexity of our assistant over time. Once we were able to do support really well within the app, we started expanding to different channels. So we started with Tico in-app, then we brought Tico to Discord to help moderate our Discord and answer questions like the one you saw earlier in the video, and then we brought it to email as well to handle a lot of our first responses on support. This handles a ton of the tickets that come in today before they get escalated to a human if the question is more complex. We also started layering on different use cases. We went pretty broad with Tico support, and now we're adding Tico to our website to handle a lot of sales queries that come in as well and act a bit as a BDR, doing some initial screening for users.

In the run stage, your assistant starts to push all of these boundaries. One of the key things here is that your assistant is able to dynamically learn and update. Our own assistant, Tico, actually updates its knowledge base based on how users are
interacting with it, and injects new information into the knowledge base every time a really complex question gets answered on Discord by other users or whenever a question is answered by our support team. That way it's constantly learning and evolving, just like a person would as they get new information. In this stage you would also start to use highly specialized or fine-tuned models to handle certain tasks, and have LLM-driven proactive responses based on user behavior.

Much of this is still being discovered, and there aren't a lot of assistants or agents that are actually in this bucket today. That's where we run a ton of experiments, like I mentioned earlier with Nico and his videos, and whenever we figure out something that is actually valuable or can really enhance the experience of our customers, we bring that into the product to make it easier for you to use. A core product principle at Voiceflow is extensibility, so you'll see in a lot of Nico's videos, when he's experimenting on Voiceflow with new technologies, new protocols, or new frameworks, that you're able to do the same thing as well by leveraging some of the publicly available APIs that Voiceflow has to offer. For all of our experiments, we can also expose our repos so you can fork them and build alongside us, because our goal is to co-create what the future of AI agents looks like alongside you.

[Music]
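
The walk-phase loop described in the transcript (understand, then decide, then respond) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Voiceflow's implementation: the `User` fields, the keyword rules standing in for an NLU model or LLM, the in-memory `KNOWLEDGE` dictionary standing in for a vector database, its sample sentences, and the model names are all hypothetical. The respond step returns the assembled prompt instead of calling a model, so the sketch runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user record; the field names are illustrative only."""
    name: str
    plan: str = "free"                       # e.g. "free" or "enterprise"
    roles: set = field(default_factory=set)  # e.g. {"developer"}


# Toy knowledge split by audience (a stand-in for a vector database).
# The sentences are made-up sample content.
KNOWLEDGE = {
    "technical": ["Call the API with your project key to send a user message."],
    "general": ["Open the Integrations tab and follow the setup steps."],
}


def understand(utterance: str) -> dict:
    """Understand: capture an intent and an entity with keyword rules.

    A real agent would use an NLU model, an LLM, or both for this step.
    """
    text = utterance.lower()
    if "buy" in text or "looking for" in text:
        intent = "purchase"
    elif "how" in text:
        intent = "support_question"
    else:
        intent = "unknown"
    entity = "pants" if "pants" in text else None
    return {"intent": intent, "entity": entity}


def decide(user: User, parsed: dict) -> dict:
    """Decide: apply business rules to choose a knowledge set and a model."""
    audience = "technical" if "developer" in user.roles else "general"
    # Placeholder model names, not real model identifiers.
    model = "large-model" if user.plan == "enterprise" else "small-model"
    return {"audience": audience, "model": model, "intent": parsed["intent"]}


def respond(user: User, route: dict, utterance: str) -> str:
    """Respond: assemble a prompt from the routed knowledge and user context."""
    context = "\n".join(KNOWLEDGE[route["audience"]])
    return (
        f"[model={route['model']}]\n"
        f"Answer for a {route['audience']} audience using only this context:\n"
        f"{context}\n"
        f"User {user.name} asked: {utterance}"
    )


def handle(user: User, utterance: str) -> str:
    """Run the three steps in order for one user message."""
    parsed = understand(utterance)
    route = decide(user, parsed)
    return respond(user, route, utterance)
```

Calling `handle(User(name="Daniel", plan="enterprise", roles={"developer"}), "How do I connect my agent?")` routes the question to the technical knowledge set and the larger placeholder model, while the same question from a default user gets the general knowledge set. Keeping `decide` as plain code rather than a prompt mirrors the transcript's point that business rules, not the LLM, determine the route.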
Original Description
When mapping out your AI agent strategy, it's hard to imagine the endless possibilities of how your AI agent can scale. Daniel breaks down Voiceflow's Crawl, Walk, Run concept that helps users build advanced AI agents. He'll point out key technologies that AI agents use in each phase of their development, and how to invest in areas that will improve your agent's ability to Understand, Decide, and Respond.
Start building AI Agents with Voiceflow: https://www.voiceflow.com
0:00 Introduction to Crawl, Walk, Run
1:26 Crawl: Out-of-the-box AI Solutions
2:10 Walk: Advanced AI Agent Building
2:54 Run: The Future of AI Agents
3:29 Introduction to Understand, Decide, Respond
3:46 Understand: How Your Agent Captures Intent
4:57 Decide: How Your Agent Processes Information
5:48 Respond: How Your Agent Answers
7:04 Crawling through Understand, Decide, Respond
8:39 Walking through Understand, Decide, Respond
12:54 Running through Understand, Decide, Respond
13:49 Extensibility and the Future of AI Agents
The collaborative platform to build AI agents. Use Voiceflow to design, test, and launch chat or voice AI agents — together, faster, at scale.
Join our Discord community
👾 https://link.voiceflow.com/community
Kickstart your next project with our templates
🚀 https://www.voiceflow.com/marketplace?utm_source=youtube&utm_medium=organic
Our Links 🔗
👉 Start building today: https://www.voiceflow.com/?utm_source=youtube&utm_medium=organic
👉 Subscribe: https://bit.ly/3am22nf
👉 Twitter: https://bit.ly/2xrXZqV
👉 LinkedIn: https://www.linkedin.com/company/voiceflowhq/
👉 Publication: https://www.voiceflow.com/blog?utm_source=youtube&utm_medium=organic