SLMs, LLMs, and Model Routing in Agents | Amazon Web Services
In part 5 of this 5-part video series on small language models with Arcee AI, Andrew Walko and Nolan Chen define what an AI agent is and discuss how SLMs, LLMs, and intelligent model routing fit into agentic solutions.
Learn more - https://go.aws/4laWv7r
What You'll Learn
The video discusses the role of Small Language Models (SLMs), Large Language Models (LLMs), and Intelligent Model Routing in building AI agents, with a focus on Amazon Web Services (AWS) and Arcee AI.
Full Transcript
Nolan: Hello, I'm Nolan Chen, partner solutions architect at AWS.

Andrew: And I'm Andrew Walko. I lead the field engineering team at Arcee AI.

Nolan: Andrew, another topic we hear a lot about these days is AI agents. Can you tell us what an AI agent is?

Andrew: Absolutely. Everyone has a different definition of what an AI agent is. In our definition, it's a system where you take additional context (additional information, data, whatever it might be) and supply it to a model, which can then use it to take action on that information. And when we really get to true agents, this can all happen autonomously. You build a system or a construct that retrieves the correct information to complete the task and supplies it to the model. The model then conducts its analysis, or whatever work it needs to do, and an action can be taken on that result.

Nolan: Thanks, Andrew. So looking at your diagram, it looks like the AI model is at the center of an AI agent. Now, in our previous videos, we talked about small language models, large language models, and model routing. Can you tell us how all those components together help us build these agents?

Andrew: Yeah, absolutely. In those videos, we talked about when each one makes sense and when we should use each one. The way to think about it is the same way you would think about building a team or even a company. If you were putting a company together, you wouldn't have just one type of person that you went to for every request. You would have your marketing team, your development team, your finance team, and so on. And each of these teams is able to coordinate with the others, right? You have many, many more teams that are all interacting and coordinating with one another. Agents are the same way. You don't want to have just one model that you use for every agent and every step.

And that's really where intelligent model routing, small language models, and large language models all work really well together. You can use the domain-specific small language models that we talked about before for certain roles. You can use the general large language models for certain roles. And for roles that might change, where in some cases you need an SLM and in some cases you need a large language model, that's when you can use intelligent model routing. Each one of these components fits into this type of system. So you have your SLMs, your LLMs, and in certain cases your intelligent model routing, and they all work together to build your overarching agentic system.

Nolan: Okay. So I understand why sometimes you want to use a general LLM versus a domain-specific SLM. Going back to your agent here: when we look at this model in the middle, is the model router in here? And when we put "model" here, are we really looking at multiple possible models, with the router inside the agent?

Andrew: Yeah, absolutely. And there are a couple of benefits you'd get from intelligent model routing if you were to put it in that model placeholder. One is overall improved accuracy, and that's because you're able to use the right model for the right task. The second really big benefit is cost reduction, because in certain cases, instead of just putting a large language model on every task, you're able to use SLMs where it makes sense. In fact, some of our own customers that have used this technique have seen upwards of a 64% reduction in cost within their systems by using SLMs and intelligent model routing within their agentic networks.

Nolan: Does that mean that, depending on the prompt the end user sends in, the same agent could actually be running different models each time?

Andrew: Absolutely.

Nolan: Awesome. Well, thank you, Andrew. It's been a fascinating journey today, not just about SLMs, but also about model routing and agentic applications.

Andrew: Absolutely. Thank you, Nolan. And I'm excited to see what the industry will keep doing. This is changing day by day, and we're continuing to improve. It's been great chatting with you today.

Nolan: Likewise. Thank you very much.
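The agent described in the conversation (retrieve context, pick a model, act on the result) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Arcee AI's or AWS's implementation: the model names, the keyword lists, and the `retrieve_context`, `route`, and `run_agent` helpers are all hypothetical, the models are stubs that echo their input, and simple keyword matching stands in for whatever logic a real intelligent router uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Model:
    name: str
    kind: str                    # "SLM" or "LLM"
    run: Callable[[str], str]    # stand-in for a real inference call


def make_stub(name: str) -> Callable[[str], str]:
    """Return a fake inference function that just echoes the prompt."""
    return lambda prompt: f"[{name}] response to: {prompt}"


# Hypothetical model pool: two domain-specific SLMs and one general LLM.
MODELS: Dict[str, Model] = {
    "finance": Model("finance-slm", "SLM", make_stub("finance-slm")),
    "code": Model("code-slm", "SLM", make_stub("code-slm")),
    "general": Model("general-llm", "LLM", make_stub("general-llm")),
}

# Assumed routing signal: plain keyword matching per domain.
# A production router would more likely be a trained classifier.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "finance": ("invoice", "revenue", "budget"),
    "code": ("python", "bug", "stack trace"),
}


def route(prompt: str) -> Model:
    """Pick a domain SLM when a keyword matches, else fall back to the LLM."""
    text = prompt.lower()
    for domain, words in KEYWORDS.items():
        if any(word in text for word in words):
            return MODELS[domain]
    return MODELS["general"]


def retrieve_context(prompt: str) -> str:
    """Hypothetical retrieval step; a real agent would query data sources."""
    return f"(context retrieved for: {prompt})"


def run_agent(prompt: str) -> Dict[str, str]:
    """One agent step: retrieve context, route to a model, return its output."""
    context = retrieve_context(prompt)
    model = route(prompt)
    answer = model.run(f"{context}\n\n{prompt}")
    return {"model": model.name, "kind": model.kind, "answer": answer}


if __name__ == "__main__":
    for question in (
        "Summarize the Q3 budget variance",
        "Why does this Python function fail?",
        "Write a poem about clouds",
    ):
        result = run_agent(question)
        print(f"{question!r} -> {result['model']} ({result['kind']})")
```

As in the discussion, the same agent ends up running a different model depending on the prompt. The general LLM is the fallback, so the SLMs are only used where the router finds a match.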