SLMs, LLMs, and Model Routing in Agents | Amazon Web Services

Amazon Web Services · Advanced · 🤖 AI Agents & Automation · 12mo ago

Key Takeaways

The video discusses the role of Small Language Models (SLMs), Large Language Models (LLMs), and Intelligent Model Routing in building AI agents, with a focus on Amazon Web Services (AWS) and Arcee AI.

Full Transcript

Nolan Chen: Hello, I'm Nolan Chen, partner solutions architect at AWS.

Andrew Walko: And I'm Andrew Walko. I lead the field engineering team at Arcee AI.

Nolan Chen: Andrew, another topic we hear a lot about these days is AI agents. Can you tell us what an AI agent is?

Andrew Walko: Absolutely. Everyone has a different definition of what an AI agent is. In our definition, it's where you're able to take additional context (additional information, data, whatever it might be) and supply it to a model, which can then use it to take action on that information. And when we really get to true agents, this can all happen autonomously. You build a system or a construct that retrieves the correct information to complete the task and supplies it to the model, which conducts its analysis or whatever action it needs to take, and then an action can be taken on that result.

Nolan Chen: Thanks, Andrew. So looking at your diagram, it looks like the AI model is at the center of an AI agent. Now, in our previous videos, we talked about small language models, large language models, and model routing. Can you tell us how all those components together help us build these agents?

Andrew Walko: Yeah, absolutely. In those videos, we talked about when each one makes sense and when we should use each one. The way to think about it is the same way you would think about building a team or even a company. If you were putting a company together, you wouldn't have just one type of person that you went to for every request. You would have your marketing team, your development team, your finance team, and so on and so forth. Each of these teams is able to coordinate with the others, and you have many more teams all interacting and coordinating with one another. Agents are the same way. You don't want to have just one model that you use for every agent and every step.
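The agent definition given in this exchange (retrieve context, supply it to a model, act on the result) can be sketched as a small loop. This is a minimal illustration, not anything shown in the video: the `Agent` class, the `retrieve`, `toy_model`, and `act` functions, and the sample knowledge entry are all hypothetical stand-ins, and the "model" is a plain function rather than a real SLM or LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """Hypothetical agent: retrieve context, ask a model, act on the answer."""
    retrieve: Callable[[str], str]  # fetches extra context for a task
    model: Callable[[str], str]     # stand-in for an SLM or LLM call
    act: Callable[[str], str]       # carries out an action on the model's result

    def run(self, task: str) -> str:
        context = self.retrieve(task)
        prompt = f"Task: {task}\nContext: {context}"
        decision = self.model(prompt)
        return self.act(decision)


# Toy data and functions, invented purely for illustration.
KNOWLEDGE = {"refund": "Refunds are allowed within 30 days."}


def retrieve(task: str) -> str:
    hits = [text for key, text in KNOWLEDGE.items() if key in task.lower()]
    return " ".join(hits) or "no context found"


def toy_model(prompt: str) -> str:
    # A real agent would call a language model here.
    return "approve" if "within 30 days" in prompt else "escalate"


def act(decision: str) -> str:
    return f"action taken: {decision}"


agent = Agent(retrieve=retrieve, model=toy_model, act=act)
result = agent.run("Customer asks for a refund")
print(result)
```

Because the model is just a field on the agent, it can be swapped for a different model, or for a router that picks one per request, without changing the loop.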
Andrew Walko: And that's really where intelligent model routing, small language models, and large language models all work really well together. You can use the domain-specific small language models that we talked about before for certain roles. You can use general large language models for certain roles. And for roles that might change, where in some cases you need an SLM and in some cases you need a large language model, that's when you can use intelligent model routing. Each one of these components fits into this type of system. So you have your SLMs, your LLMs, and in certain cases your intelligent model routing, all working together to build your overarching agentic system.

Nolan Chen: Okay. So I understand why sometimes you want to use a general LLM versus a domain-specific SLM. Going back to your agent here: when we look at this model in the middle, is the model router in here? And when we put "model" here, are we really looking at multiple possible models, with the router inside the agent?

Andrew Walko: Yeah, absolutely. And there are a couple of benefits you'd achieve from intelligent model routing if you were to put it in that model placeholder. One is overall improved accuracy, because you're able to use the right model for the right task. The second really big benefit is cost reduction, because instead of putting a large language model on every task, you're able to use SLMs where it makes sense. In fact, some of our own customers that have used this technique have seen upwards of a 64% reduction in cost within their systems by utilizing SLMs and intelligent model routing within their agentic networks.

Nolan Chen: Does that mean that, depending on the prompt the end user sends in, the same agent could actually be running different models every time?

Andrew Walko: Absolutely.

Nolan Chen: Awesome. Well, thank you, Andrew.
Nolan Chen: It's been a fascinating journey today, not just about SLMs, but also model routing and agentic applications.

Andrew Walko: Absolutely. Thank you, Nolan. I'm excited to see what the industry will keep doing. This is changing day by day, and we're continuing to improve. It's been great chatting with you today.

Nolan Chen: Likewise. Thank you very much.
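The idea discussed in the transcript, that the "model" slot in an agent can hold a router choosing between a domain-specific SLM and a general LLM per prompt, might look roughly like the sketch below. The keyword rule is an assumption made only to keep the example short; the video does not describe how the routing decision is made, and a production router would likely use something more capable than keyword matching. The names `finance_slm`, `general_llm`, `route`, and `agent_step` are hypothetical.

```python
from typing import Callable, Dict, Tuple

Model = Callable[[str], str]


def finance_slm(prompt: str) -> str:
    # Stand-in for a domain-specific small language model.
    return f"[finance-slm] {prompt}"


def general_llm(prompt: str) -> str:
    # Stand-in for a general-purpose large language model.
    return f"[general-llm] {prompt}"


# Hypothetical routing table: domain name -> trigger keywords and model.
DOMAIN_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "finance": ("invoice", "revenue", "budget"),
}
SLMS: Dict[str, Model] = {"finance": finance_slm}


def route(prompt: str) -> Model:
    """Pick a domain SLM when the prompt matches its keywords, else the LLM."""
    text = prompt.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(word in text for word in keywords):
            return SLMS[domain]
    return general_llm


def agent_step(prompt: str) -> str:
    # The same agent step may run a different model for each prompt.
    model = route(prompt)
    return model(prompt)


print(agent_step("Summarize the Q3 revenue report"))
print(agent_step("Write a poem about the ocean"))
```

Roles that always need the same kind of model can call that model directly; the router is only needed for roles where the right choice varies by request.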

Original Description

In part 5 of this 5 part video series on Small Language Models with Arcee AI, Andrew Walko and Nolan Chen define what an Agent is and discuss how SLMs, LLMs, and Intelligent Model Routing can fit in Agentic solutions. Learn more - https://go.aws/4laWv7r

The video explains how SLMs, LLMs, and Intelligent Model Routing can be used to build AI agents, which can autonomously retrieve information, supply it to a model, and take action based on the results. The discussion highlights the benefits of using multiple models and intelligent routing, including improved accuracy and cost reduction.

Key Takeaways

Define the role of AI agents in autonomous systems
Understand the difference between SLMs and LLMs
Implement Intelligent Model Routing in AI agents
Utilize SLMs and LLMs in AI agents
Design autonomous systems with multiple models and intelligent routing

💡 Intelligent Model Routing can improve accuracy and reduce cost by using the right model for each task. In the video, Arcee AI says some of its customers have seen upwards of a 64% cost reduction by combining SLMs with intelligent model routing.
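The video gives the 64% figure as a customer-reported result and does not state the underlying prices or traffic mix. The sketch below only shows how a blended per-request cost could be computed when some share of requests goes to a cheaper SLM. Every number in it is hypothetical and is not meant to reproduce the figure from the video; `blended_cost` and `cost_reduction` are illustrative names.

```python
def blended_cost(llm_cost: float, slm_cost: float, slm_share: float,
                 router_cost: float = 0.0) -> float:
    """Average cost per request when slm_share of requests go to the SLM."""
    if not 0.0 <= slm_share <= 1.0:
        raise ValueError("slm_share must be between 0 and 1")
    return router_cost + slm_share * slm_cost + (1.0 - slm_share) * llm_cost


def cost_reduction(llm_cost: float, slm_cost: float, slm_share: float,
                   router_cost: float = 0.0) -> float:
    """Fractional saving relative to sending every request to the LLM."""
    blended = blended_cost(llm_cost, slm_cost, slm_share, router_cost)
    return 1.0 - blended / llm_cost


# Hypothetical inputs: the SLM costs one tenth as much per request as the
# LLM, and half of all requests are routed to it.
saving = cost_reduction(llm_cost=1.0, slm_cost=0.1, slm_share=0.5)
print(f"{saving:.0%} lower cost than LLM-only")
```

The saving grows with both the price gap between the two models and the share of requests the SLM can handle, and it shrinks if the router itself adds a per-request cost.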