Faster LLM Function Calling — Dynamic Routes

James Briggs · Intermediate · 🧠 Large Language Models · 2y ago

Skills: LLM Engineering 90% · Tool Use & Function Calling 80% · Prompt Craft 60%

Key Takeaways

This video demonstrates how to use Semantic Router's dynamic routes to speed up LLM function calling, particularly for AI agents, using OpenAI's GPT-3.5 Turbo as an example, but also supporting Cohere and Llama.cpp for local deployment.

Full Transcript

Today we're going to take a look at dynamic routes in the Semantic Router library. Dynamic routes expand what we can do with this library by quite a lot. Unlike a static route, a dynamic route is able to dynamically generate parameters based on a particular input, which can then be passed into whatever you want to do with those parameters. So the main use case here is function calling. Now, the good thing with dynamic routes is that they can generate this output but they're still very fast, just like our static routes. They are fundamentally the same object, and I think what would be best is to just take a look at how they differ, which is not by a huge amount.

Okay, so we're going to start in the docs of the Semantic Router library. I'm going to go over to dynamic routes, and I'm just going to open that notebook in Colab. It's now on version 0.050. This is actually no longer necessary, so I need to remove that. I'm going to install the library first, then I'm going to come down to here and initialize a static route. These are just basic static routes, and the reason we're loading those first is because we want to see what the difference is between these and a dynamic route. So yes, we initialize those, and then we're going to initialize our route layer. The initialization of a route layer, whether you have dynamic or static routes or both, is exactly the same; it doesn't change. And again, we can use Cohere here, we can use OpenAI, and there's also now a new FastEmbed encoder if you want to run the embedding part locally. I'm going to use OpenAI because we will also want to use the OpenAI LLM. So, API key, enter this, and there we go. We do also support the Cohere LLM as well, and soon enough you will also have local LLMs, but for now I'm just going to use OpenAI; it's the easiest. Okay, so we can test that it's working, and this is purely static routes.

Let's see how we might create a dynamic route. So here is how we would set up our dynamic route. You don't need to do it like this directly, you can actually... sorry, this is the actual definition of our dynamic route. What I'm doing before here is creating the function schema that is required for our dynamic route. I can just show you what that looks like quickly. If I run this, it looks like this; this is our function schema. What I'm doing with the get_schema function here is taking an existing function and formatting it in a particular way. We're using the Sphinx docstring format here, and we're adding a lot of description as to how exactly this function should be used. So: finds the current time in a specific time zone. That's the description: what does this function actually do? We need this for our dynamic route to understand what this does and how it should be used. Then we specify: we have our timezone, the type of our timezone is a string, and the description for it is this. So, the time zone to find the current time in; it should be a valid time zone from the IANA Time Zone Database. Then we give some examples, and then we specify: do not put the place name, like Rome or New York; you must provide this particular IANA format. So we provide this format, and it's going to give us the time in that particular place.

Now we take that function, get_time, that we just created here, and we put it through our get_schema function. We get our function schema output, and this is what defines the difference between a static route and a dynamic route: we simply pass this function schema into our route definition. If I remove that, this is now a static route; if I add it back in, it's a dynamic route. And that's all there is to it.

So we have our new dynamic route. I'm going to add it to our route layer, and then I'm going to ask a time-related question. That should trigger the get_time dynamic route, and it should hopefully get the right inputs for that route. Okay, and we see that it does: we have function call, and we have these inputs for our function. So then I can connect this up to the function that we created. I say out equals this, and I want to say get_time, passing in out and its function call. Okay, let's see what we get. Okay, 6:16.

Basically, you can expect this to work with any function call that you'd expect an LLM to normally be able to handle, because we're using an LLM here. So what we're really doing is setting up that kind of agentic workflow, where an agent will decide what to do and then generate the input for whatever it decides to do. We're taking away the decision part, on which route to take or which tool to use, and we're using semantics to make that decision, but we still rely on the LLM to generate the function call itself, which, as we've seen, it does, and it does pretty quickly.

Now that's it for this video. I hope this has been useful and interesting. So thank you very much for watching, and I will see you again in the next one. Bye.
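The function-plus-docstring pattern described in the transcript might look roughly like the sketch below. It is an illustration under assumptions, not the notebook's code: get_time follows the spoken description of the docstring, while describe_function is a hypothetical stand-in for the library's schema helper, built only on the standard library's inspect module. The real helper's output format may well differ.

```python
import inspect
from datetime import datetime
from zoneinfo import ZoneInfo


def get_time(timezone: str) -> str:
    """Finds the current time in a specific timezone.

    :param timezone: The timezone to find the current time in. It should
        be a valid timezone from the IANA Time Zone Database, like
        "America/New_York" or "Europe/London". Do NOT put the place
        name itself, like "rome" or "new york"; you must provide
        the IANA format.
    :type timezone: str
    :return: The current time in the specified timezone.
    """
    now = datetime.now(ZoneInfo(timezone))
    return now.strftime("%H:%M")


def describe_function(func) -> dict:
    """Hypothetical stand-in for a schema helper: collects the name,
    docstring, and signature that an LLM would need to fill in arguments."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "signature": str(signature),
    }


schema = describe_function(get_time)
print(schema["name"], schema["signature"])
```

The detailed docstring matters because it is the only guidance the LLM receives about valid argument values. Calling get_time("Europe/London") returns an HH:MM string, provided the system has time zone data available.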

Original Description

LLM function calling can be slow, particularly for AI agents. Using Semantic Router's dynamic routes, we can make this much faster and scale to thousands of tools and functions. Here we see how to use it with OpenAI's GPT-3.5 Turbo, but the library also supports Cohere and Llama.cpp for local deployments.

In Semantic Router there are two types of routes that can be chosen. Both belong to the Route object; the only difference between them is that static routes return a Route.name when chosen, whereas dynamic routes use an LLM call to produce parameter input values.

For example, a static route will tell us if a query is talking about mathematics by returning the route name (which could be "math", for example). A dynamic route can generate additional values, so it may decide a query is talking about maths, but it can also generate Python code that we can later execute to answer the user's query. This output may look like "math", "import math; output = math.sqrt(64)".

⭐ GitHub Repo: https://github.com/aurelio-labs/semantic-router/
📌 Code: https://github.com/aurelio-labs/semantic-router/blob/main/docs/02-dynamic-routes.ipynb
🔥 Semantic Router Course: https://www.aurelio.ai/course/semantic-router
👋🏼 AI Consulting: https://aurelio.ai
👾 Discord: https://discord.gg/c5QtDB9RAP
Twitter: https://twitter.com/jamescalam
LinkedIn: https://www.linkedin.com/in/jamescalam/

00:00 Fast LLM Function Calling
00:56 Semantic Router Setup for LLMs
02:20 Function Calling Schema
04:04 Dynamic Routes for Function Calling
05:51 How we can use Faster Agents
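The static versus dynamic distinction drawn in the description can be illustrated with a toy router. Everything in this sketch is hypothetical and is not the library's implementation: ToyRoute and RouteChoice are made-up types, word overlap stands in for embedding similarity, and fake_llm_extract stands in for the LLM call that a dynamic route would make.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RouteChoice:
    name: Optional[str]
    function_call: Optional[dict] = None


@dataclass
class ToyRoute:
    name: str
    utterances: list
    # Only "dynamic" routes carry an argument extractor.
    extract_args: Optional[Callable[[str], dict]] = None


def overlap(a: str, b: str) -> float:
    """Jaccard word overlap, a crude stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def route(query: str, routes: list, threshold: float = 0.3) -> RouteChoice:
    best, best_score = None, 0.0
    for r in routes:
        score = max(overlap(query, u) for u in r.utterances)
        if score > best_score:
            best, best_score = r, score
    if best is None or best_score < threshold:
        return RouteChoice(name=None)  # no route matched
    if best.extract_args is None:
        return RouteChoice(name=best.name)  # static: name only
    # Dynamic: the route was picked by similarity; only now are arguments generated.
    return RouteChoice(name=best.name, function_call=best.extract_args(query))


def fake_llm_extract(query: str) -> dict:
    """Stand-in for the LLM call that fills in parameters."""
    known = {"new york": "America/New_York", "rome": "Europe/Rome"}
    for place, tz in known.items():
        if place in query.lower():
            return {"timezone": tz}
    return {"timezone": "UTC"}


routes = [
    ToyRoute("chitchat", ["how is the weather today", "lovely weather we are having"]),
    ToyRoute(
        "get_time",
        ["what is the time in new york", "tell me the time in rome"],
        fake_llm_extract,
    ),
]

static_choice = route("how is the weather today", routes)
dynamic_choice = route("what is the time in rome", routes)
print(static_choice)
print(dynamic_choice)
```

The point of the design is that choosing the route never touches the argument extractor; the expensive step runs at most once, and only for routes that need parameters.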

This video shows how to use Semantic Router's dynamic routes to speed up LLM function calling, using OpenAI's GPT-3.5 Turbo as an example, and demonstrates how to create function schemas and deploy LLMs with dynamic routes.

Key Takeaways

Install Semantic Router Library
Initialize Static Routes
Create Function Schema using Sphinx docstring format
Define Dynamic Route using Function Schema
Add Dynamic Route to Route Layer
Test Dynamic Route with Time-related Question
Connect Dynamic Route to Function Call
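The final step in the list, connecting a routed result to the actual function, could be sketched as a small dispatcher. This is a minimal illustration assuming the router hands back a route name plus a dictionary of generated arguments; TOOLS, dispatch, and the placeholder get_time are hypothetical names, not part of the library.

```python
import inspect
from typing import Optional


def get_time(timezone: str) -> str:
    """Placeholder tool; a real version would look up the current time."""
    return f"(current time in {timezone})"


# Registry mapping route names to the Python callables they trigger.
TOOLS = {"get_time": get_time}


def dispatch(name: Optional[str], function_call: Optional[dict]) -> Optional[str]:
    """Call the tool registered under `name` with LLM-generated arguments."""
    if name is None or name not in TOOLS:
        return None  # nothing matched, so there is nothing to call
    func = TOOLS[name]
    try:
        # Check the generated arguments against the signature before calling.
        bound = inspect.signature(func).bind(**(function_call or {}))
    except TypeError as err:
        return f"invalid arguments for {name}: {err}"
    return func(*bound.args, **bound.kwargs)


result = dispatch("get_time", {"timezone": "Europe/Rome"})
print(result)
```

Validating the arguments before the call is a cheap guard, since LLM-generated parameters can be missing or misnamed.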

💡 Dynamic routes can speed up LLM function calling: the route (tool) is chosen by semantic similarity rather than by an LLM, and the LLM is used only to generate the function's parameters from the input.

Chapters (5)

0:00 Fast LLM Function Calling

0:56 Semantic Router Setup for LLMs

2:20 Function Calling Schema

4:04 Dynamic Routes for Function Calling

5:51 How we can use Faster Agents
