Speculative Decoding: The Easiest Way to Speed Up LLMs

FriendliAI · Beginner · 🧠 Large Language Models · 1mo ago
N-gram speculative decoding is a way to instantly speed up your AI inference. In this video, we break down n-gram speculative decoding, one of the simplest and most effective tricks for accelerating large language model inference without adding extra parameters or needing a bigger GPU. If you're building with LLMs, inference speed is everything: slow generation means a bad user experience, higher costs, and wasted compute. N-gram speculative decoding uses simple pattern matching to predict multiple tokens at once, letting your model skip ahead instead of generating one token at a time.
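The core loop is easy to sketch in plain Python. This is a toy illustration of the technique, not FriendliAI's implementation: the names `propose_draft`, `speculative_step`, and the stand-in `model_next` function are hypothetical, and a real system would verify all draft tokens in a single batched forward pass rather than one call per token.

```python
def propose_draft(tokens, ngram_size=2, max_draft=4):
    """Draft tokens by matching the trailing n-gram earlier in the context."""
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan backwards for the most recent earlier occurrence of the tail.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            # Propose the tokens that followed that earlier occurrence.
            return tokens[start + ngram_size : start + ngram_size + max_draft]
    return []

def speculative_step(tokens, model_next, ngram_size=2, max_draft=4):
    """One decoding step: verify the draft, keep the longest correct prefix,
    then take one guaranteed token from the model."""
    draft = propose_draft(tokens, ngram_size, max_draft)
    accepted = []
    for t in draft:
        if model_next(tokens + accepted) == t:
            accepted.append(t)   # draft token matched: extra token for free
        else:
            break                # mismatch: discard the rest of the draft
    accepted.append(model_next(tokens + accepted))  # the normal one-token step
    return accepted

# Toy stand-in "model" that greedily continues a repeating sequence.
target = [1, 2, 3, 4] * 8
model_next = lambda toks: target[len(toks)]

context = [1, 2, 3, 4, 1, 2]
print(speculative_step(context, model_next))  # → [3, 4, 1, 2, 3]
```

Because the output exactly matches the drafted continuation here, one step emits five tokens instead of one; on a mismatch the step still emits at least one correct token, so output quality is unchanged.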
Watch on YouTube ↗
Next Up: 5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems · Dave Ebbelaar (LLM Eng)