Inference Characteristics of Streaming Speech Recognition

Efficient NLP · Beginner · 🧠 Large Language Models · 7mo ago
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io

Deploying streaming speech recognition (ASR) is quite different from serving an LLM. In this video, I break down the unique challenges, architecture, and surprising behaviors of Kyutai's Moshi streaming ASR model and its Rust inference server: GPU/CPU memory patterns and the economics of streaming vs. batch ASR.

Kyutai STT offline in browser: https://github.com/lucky-bai/wasm-speech-streaming

Chapters (7)

0:00 Introduction
1:50 Kyutai Moshi architecture
4:03 Short quiz
4:45 CPU and GPU memory experiments
6:08 Streaming inference server
8:56 Cost calculation
10:10 Differences vs LLM inference
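The cost-calculation chapter compares the economics of streaming vs. batch ASR. The key asymmetry is that a streaming server is pinned to real time, so its throughput is measured in concurrent streams, while a batch job can run many times faster than real time. A back-of-envelope sketch of that comparison, with all numbers (GPU price, stream count, real-time factor) being hypothetical placeholders rather than figures from the video:

```python
# Back-of-envelope cost model for serving ASR on a GPU.
# All numeric inputs below are illustrative assumptions, not measured values.

def streaming_cost_per_audio_hour(gpu_dollars_per_hour: float,
                                  concurrent_streams: int) -> float:
    """Streaming ASR runs at real time, so one GPU-hour transcribes
    `concurrent_streams` hours of audio."""
    return gpu_dollars_per_hour / concurrent_streams

def batch_cost_per_audio_hour(gpu_dollars_per_hour: float,
                              realtime_factor: float) -> float:
    """Batch ASR can run faster than real time; a 20x real-time factor
    means 20 hours of audio transcribed per GPU-hour."""
    return gpu_dollars_per_hour / realtime_factor

# Hypothetical example: a $2/hr GPU serving 64 simultaneous live streams,
# vs. the same GPU running batch jobs at 20x real time.
streaming = streaming_cost_per_audio_hour(2.0, 64)   # $0.03125 per audio hour
batch = batch_cost_per_audio_hour(2.0, 20.0)         # $0.10 per audio hour

print(f"streaming: ${streaming:.4f}/audio-hr  batch: ${batch:.4f}/audio-hr")
```

Under these made-up numbers streaming looks cheaper per audio hour, but only while all 64 stream slots stay occupied; idle slots still burn GPU time, which is one of the economic differences from batch serving the video discusses.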